UPDATE: Normalized Ab titer datasets are now available

Ab titer data for CMI-PB were normalized using the day 0 median as the normalization factor. The 2020 and 2021 datasets were normalized separately.

Steps:

  1. Set missing values to the limit of detection
  2. For each Ig isotype and antigen pair, divide the MFI values by the median of that pair's day 0 values in the given dataset
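
The two steps above can be sketched in pandas roughly as follows. This is only an illustration of the described procedure, not the original reconstruction code. The column names `isotype`, `antigen`, `lower_limit_of_detection` and `planned_day_relative_to_boost` are assumptions about the table layout and may need adjusting; the toy values are made up for the example.

```python
import pandas as pd


def normalize_ab_titer(
    df: pd.DataFrame,
    lod_col: str = "lower_limit_of_detection",      # assumed column name
    day_col: str = "planned_day_relative_to_boost",  # assumed column name
) -> pd.DataFrame:
    """Median (day 0) normalization of MFI values within one dataset."""
    out = df.copy()

    # Step 1: set missing MFI values to the limit of detection.
    out["MFI"] = out["MFI"].fillna(out[lod_col])

    # Step 2: compute the day 0 median per isotype/antigen pair ...
    day0 = out[out[day_col] == 0]
    medians = (
        day0.groupby(["isotype", "antigen"])["MFI"]
        .median()
        .rename("day0_median")
        .reset_index()
    )

    # ... and divide every MFI value of that pair by it.
    out = out.merge(medians, on=["isotype", "antigen"], how="left")
    out["MFI_normalized"] = out["MFI"] / out["day0_median"]
    return out.drop(columns="day0_median")


# Toy data for a single isotype/antigen pair (values are illustrative only).
example = pd.DataFrame(
    {
        "isotype": ["IgG", "IgG", "IgG", "IgG"],
        "antigen": ["PT", "PT", "PT", "PT"],
        "planned_day_relative_to_boost": [0, 0, 0, 14],
        "MFI": [2.0, 4.0, float("nan"), 8.0],
        "lower_limit_of_detection": [1.0, 1.0, 1.0, 1.0],
    }
)
result = normalize_ab_titer(example)
print(result[["planned_day_relative_to_boost", "MFI", "MFI_normalized"]])
```

Because the 2020 and 2021 datasets were normalized separately, a function like this would be applied to each dataset on its own rather than to the concatenated tables.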

Code for reconstruction:

Updated Ab titer datasets are now available at
https://www.cmi-pb.org/downloads/cmipb_challenge_datasets/2021_cmipb_challenge/

Also, please note that the table structure has changed. The ab_titer column has been renamed to MFI, and a new MFI_normalized column has been appended, containing the median-normalized MFI values for the 2020 longitudinal and 2021 baseline datasets.

Feel free to reply here with any questions.

Thanks,
Pramod

Slides: https://docs.google.com/presentation/d/1u6Sfpb6VETzYzGHD5mbf017zRopc8jz77kzBe3vPsso/edit

Hi Pramod, thanks for this update. I started working with the data and I can definitely find the updated Ab titers, but I do have a question about the Olink table. I can't convert the Olink IDs to protein names. I'm pulling the olink_prot_info table directly from https://www.cmi-pb.org:443/api/v2/olink_prot_info, and its Olink IDs don't overlap with the Olink IDs in the main Olink tables (which I downloaded from the FTP link). Do you know why this would be? Thanks!

Joaquin

Hi @joreyna, I see the issue. There is a problem with the API: it has unexpectedly rolled back to a previous update, which is why you don't see the proper mapping, specifically for the 2020 LD Olink IDs. We are working on this. In the meantime, I am attaching the olink_prot_info table.
olink_prot_info_manual_downloades.csv (6.4 KB)

In addition, I have simplified the olink_prot_exp files and replaced the Olink IDs with UniProt IDs.

Thanks,
Pramod

Also, olink_prot_info is fixed now: https://www.cmi-pb.org/api/v2/olink_prot_info.

Thanks,
Pramod

@pramod - I’m concerned that the data continues to change, since I will need to restart my pipeline from the beginning upon every update. Can we consider a data freeze as soon as possible - especially if we plan to get a paper out with some preliminary results by end of summer? (looping in @joreyna)

Thanks, @akonst2, for raising this point. I had planned to freeze the data once you and @joreyna found it to be in the correct format and usable for predictions. It looks like I can freeze the data now, so I went ahead and created a named/dated directory under the ‘2021_cmipb_challenge’ directory. If there is any change to the datasets, I will create a new dated directory there. Thanks.

Perfect, thanks @Pramod!

@Pramod, just to confirm, there was no titer normalization done for titers taken past day 0? If this is the case, we are training models on ‘raw’ 2020 titer values that will have different distributions from the raw 2021 titer values.

@akonst3 We perform a similar day 0 median-based normalization on both the 2020 and 2021 ab titers. The normalized 2020 ab titer data is provided for training models. From the normalized 2021 ab titers, the baseline (day 0) values are provided for predictions, whereas the longitudinal data (days 1, 3, 7, 14) is held back.

I can see the difference between MFI and MFI_normalized in the 2020 and 2021 ab titer TSV files, but within “2022BD_plasma_ab_titer.tsv” (found at this link) both columns have the same values. Was the MFI accidentally normalized?