Separate batch corrected data for 2020, 2021 and 2022

Mahita_Jarjapu · November 14, 2024, 12:29am

Hi,

The batch corrected data in the training datasets combines all the 2020, 2021 and 2022 datasets. Is there a way to obtain separate 2020, 2021 and 2022 training datasets where each dataset has been batch corrected?

Thanks,

Pramod · November 14, 2024, 12:43am

Hi @Mahita_Jarjapu,

Thank you for your question regarding the availability of batch-corrected datasets for the separate years 2020, 2021, and 2022.

The batch-corrected data combines all datasets from the training years (2020, 2021, and 2022) and the challenge data (2023). This approach follows a standard pipeline to ensure consistency and minimize batch effects across years. Please note that there are various methods for batch effect correction, and we have applied a straightforward approach in this case.

This data can be accessed here. If you require separate datasets for 2020, 2021, and 2022, you can split the combined batch-corrected dataset into per-year subsets using the subject_id and specimen_id mapping information.
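As a rough sketch of that split, here is one way it could look in pandas. The table layout is an assumption for illustration: a batch-corrected table keyed by `specimen_id`, a specimen table mapping `specimen_id` to `subject_id`, and a subject table with a `dataset` column holding the year label. The column name `dataset`, the label values, and the helper `split_by_year` are hypothetical, so adjust them to match the actual files.

```python
import pandas as pd


def split_by_year(corrected, specimen, subject, year_col="dataset"):
    """Return a dict mapping each year label to its rows of `corrected`."""
    # Build specimen_id -> year label by going through subject_id.
    mapping = specimen[["specimen_id", "subject_id"]].merge(
        subject[["subject_id", year_col]], on="subject_id", how="left"
    )
    # Attach the year label to every batch-corrected row.
    annotated = corrected.merge(
        mapping[["specimen_id", year_col]], on="specimen_id", how="left"
    )
    # groupby skips rows whose year label is missing (unmapped specimens).
    return {
        year: group.drop(columns=year_col).reset_index(drop=True)
        for year, group in annotated.groupby(year_col)
    }


# Toy stand-ins for the real tables (all values are made up).
corrected = pd.DataFrame(
    {"specimen_id": [1, 2, 3, 4], "feature_a": [0.1, 0.2, 0.3, 0.4]}
)
specimen = pd.DataFrame(
    {"specimen_id": [1, 2, 3, 4], "subject_id": [10, 10, 20, 30]}
)
subject = pd.DataFrame(
    {
        "subject_id": [10, 20, 30],
        "dataset": ["2020_dataset", "2021_dataset", "2022_dataset"],
    }
)

by_year = split_by_year(corrected, specimen, subject)
for year, frame in by_year.items():
    print(year, len(frame))
```

Each subset produced this way still carries the correction that was computed over the combined data; the split only filters rows and does not re-run batch correction per year.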

I hope this helps.

Best,
Pramod