Hi,
The batch-corrected data in the training datasets combines the 2020, 2021 and 2022 datasets. Is there a way to obtain separate 2020, 2021 and 2022 training datasets, each of which has been batch corrected?
Thanks,
Hi @Mahita_Jarjapu ,
Thank you for your question regarding the availability of batch-corrected datasets for the separate years 2020, 2021, and 2022.
The batch-corrected data combines all datasets from the training years (2020, 2021, and 2022) and the challenge data (2023). This approach follows a standard pipeline to ensure consistency and minimize batch effects across years. Please note that there are various methods for batch effect correction, and we have applied a straightforward approach in this case.
This data can be accessed here. If you require separate datasets for 2020, 2021, and 2022, you can split the combined batch-corrected dataset into one subset per year using the subject_id and specimen_id mapping information.
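As a rough illustration of that split, here is a minimal pandas sketch. It is not the pipeline we used, and the table layout is an assumption: a combined table keyed by specimen_id, a specimen table linking specimen_id to subject_id, and a subject table carrying a year label in a column I have called `dataset`. The function name `split_by_year` and the example values are made up for illustration, so please adjust them to the actual files.

```python
import pandas as pd


def split_by_year(corrected, specimen_map, subject_map, year_col="dataset"):
    """Split a combined batch-corrected table into one table per year.

    Assumed (hypothetical) layout:
      corrected    - one row per specimen, with a specimen_id column
      specimen_map - specimen_id and subject_id columns
      subject_map  - subject_id and a year label column (year_col)
    """
    # Resolve specimen_id -> subject_id -> year label.
    lookup = specimen_map.merge(subject_map, on="subject_id", how="left")
    annotated = corrected.merge(
        lookup[["specimen_id", year_col]], on="specimen_id", how="left"
    )
    # groupby skips rows with no year label, so unmapped specimens are dropped.
    return {
        year: group.drop(columns=year_col).reset_index(drop=True)
        for year, group in annotated.groupby(year_col)
    }


# Toy data standing in for the real tables (values are invented).
corrected = pd.DataFrame(
    {"specimen_id": [1, 2, 3, 4], "feature_a": [0.1, 0.2, 0.3, 0.4]}
)
specimen_map = pd.DataFrame(
    {"specimen_id": [1, 2, 3, 4], "subject_id": [10, 10, 20, 30]}
)
subject_map = pd.DataFrame(
    {
        "subject_id": [10, 20, 30],
        "dataset": ["2020_dataset", "2021_dataset", "2022_dataset"],
    }
)

per_year = split_by_year(corrected, specimen_map, subject_map)
```

Each entry of `per_year` is then a DataFrame for a single year. Keep in mind that these subsets are slices of the jointly corrected data, which is not necessarily the same as running batch correction on each year separately.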
I hope this helps.
Best,
Pramod