Linking reference code to updated training data

@pramod, I’m trying to follow the ‘Load dataset and setup experiments’ code chunk in the reference notebook (RPubs - LASSO models), using as input the .RDS files in Index of /downloads/cmipb_challenge_datasets/current/2nd_challenge/processed_datasets/.

The issue is that the column headings in the dataset you load (master_normalized_data_challenge2_train.RDS) do not seem to match those in the RDS files at the link above.

When I change the column names to reflect the names in the available processed RDS file, I start to get additional errors. For example,

data.frame(df_source[["abtiter_wide"]])
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 729, 625, 27
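
For what it's worth, I can reproduce the same message with a toy list whose elements have different lengths, so my guess (only a guess) is that the structure of that element changed between data versions. A minimal sketch, where the list `toy` and its element names are hypothetical and only the lengths mirror the error above:

```r
# Toy stand-in for df_source[["abtiter_wide"]]; names and contents are
# hypothetical, only the lengths are taken from the error message.
toy <- list(a = seq_len(729), b = seq_len(625), c = seq_len(27))

# lengths() shows why the elements cannot be lined up as columns
print(lengths(toy))

# data.frame() refuses to recycle 625 values into 729 rows and stops;
# tryCatch() captures the message instead of aborting the script
res <- tryCatch(data.frame(toy), error = function(e) conditionMessage(e))
print(res)
```

Running `lengths()` or `str()` on the real list element should show which pieces no longer line up.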

Would you mind updating the reference code to work with the updated processed data?

Hello,

Thank you for pointing that out. That was an old version of the data file. I have updated the notebook. Please use the “master_processed_training_data.RDS” file, available here.

Thank you! The code works now. I do have a follow-up question. In the same code chunk, ‘Load dataset and setup experiments’, I have found that this line

df_pivot <- data.frame(df_pivot %>%
   mutate(across(everything(),  ~ case_when(.x >=0 ~ .x))))

updates any negative values to NA. This then leads to problems with model building and FC calculations. Since these values are not NA in the original processed training data, it doesn’t make sense (to me) to label them as NA here (or change them otherwise). I would also note that the FC function should probably account for zero or negative values in the numerator or denominator. Whatever rule is chosen should likely be posted in the tasks so that all participants perform the same calculation.
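
To make the concern concrete, here is a minimal sketch on toy values (not the challenge data). The first part applies the same `case_when()` pattern and counts the resulting NAs. The `fold_change` helper is hypothetical and shows only one possible convention (return NA when either value is non-positive), not the organisers' definition:

```r
library(dplyr)

# Toy data frame with hypothetical values, not the challenge data
df_pivot <- data.frame(x = c(-0.5, 0, 1.2), y = c(2, -1, 3))

# Same pattern as the notebook line: case_when() has no branch for
# negative values, so they fall through to NA
df_na <- data.frame(df_pivot %>%
  mutate(across(everything(), ~ case_when(.x >= 0 ~ .x))))
print(colSums(is.na(df_na)))  # one NA introduced in each column

# Hypothetical fold-change helper: returns NA whenever either value is
# zero or negative, rather than silently giving Inf or a sign flip
fold_change <- function(post, pre) {
  ifelse(post > 0 & pre > 0, post / pre, NA_real_)
}

print(fold_change(post = c(4, 2, -1, 3), pre = c(2, 0, 1, -3)))  # 2 NA NA NA
```

Other conventions (for example, shifting values before taking the ratio) would give different numbers, which is why I think the rule should be stated explicitly.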

Hello,

Thanks for pointing that out. The former code was developed on normalized data in which only one sample had negative values, and we decided to remove it by setting it to NA.

The new processed data appear to have been batch-corrected with ComBat. Unfortunately, ComBat introduces negative values into the data. I’ll discuss possible approaches with Pramod and update you.
