Please reply with any questions you had from William’s presentation on 3/22.
user51_CMI-PB_challenge-2_William-Gibson.pdf (674.7 KB)
Hi @William-Gibson,
I think your modelling approach is super interesting from multiple perspectives!
I had a couple of clarifying questions you might be able to help me with:
For the subject-level RF classifier, you have the full feature set, then you remove the subject, and use the remaining data to train a model that predicts whether the input subject is ranked higher or lower than the held-out subject, is that correct?
What metric do you optimize for in the hyperparameter selection (e.g. accuracy, ROC-AUC, precision)?
Could you explain the quorum part of your model a little more? Is it taking the prediction probability * predicted label from each training subject’s model, for a given input subject, to produce the features for the ranking model?
Did you use a ranking objective in XGBoost directly, e.g. rank:ndcg?
Thanks for sharing your really cool model!
Eve
Hi @erichardson97 , thanks for the question!
Yes, you’re correct! I optimized based on accuracy; I did not do any comparisons with ROC-AUC or precision optimization. I did investigate whether certain “individual” models had consistently higher accuracy across different train/test splits. As you can probably picture, the models of individuals at the extreme ranks had the highest accuracy, while those in the middle had lower accuracy, though there were models in the middle that consistently had high accuracy (in the 0.8-0.9+ range). I also wanted to try this approach again after implementing the more robust imputation that Basu and team did.
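For concreteness, each individual model is fit along these lines (a minimal sketch, not my actual code; the real feature handling, splits, and parameter grid were more involved):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

def fit_individual_model(X_train, y_above_below, seed=0):
    """Fit one held-out subject's RF: predicts +1/-1, i.e. whether an
    input subject ranks above or below that subject. Hyperparameters
    are chosen by plain accuracy via cross-validation."""
    grid = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid={"n_estimators": [100, 300], "max_depth": [None, 3, 5]},
        scoring="accuracy",
        cv=5,
    )
    grid.fit(X_train, y_above_below)
    return grid.best_estimator_
```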
For the quorum I was using the prediction probability (which I’m pretty sure is just the number of votes in the random forest / total votes) and multiplying it by the classifier output (which was either +1 or -1). I played around with different ways to scale the prediction probability and ended up just taking the prediction probability cubed; the overall idea was that a confident above/below call should count more. So the final output for a given input is a row of values between -1 and 1, where each column is an individual model’s output for that input.

I tried a few basic approaches, different variations of summing across each row to get a final value and then ranking those values across all input samples. I wanted to see if XGBoost could be used to give better results (my hope was that it could learn to separate the good predictors from the bad predictors), but I also hadn’t used XGBoost before, so I just wanted to mess with it a bit and see how it worked. It didn’t do a very good job in my submitted model, but with some tweaks it performed better.
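In rough Python the quorum step looks something like this (again a simplified sketch; `individual_models` is just the list of per-subject RF classifiers from above):

```python
import numpy as np

def quorum_features(X, individual_models, power=3):
    """Build one signed-confidence column per individual model.

    Each model votes +1 (above that subject's rank) or -1 (below),
    weighted by its prediction probability raised to `power` so that
    confident votes count more.
    """
    cols = []
    for model in individual_models:
        label = model.predict(X)                    # +1 or -1
        proba = model.predict_proba(X).max(axis=1)  # confidence of that vote
        cols.append(label * proba ** power)
    return np.column_stack(cols)                    # values in [-1, 1]

def rank_by_sum(quorum):
    """Simplest aggregation: sum each row and rank the sums."""
    scores = quorum.sum(axis=1)
    # higher score -> higher predicted rank (rank 1 = highest)
    return scores.argsort()[::-1].argsort() + 1
```

Summing across the row is the “everyone votes” version; the XGBoost step just swaps that sum for a learned combination of the columns.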
I didn’t know about the ranking objective! So now I want to go back and try that as well, and I’m betting it will do a much better job.
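From a quick look at the docs, the sklearn-style wrapper seems to expose this as XGBRanker, something like the following (an untested sketch; I’m assuming a recent XGBoost version, placeholder data, and a single query group covering all subjects):

```python
import numpy as np
import xgboost as xgb

# Placeholder data: 30 subjects x 50 quorum-feature columns,
# with the integer target rank used as the relevance label.
X = np.random.rand(30, 50)
y = np.arange(30)
qid = np.zeros(30, dtype=int)   # one query group for the whole cohort

ranker = xgb.XGBRanker(objective="rank:ndcg", n_estimators=200)
ranker.fit(X, y, qid=qid)

scores = ranker.predict(X)      # higher score = predicted higher rank
```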
One of the cool things I was able to do was look at individual models that keyed off features that didn’t really pop up in other models (like CCL3 and CD4 T cells in the top 2-3 features). From this subset, I looked at what they were predicting well and what they weren’t, and saw a consistent pattern: certain samples they’d get right consistently, but others were around 50/50. I really want to dive in more and understand why but haven’t had the time. I felt a little validated, though, as there was a subset for monocyte fold change that really keyed off CCL3 and a subset that didn’t care about CCL3 at all (according to feature weights). Overall though, for monocytes the feature weights looked pretty inconsistent across individuals. I haven’t done any formal analysis outside of a few basic plots, though I would like to see if there are clusters based on feature weights and whether those overlap with consistently high prediction accuracy.
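The feature-weight comparison itself is nothing fancy, something along these lines (sketch only; `individual_models` and `feature_names` are the same stand-ins as above):

```python
import pandas as pd

def importance_table(individual_models, feature_names):
    """One row per individual model, one column per feature,
    filled with each RF's impurity-based feature importances."""
    rows = [m.feature_importances_ for m in individual_models]
    return pd.DataFrame(rows, columns=feature_names)

# e.g. find the models where CCL3 lands in the top 3 features,
# or cluster the rows to look for similar importance profiles:
# imp = importance_table(individual_models, feature_names)
# ccl3_models = imp[imp.rank(axis=1, ascending=False)["CCL3"] <= 3]
```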
Hope all is well Eve, take care and all the best
William
Hey William,
Thanks for your detailed response! The more I hear, the cooler the model sounds. With respect to the ranking objective function for XGBoost, I found it significantly worse than RMSE, which I attributed to the loss of information imposed by using rank directly; I wonder if it would perform better given your featurization.
That sounds really interesting! Hopefully we will learn more about which features cropped up in other models as well. I’m interested in what methods there are for calculating these sorts of quorum feature importances!
Thanks and congrats again, and hope all is well with you too!
Eve