Hello! I was hoping to split up the data similarly as done in the train/test TSV based on the year samples were collected. Is this possible with the API endpoints?
For instance, If I wanted to grab the training data from the plasma_ab_titer, I could do something like:
Thank you for your inquiry about splitting data directly through the API. We are currently reviewing the capabilities of our API to ensure it can support this functionality effectively. We will get back to you soon with more information.
Also, it is important to note that API resource embedding queries are intended for use with smaller tables. However, if you use them with large tables, such as RNASeq data, they are likely to fail. This is because transferring large data through a browser can be challenging.
Right now, splitting the subject tables by year is easy given that they exist as table columns. I think that’s good enough since we can use the values of the following query to split the data.
subject_endpoint = f'{base_url}/subject?dataset=eq.{year}_dataset'
response = requests.get(subject_endpoint, headers=base_headers)
data = StringIO(response.text)
Essentially, we can find what specimens/subjects belong to what year, and use those values to split everything nicely for us on our end using Python.
Wow great! I got away with writing something to do that on our end since I wasn’t sure when you folks would get to that! here’s something I wrote up that attempted to do something similar on our end:
If this is baked in directly on your end, I would prefer to rely on that from you folks
Thanks for your support! My team was curious when the non-day zero values for 2022 would be released. I figured that’s part of the next prediction challenge, but we just wanted to verify that! Thanks again for your responses!
@jgarcia longitudinal data for 2022 were already made available. You can access it via API ( /v4_1/). Please let me know if you come across any issues. Also, let me know if you are looking into specific datatype/assay?