Add dataset SUPPORT2 #614
jhnwu3
left a comment
I should ask: is this dataset in the test-resources?
Hi John, I just updated the dataset in the test-resources to be a minimal example. I also added a link to the full dataset, please check. Thanks!
Hey, thanks for the quick response! There seems to be a conflict due to one of the upcoming merges. Let me know if you can easily fix the merge conflict and recommit.
No problem! I’ve fixed the merge conflict and recommitted the changes.
jhnwu3
left a comment
One last request: is there a task that was already done with this dataset that we could use as an example to showcase how to use the dataset with the tasks_class?
Otherwise, just happy to merge to enable people to keep exploring.
Submitted a data preprocessing task, with an example in the demo file to showcase how to use this dataset using this class, please check.
- Can you change your example to use the PyHealth task you wrote here, to make it more complete?
- Can you also update the docs to point to this example here? Just in case people want to see how to use it? I will probably do another refactor/aggregate in the tutorials/additional_examples.rst here.
Sure!
- Updated example in the task file
- Updated docstring in the task file to point to the example demo file
- Updated tutorials/additional_examples.rst to also point to the example demo file
…xample usage in tutorials
jhnwu3
left a comment
lgtm
* Add dataset SUPPORT2
* Update test dataset under test-resources; add link to full dataset
* Add a preprocessing task for dataset SUPPORT2
* Enhance documentation for Survival Preprocess task and add detailed example usage in tutorials
Contributor: Ningyuan Xie (nxie3@illinois.edu). From CS598 Deep Learning for Healthcare Final Project.
Contribution Type: New Dataset + New Task
Description:
Adds support for the SUPPORT2 (Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments) dataset and a corresponding survival preprocessing task. The SUPPORT2 dataset contains data on seriously ill hospitalized adults, including patient demographics, diagnoses, clinical measurements, and outcomes such as survival and hospital mortality.
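To make the four kinds of fields concrete, here is a small sketch that groups a few SUPPORT2-style columns by category. The column names are taken from the public SUPPORT2 data dictionary, but the grouping, the `group_row` helper, and the two sample rows are illustrative assumptions (the values are invented, not real patient data), and none of this is code from the PR.

```python
import csv
import io

# Illustrative grouping of a few SUPPORT2 columns; the grouping itself is
# an assumption for this sketch, not something defined by the PR.
COLUMN_GROUPS = {
    "demographics": ["age", "sex", "race"],
    "diagnoses": ["dzgroup", "dzclass"],
    "measurements": ["meanbp", "hrt", "temp"],
    "outcomes": ["death", "hospdead", "d.time"],
}

# Two invented rows, for illustration only.
SAMPLE_CSV = """age,sex,race,dzgroup,dzclass,meanbp,hrt,temp,death,hospdead,d.time
70,male,white,CHF,COPD/CHF/Cirrhosis,85,90,37.0,0,0,365
81,female,black,Coma,Coma,60,120,38.2,1,1,12
"""


def group_row(row):
    """Split one flat CSV row into the four categories above."""
    return {
        group: {col: row[col] for col in cols}
        for group, cols in COLUMN_GROUPS.items()
    }


rows = [group_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(rows[1]["outcomes"])
```

Values are left as strings here; a real loader would also need to handle type conversion and missing entries.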
Dataset:
BaseDataset
The dataset is available for download from:
Task:
- SurvivalPreprocessSupport2 task class that extracts features and labels for survival probability prediction models

The dataset is commonly used for mortality prediction, length of stay prediction, and other clinical outcome prediction tasks. The dataset was originally described in: Knaus WA, Harrell FE, Lynn J, et al. The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults. Ann Intern Med. 1995;122(3):191-203.
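As a rough illustration of what "extracting features and labels for survival probability prediction" can look like, the sketch below turns flat records into samples in plain Python. It is not the PR's SurvivalPreprocessSupport2 implementation and does not use the PyHealth API: the `build_survival_samples` function, the feature selection, the use of `surv2m` (the SUPPORT model's 2-month survival estimate) as the label, and the sample layout are all assumptions made for this example.

```python
from typing import Any, Dict, List, Optional

# Hypothetical feature selection; the PR's actual feature set is not shown here.
FEATURE_COLUMNS = ["age", "meanbp", "hrt"]


def to_float(value: Any) -> Optional[float]:
    """Convert a raw CSV value to float, returning None if missing or invalid."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def build_survival_samples(
    rows: List[Dict[str, Any]], label_column: str = "surv2m"
) -> List[Dict[str, Any]]:
    """Build one sample per row, skipping rows with missing features or label."""
    samples = []
    for index, row in enumerate(rows):
        label = to_float(row.get(label_column))
        features = [to_float(row.get(col)) for col in FEATURE_COLUMNS]
        if label is None or any(f is None for f in features):
            continue  # drop incomplete rows rather than impute
        samples.append(
            {"patient_id": str(index), "features": features, "label": label}
        )
    return samples


# Three invented rows (mirroring the size of a minimal test file).
demo_rows = [
    {"age": "70", "meanbp": "85", "hrt": "90", "surv2m": "0.80"},
    {"age": "55", "meanbp": "", "hrt": "110", "surv2m": "0.60"},  # missing meanbp
    {"age": "81", "meanbp": "60", "hrt": "120", "surv2m": "0.35"},
]
samples = build_survival_samples(demo_rows)
print(len(samples), samples[0])
```

Dropping incomplete rows keeps the sketch short; imputation would be an equally reasonable choice, and the demo script listed under the task files is the place to check what the PR actually does.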
Files to Review:
Dataset Files:
- pyhealth/datasets/support2.py - Main dataset implementation
- pyhealth/datasets/configs/support2.yaml - Dataset configuration file specifying table structure, patient ID column, and attributes
- docs/api/datasets/pyhealth.datasets.Support2Dataset.rst - API documentation
- test-resources/core/support2/support2.csv - Test data file (3 patients)
Task Files:
- pyhealth/tasks/survival_preprocess_support2.py - Task implementation that preprocesses SUPPORT2 data for survival prediction models
- pyhealth/tasks/__init__.py - Updated to export the new task
- examples/survival_preprocess_support2_demo.py - Demo script showing complete workflow
- tests/core/test_support2.py - Comprehensive test suite including both dataset and task tests
Testing:
All tests pass successfully:
Dataset Tests:
Task Tests: