@ningyuan-xie ningyuan-xie commented Nov 16, 2025

Contributor: Ningyuan Xie (nxie3@illinois.edu). From CS598 Deep Learning for Healthcare Final Project.

Contribution Type: New Dataset + New Task

Description:

Adds support for the SUPPORT2 (Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments) dataset and a corresponding survival preprocessing task. The SUPPORT2 dataset contains data on seriously ill hospitalized adults, including patient demographics, diagnoses, clinical measurements, and outcomes such as survival and hospital mortality.

Dataset:

  • Dataset class that inherits from BaseDataset
  • YAML configuration file for data loading
  • Test suite with all tests passing
  • Documentation following PyHealth standards
  • Test data file for validation

The dataset is available for download from:

Task:

  • SurvivalPreprocessSupport2 task class that extracts features and labels for survival probability prediction models
  • Supports both 2-month and 6-month survival probability prediction horizons
  • Extracts 6 feature groups: demographics, disease codes, vitals, labs, clinical scores, and comorbidities
  • Comprehensive test suite with task-specific tests
  • Demo script demonstrating the complete workflow from dataset loading to preprocessed samples
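
As a rough illustration of the task bullets above, the sketch below groups one SUPPORT2 row into six feature groups and picks a survival-probability label by horizon. It is not the PR's implementation. The column names are assumed from the public SUPPORT2 data dictionary, and the `FEATURE_GROUPS` layout, the `"2m"`/`"6m"` horizon strings, the `LABEL_COLUMNS` mapping, and the `build_sample` helper are all hypothetical names introduced for this example.

```python
from typing import Any, Dict, List

# Assumed column names from the public SUPPORT2 data dictionary.
# The grouping is illustrative and may differ from the PR's task class.
FEATURE_GROUPS: Dict[str, List[str]] = {
    "demographics": ["age", "sex", "race"],
    "disease_codes": ["dzgroup", "dzclass"],
    "vitals": ["meanbp", "hrt", "resp", "temp"],
    "labs": ["wblc", "alb", "bili", "crea", "sod", "ph", "glucose", "bun"],
    "clinical_scores": ["sps", "aps", "scoma"],
    "comorbidities": ["num.co", "diabetes", "dementia", "ca"],
}

# Assumed mapping from prediction horizon to the label column.
LABEL_COLUMNS: Dict[str, str] = {"2m": "surv2m", "6m": "surv6m"}


def build_sample(row: Dict[str, Any], time_horizon: str = "2m") -> Dict[str, Any]:
    """Turn one patient row into grouped features plus a survival label."""
    if time_horizon not in LABEL_COLUMNS:
        # Reject unsupported horizons early instead of failing on a missing key.
        raise ValueError(
            f"time_horizon must be one of {sorted(LABEL_COLUMNS)}, got {time_horizon!r}"
        )
    # Missing columns become None so every group keeps a fixed length.
    sample: Dict[str, Any] = {
        group: [row.get(column) for column in columns]
        for group, columns in FEATURE_GROUPS.items()
    }
    sample["label"] = float(row[LABEL_COLUMNS[time_horizon]])
    return sample
```

Calling `build_sample(row, "6m")` on a dict-like row returns the six feature lists plus a `label` entry, while an unsupported horizon such as `"12m"` raises `ValueError`.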

The dataset is commonly used for mortality prediction, length of stay prediction, and other clinical outcome prediction tasks. The dataset was originally described in: Knaus WA, Harrell FE, Lynn J, et al. The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults. Ann Intern Med. 1995;122(3):191-203.

Files to Review:

Dataset Files:

  • pyhealth/datasets/support2.py - Main dataset implementation
  • pyhealth/datasets/configs/support2.yaml - Dataset configuration file specifying table structure, patient ID column, and attributes
  • docs/api/datasets/pyhealth.datasets.Support2Dataset.rst - API documentation
  • test-resources/core/support2/support2.csv - Test data file (3 patients)

Task Files:

  • pyhealth/tasks/survival_preprocess_support2.py - Task implementation that preprocesses SUPPORT2 data for survival prediction models
  • pyhealth/tasks/__init__.py - Updated to export the new task
  • examples/survival_preprocess_support2_demo.py - Demo script showing complete workflow
  • tests/core/test_support2.py - Comprehensive test suite including both dataset and task tests

Testing:

All tests pass successfully:

Dataset Tests:

  • Dataset initialization
  • Data loading
  • Patient retrieval
  • Statistics generation
  • Patient count validation
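
A patient count check of the kind listed above could look roughly like the following. This is a self-contained sketch and not the PR's test code. The inline CSV rows are made up for illustration, and the `patient_id` column name and the `load_patients` helper are hypothetical.

```python
import csv
import io
from typing import Dict

# Made-up rows for illustration only; not taken from the real SUPPORT2 file.
SAMPLE_CSV = """patient_id,age,sex,death
1,62.8,male,0
2,60.3,female,1
3,52.7,female,1
"""


def load_patients(text: str, id_column: str = "patient_id") -> Dict[str, Dict[str, str]]:
    """Index CSV rows by patient ID, rejecting duplicate IDs."""
    patients: Dict[str, Dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        patient_id = row[id_column]
        if patient_id in patients:
            raise ValueError(f"duplicate patient id: {patient_id}")
        patients[patient_id] = row
    return patients


patients = load_patients(SAMPLE_CSV)
# The fixture is expected to hold exactly three distinct patients.
assert len(patients) == 3
```

Keeping the fixture this small makes the expected count easy to verify by eye, which is the main point of a minimal test resource.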

Task Tests:

  • Task initialization with 2-month and 6-month horizons
  • Feature extraction and schema validation
  • Invalid time horizon error handling

Collaborator

@jhnwu3 jhnwu3 left a comment

I should ask: is this dataset in test-resources?

@ningyuan-xie
Contributor Author

> I should ask: is this dataset in test-resources?

Hi John, I just updated the dataset in test-resources to be a minimal example. I also added a link to the full dataset; please check. Thanks!

@jhnwu3
Collaborator

jhnwu3 commented Nov 17, 2025

Hey, thanks for the quick response! There seems to be a conflict due to one of the upcoming merges. Let me know if you can easily fix the merge conflict and recommit.

@ningyuan-xie
Contributor Author

> Hey, thanks for the quick response! There seems to be a conflict due to one of the upcoming merges. Let me know if you can easily fix the merge conflict and recommit.

No problem! I’ve fixed the merge conflict and recommitted the changes.

Collaborator

@jhnwu3 jhnwu3 left a comment

One last request: is there a task done with this dataset earlier that we could use as an example showcase of how to use this dataset with the tasks_class?

Otherwise, I'm happy to merge so people can keep exploring.

@ningyuan-xie
Contributor Author

> One last request: is there a task done with this dataset earlier that we could use as an example showcase of how to use this dataset with the tasks_class?
>
> Otherwise, I'm happy to merge so people can keep exploring.

I submitted a data preprocessing task, with an example in the demo file that shows how to use this dataset with the task class. Please check.

Collaborator

  1. Can you change your example to use the PyHealth task you wrote here, to make it more complete?
  2. Can you also update the docs to point to this example, in case people want to see how to use it? I will probably do another refactor/aggregate in tutorials/additional_examples.rst.

Contributor Author

Sure!

  1. Updated example in the task file
  2. Updated docstring in the task file to point to the example demo file
  3. Updated tutorials/additional_examples.rst to also point to the example demo file

Collaborator

@jhnwu3 jhnwu3 left a comment

lgtm

@jhnwu3 jhnwu3 merged commit 4fdeb08 into sunlabuiuc:master Nov 18, 2025
1 check passed
dalloliogm pushed a commit to dalloliogm/PyHealth that referenced this pull request Nov 26, 2025
* Add dataset SUPPORT2

* Update test dataset under test-resources; add link to full dataset

* Add a preprocessing task for dataset SUPPORT2

* Enhance documentation for Survival Preprocess task and add detailed example usage in tutorials