
@mustafa-sadiq

Add TCGA-PAAD Dataset (Pancreatic Adenocarcinoma)

  • Introduces TCGAPAADDataset, a PyHealth dataset for TCGA-PAAD with standardized mutation and clinical tables.
  • Config: tcga_paad.yaml defines two tables: mutations (hugo_symbol, variant_classification, variant_type, hgvsc, hgvsp, tumor_sample_barcode) and clinical (age_at_diagnosis, vital_status, days_to_death, tumor_stage).
  • Usage:
      from pyhealth.datasets import TCGAPAADDataset
      dataset = TCGAPAADDataset(root="path/to/TCGA-PAAD")
      samples = dataset.set_task()  # no argument: uses the default CancerSurvivalPrediction() task
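This PR does not spell out how CancerSurvivalPrediction turns a clinical record into a label. As a rough illustration only, the sketch below assumes an event indicator taken from vital_status and an event time taken from days_to_death, using the clinical column names listed in tcga_paad.yaml. SurvivalLabel, survival_label, and the "--" missing-value marker are hypothetical and are not part of PyHealth or of this PR.

```python
from dataclasses import dataclass
from typing import Optional

# Clinical columns as listed for the "clinical" table in tcga_paad.yaml (per this PR).
CLINICAL_COLUMNS = ("age_at_diagnosis", "vital_status", "days_to_death", "tumor_stage")

# ASSUMPTION: these strings mark a missing value in the raw clinical export.
MISSING_MARKERS = (None, "", "--")


@dataclass
class SurvivalLabel:
    """Hypothetical label container, not a PyHealth type."""
    event: int                   # 1 = death observed, 0 = alive (censored)
    time_days: Optional[float]   # days_to_death when a death is observed, else None


def survival_label(row: dict) -> Optional[SurvivalLabel]:
    """Hypothetical rule; not necessarily what CancerSurvivalPrediction does.

    Returns None when the record cannot be labeled.
    """
    missing = [c for c in CLINICAL_COLUMNS if c not in row]
    if missing:
        raise KeyError(f"clinical row is missing columns: {missing}")

    status = str(row["vital_status"]).strip().lower()
    if status == "dead":
        days = row["days_to_death"]
        if days in MISSING_MARKERS:
            return None  # death recorded without a time: skip the record
        return SurvivalLabel(event=1, time_days=float(days))
    if status == "alive":
        # Alive patients are treated as censored; no event time is assigned here.
        return SurvivalLabel(event=0, time_days=None)
    return None  # e.g. "Not Reported" or any other unrecognized status
```

Skipping unlabeled records (rather than guessing a value) keeps the sketch conservative; a real task may instead use a follow-up time for censored patients or a fixed survival horizon.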

Testing

python -m pytest tests/core/test_tcga_paad.py -q

PS C:\Users\musta\OneDrive\UIUC MS CS 2024\CS 598 - Deep Learning for Healthcare\Project\PyHealth> python -m pytest tests/core/test_tcga_paad.py -q
....... [100%]
7 passed in 5.29s

@LogicFan LogicFan added the dataset Contribute a new dataset to PyHealth label Dec 18, 2025