Add COVID Red Dataset #714
Open
Add COVID-RED Dataset, Detection/Prediction Tasks, and Example
Summary
This PR adds support for the COVID-RED (Remote Early Detection of SARS-CoV-2 infections) dataset to PyHealth, including:
- Dataset loader class (COVIDREDDataset)
- Task functions for COVID-19 detection/prediction (covidred_detection_fn, covidred_prediction_fn)
- Complete usage example with LSTM classifier (covidred_example.py)

This provides a clinically relevant wearable device dataset for PyHealth users and supports reproducible research in early infectious disease detection using consumer wearables.
Feature
1. COVIDREDDataset
- split="train" | "test" | "all"
- window_days parameter

2. Task Functions

covidred_detection_fn
Maps dataset samples into PyHealth task format for COVID-19 detection:

{
  "patient_id": str,
  "visit_id": str,
  "signal": Tensor(n_features × window_days),
  "label": int(0 or 1),
  "metadata": dict
}

covidred_prediction_fn
Maps dataset samples for early COVID-19 prediction (pre-symptomatic detection):

covidred_multiclass_fn (optional extension)
Extends to multiclass severity classification:
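
For reference, the sample layout listed for covidred_detection_fn above can be sketched as follows. This is an illustration only, not the code in this PR: the helper name make_detection_sample is hypothetical, and a NumPy array stands in for the Tensor so the snippet runs without PyTorch.

```python
import numpy as np


def make_detection_sample(patient_id, visit_id, features, label, metadata=None):
    """Hypothetical helper: package one window into the sample layout above.

    `features` is expected to be shaped (n_features, window_days).
    """
    signal = np.asarray(features, dtype=np.float32)
    if signal.ndim != 2:
        raise ValueError("signal must be 2-D: n_features x window_days")
    if label not in (0, 1):
        raise ValueError("label must be 0 or 1")
    return {
        "patient_id": str(patient_id),
        "visit_id": str(visit_id),
        "signal": signal,
        "label": int(label),
        "metadata": dict(metadata or {}),
    }


# Example: 3 features over a 7-day window (both values are assumptions).
sample = make_detection_sample(
    "p001", "p001_w0", np.zeros((3, 7)), 1, {"window_days": 7}
)
```

Validating the shape and label at construction time is a design choice of this sketch; it surfaces malformed windows before they reach a model.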
3. Example Script
Dataset Details
Dataset: COVID-RED - Remote Early Detection of SARS-CoV-2 infections
Source: Utrecht University, Netherlands
DOI: 10.34894/FW9PO7
URL: https://dataverse.nl/dataset.xhtml?persistentId=doi:10.34894/FW9PO7
Data characteristics:
Clinical significance:
Tests
Basic verification performed:
Note on Dataset Download
The COVID-RED dataset must be manually downloaded from DataverseNL.
Users must:
- heart_rate.csv - Daily resting heart rate measurements
- steps.csv - Daily step counts
- sleep.csv - Daily sleep duration and efficiency
- labels.csv - COVID-19 test results and symptom dates

/data/covidred/)

Usage Example
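
Before loading the dataset, it may help to confirm that the four files listed above are in place. The sketch below is not part of this PR: the helper name missing_covidred_files is hypothetical, and it assumes the CSV files sit directly in the chosen root directory.

```python
from pathlib import Path

# File names taken from the download note above.
EXPECTED_FILES = ("heart_rate.csv", "steps.csv", "sleep.csv", "labels.csv")


def missing_covidred_files(root):
    """Return the expected CSV names that are not present under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, calling missing_covidred_files("/data/covidred/") returns an empty list when all four files are present, and otherwise the names that still need to be downloaded.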
Files Changed
This PR adds three new files to PyHealth:
- pyhealth/datasets/covidred.py - Dataset loader class
- pyhealth/tasks/covidred.py - Task functions for COVID-19 detection/prediction
- examples/covidred_example.py - Complete usage example with LSTM classifier

Citation
If you use this dataset implementation, please cite the original COVID-RED study: