@WubbzyFromWuzzleburg

Contributor: Elliott Huang (elliott500800@gmail.com)

Contribution Type: New Dataset + Documentation + Example Script

Description:
This PR adds support for the KaggleERN (INRIA BCI Challenge) EEG dataset in PyHealth. It introduces a KaggleERNDataset class that validates the expected raw folder structure. It also adds an offline preprocessing utility that converts the raw EEG CSV files into fixed-length epoch/window pickle files for downstream training, using a pickle schema that stays compatible with the provided fine-tuning workflow. In addition, the PR adds the new dataset to the API documentation and includes an end-to-end example of fine-tuning EEGPT on KaggleERN, with instructions for downloading the pretrained EEGPT checkpoint and placing it in the expected location.
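The verify-then-window preprocessing described above can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the code in pyhealth/datasets/kaggleern.py. The function names verify_raw_root, window_csv and write_windows, the per-window dict keys (signal, label, channels), the file naming, and the assumption that each CSV has a header row of channel names followed by one row per time sample are all hypothetical. The real pickle schema follows the fine-tuning workflow, not this layout.

```python
import csv
import io
import pickle
from pathlib import Path
from typing import Dict, List, Sequence


def verify_raw_root(root: Path, required: Sequence[str]) -> None:
    """Raise if any expected entry is missing under root.

    The list of required names is a hypothetical placeholder; the real
    KaggleERNDataset defines its own expected layout.
    """
    missing = [name for name in required if not (root / name).exists()]
    if missing:
        raise FileNotFoundError(f"missing under {root}: {missing}")


def window_csv(text: str, window: int, stride: int, label: int) -> List[Dict]:
    """Cut one recording into fixed-length windows.

    Assumes (hypothetically) a header row of channel names, then one row
    per time sample with one numeric column per channel.
    """
    reader = csv.reader(io.StringIO(text))
    channels = next(reader)
    rows = [[float(v) for v in row] for row in reader if row]
    samples = []
    # Slide over the recording; a trailing chunk shorter than `window` is dropped.
    for start in range(0, len(rows) - window + 1, stride):
        chunk = rows[start:start + window]
        # Transpose time x channels into channels x time.
        signal = [list(col) for col in zip(*chunk)]
        samples.append({"signal": signal, "label": label, "channels": channels})
    return samples


def write_windows(samples: List[Dict], out_dir: Path, prefix: str) -> List[Path]:
    """Write one pickle file per window (hypothetical file naming)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, sample in enumerate(samples):
        path = out_dir / f"{prefix}_{i:05d}.pkl"
        with path.open("wb") as f:
            pickle.dump(sample, f)
        paths.append(path)
    return paths


if __name__ == "__main__":
    import tempfile

    # Toy recording: 10 samples, 2 channels.
    text = "Fp1,Fp2\n" + "".join(f"{i},{i * 10}\n" for i in range(10))
    windows = window_csv(text, window=4, stride=2, label=1)
    with tempfile.TemporaryDirectory() as tmp:
        verify_raw_root(Path(tmp), required=[])
        paths = write_windows(windows, Path(tmp) / "windows", "S01_Sess01")
        with paths[0].open("rb") as f:
            print(len(paths), pickle.load(f)["signal"])
```

The sketch slides a fixed-size window over the whole recording. An event-locked epoching scheme would instead cut each fixed-length segment starting at a marker sample, and that is the other plausible reading of "epoch/window" here.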

Files to Review:

  • pyhealth/datasets/kaggleern.py — KaggleERN dataset class + offline preprocessing to window pickle format
  • pyhealth/datasets/configs/kaggleern.yaml — Dataset config entry (tables/attributes schema)
  • pyhealth/datasets/__init__.py — Expose KaggleERNDataset in the datasets namespace
  • examples/kaggleern_finetune_EEGPT.py — Fine-tuning example script (paths are placeholders; includes pretrained checkpoint notes)
  • docs/api/datasets.rst — Register KaggleERN in dataset API docs index
  • docs/api/datasets/pyhealth.datasets.KaggleERNDataset.rst — New API doc page for pyhealth.datasets.KaggleERNDataset
  • tests/core/test_kaggleern.py — Unit tests for dataset verification and optional preprocessing integration

@LogicFan added the "dataset" label (Contribute a new dataset to PyHealth) on Dec 18, 2025
@LogicFan
Collaborator

LGTM, but the code probably needs to be updated to support the newest changes.
