Description
Starting with 2.0a11, PyHealth uses a disk-based, memory-efficient dataset to reduce memory usage on large datasets such as MIMIC4.
This issue tracks potential bugs and improvements needed for the new memory-efficient dataset.
Improvements
- Batched processing for task transformation to speed it up. Further optimization on task transformation #750
- Support multiple workers for task transformation. Multiprocess task transformation #748
- Support configuring n_worker for Dask. Add num_workers to BaseDataset #743
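The batched, multi-worker task transformation described in the list above can be sketched with the standard library alone. This is a hypothetical illustration, not PyHealth's implementation: `batched`, `_apply_to_batch`, and `transform_in_batches` are illustrative names, and each record is assumed to be a per-patient object that a task function maps to a list of samples.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List


def batched(items: Iterable[Any], batch_size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch


def _apply_to_batch(task_fn: Callable[[Any], List[Any]], batch: List[Any]) -> List[Any]:
    # Run the (hypothetical) per-patient task on every record in one batch.
    # A task may emit zero or more samples per patient, so results are flattened.
    out: List[Any] = []
    for record in batch:
        out.extend(task_fn(record))
    return out


def transform_in_batches(
    records: Iterable[Any],
    task_fn: Callable[[Any], List[Any]],
    batch_size: int = 256,
    num_workers: int = 4,
    executor_cls=ProcessPoolExecutor,
) -> List[Any]:
    """Apply task_fn to records in batches across num_workers workers.

    Executor.map returns results in submission order, so the output
    order matches a plain sequential loop over records.
    """
    samples: List[Any] = []
    with executor_cls(max_workers=num_workers) as pool:
        worker = partial(_apply_to_batch, task_fn)
        for batch_samples in pool.map(worker, batched(records, batch_size)):
            samples.extend(batch_samples)
    return samples
```

With the default `ProcessPoolExecutor`, `task_fn` must be a picklable, module-level function. Passing `executor_cls=ThreadPoolExecutor` is handy for quick tests or I/O-bound tasks. Sending whole batches rather than single records amortizes the per-task cost of shipping work to another process.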
Bugs
- The dataset's temporary folder is not properly cleaned up after dataset processing. Clean up tmpdir correctly, cache task transformation result, and better notebook support. #753
- Cached data is not cleaned up if the program crashes midway, which can leave a corrupted cache file. Clean up tmpdir correctly, cache task transformation result, and better notebook support. #753
- Incorrect null handling for patient_id and timestamp. Fix incorrect null handling for patient_id and timestamp #746
- Time-series processor: process() does not seem to set self.n_features or the value returned by self.size() properly #742. Fix/processors fit process #744
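The two cleanup bugs in the list above correspond to two common patterns: remove the scratch directory in a `finally` block, and write cache files atomically (write to a temporary file in the same directory, then rename it into place). The following is a minimal sketch assuming a pickle-based cache. `scratch_dir` and `write_cache_atomically` are hypothetical helpers, not PyHealth functions.

```python
import os
import pickle
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Any, Iterator


@contextmanager
def scratch_dir(prefix: str = "dataset_") -> Iterator[Path]:
    """Create a scratch directory and remove it on normal exit or on an exception."""
    path = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield path
    finally:
        # Runs even if processing raised; it does not run if the process is killed.
        shutil.rmtree(path, ignore_errors=True)


def write_cache_atomically(obj: Any, cache_path: Path) -> None:
    """Pickle obj to cache_path without ever exposing a half-written file there."""
    cache_path = Path(cache_path)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory, so the final rename stays
    # on one filesystem (os.replace can fail across filesystems).
    fd, tmp_name = tempfile.mkstemp(dir=cache_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach disk before the rename
        os.replace(tmp_name, cache_path)  # atomic rename on POSIX
    except BaseException:
        # Drop the partial temp file; any previous cache_path is left untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

The `finally` cleanup covers normal exits and exceptions but not a hard kill. In that case the atomic rename still guarantees that readers never see a truncated cache file, although an orphaned `.tmp` file may remain in the cache directory.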