Add/dka task #749

jhnwu3 · 2025-12-23T03:01:34Z

This pull request introduces a new DKA (Diabetic Ketoacidosis) prediction task for the MIMIC-IV dataset, including its integration into the codebase, documentation, and example usage. It also adds comprehensive synthetic test resources for MIMIC-IV, supporting the new task and facilitating robust testing and benchmarking.

New DKA Prediction Task:

Added the DKAPredictionMIMIC4 class for DKA prediction on MIMIC-IV, and integrated it into the pyhealth.tasks package.
Documented the new task in the API docs and included it in the list of available tasks.

Examples and Benchmarks:

Added a new example script dka_mimic4.py demonstrating how to use StageNet for DKA prediction with MIMIC-IV, including data loading, task application, model training, and evaluation.
Introduced a benchmarking script benchmark_workers_12.py for evaluating MIMIC-IV mortality prediction performance with various metrics and memory usage tracking.

Test Resources:

Added realistic synthetic MIMIC-IV demo data files (patients.csv, admissions.csv, diagnoses_icd.csv, labevents.csv, d_labitems.csv) to support testing and development of the new task.

These changes collectively enable DKA prediction research on MIMIC-IV within the pyhealth framework and provide the necessary infrastructure for both development and robust evaluation.

Copilot

Pull request overview

This pull request introduces a new DKA (Diabetic Ketoacidosis) prediction task for the MIMIC-IV dataset. The PR adds two task classes (DKAPredictionMIMIC4 for general population and T1DDKAPredictionMIMIC4 for Type 1 Diabetes patients), comprehensive synthetic test data, example scripts, and documentation updates.

Key Changes:

Added DKA prediction task implementation with temporal data leakage prevention
Created synthetic MIMIC-IV demo dataset with realistic medical codes and lab values
Provided example scripts demonstrating StageNet model training for DKA prediction

Critical Issues Identified:
The test file tests/core/test_mimic4_dka.py contains multiple critical bugs where it tests DKAPredictionMIMIC4 but uses attributes, methods, and parameters that only exist in T1DDKAPredictionMIMIC4. This suggests the test was written for the wrong class or copied from T1D tests without proper adaptation.

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 20 comments.


File	Description
pyhealth/tasks/dka.py	Implements DKAPredictionMIMIC4 and T1DDKAPredictionMIMIC4 classes with ICD code classification, lab feature extraction, and temporal filtering to prevent data leakage
pyhealth/tasks/__init__.py	Exports the new DKA prediction task classes
pyhealth/datasets/configs/mimic4_ehr.yaml	Adds hadm_id field to labevents table configuration for admission-level filtering
tests/core/test_mimic4_dka.py	Test suite for DKA prediction (contains critical bugs - tests wrong class attributes/methods)
tests/core/test_mimic4_los.py	Test suite for length of stay prediction task
test-resources/core/mimic4demo/hosp/*.csv	Synthetic MIMIC-IV demo data files (patients, admissions, diagnoses, procedures, prescriptions, lab events)
examples/clinical_tasks/dka_mimic4.py	Example script for general population DKA prediction using StageNet
examples/clinical_tasks/t1dka_mimic4.py	Example script for T1D-specific DKA prediction using StageNet
examples/benchmark_perf/benchmark_workers_12.py	Benchmarking script for mortality prediction (has num_workers documentation inconsistencies)
docs/api/tasks/pyhealth.tasks.dka.rst	API documentation for DKA task
docs/api/tasks.rst	Adds DKA task to the task list
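
The temporal filtering mentioned for pyhealth/tasks/dka.py can be illustrated with a minimal sketch. This is not the code from the PR: the `LabEvent` type, the `filter_before` helper, and the choice of cutoff are assumptions made for illustration. The idea is that only observations recorded strictly before the prediction cutoff are kept as features, so nothing recorded at or after the outcome leaks into the inputs.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class LabEvent:
    """Hypothetical stand-in for one lab measurement."""
    itemid: str
    value: float
    timestamp: datetime


def filter_before(events: List[LabEvent], cutoff: datetime) -> List[LabEvent]:
    """Keep only events recorded strictly before the cutoff."""
    return [e for e in events if e.timestamp < cutoff]


events = [
    LabEvent("glucose", 180.0, datetime(2025, 1, 1, 8, 0)),
    LabEvent("glucose", 420.0, datetime(2025, 1, 3, 9, 30)),
]
# Assumed cutoff, e.g. the time of the outcome being predicted.
cutoff = datetime(2025, 1, 2, 0, 0)
kept = filter_before(events, cutoff)
print([e.value for e in kept])
```

The strict `<` comparison is a deliberate choice in this sketch: an event stamped exactly at the cutoff is treated as unavailable at prediction time.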


Copilot · 2025-12-24T19:17:11Z

tests/core/test_mimic4_dka.py

+                # Verify diagnoses format (tuple of times and sequences)
+                diagnoses = sample["diagnoses"]
+                self.assertIsInstance(diagnoses, tuple, "diagnoses should be a tuple")
+                self.assertEqual(len(diagnoses), 2, "diagnoses tuple should have 2 elements")
+                print(f"  - diagnoses: {len(diagnoses[1])} admission(s)")


The test expects 'diagnoses' in the sample but the DKAPredictionMIMIC4 class returns 'icd_codes' (as shown in line 268 of pyhealth/tasks/dka.py). The test will fail when accessing sample['diagnoses'] since that key doesn't exist.
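
A key mismatch like this surfaces as a bare KeyError inside the test. Below is a minimal sketch of a check that fails with a clearer message. It assumes the sample is a plain dict and that `icd_codes` uses the same two-element (times, sequences) layout the test expected for `diagnoses`; the helper name `get_code_feature` and the example sample are hypothetical.

```python
def get_code_feature(sample: dict, key: str = "icd_codes") -> tuple:
    """Return the (times, sequences) pair stored under `key`, with clear errors."""
    if key not in sample:
        raise KeyError(f"sample has no {key!r} key; available keys: {sorted(sample)}")
    feature = sample[key]
    if not isinstance(feature, tuple) or len(feature) != 2:
        raise ValueError(f"{key!r} should be a 2-tuple of (times, sequences)")
    return feature


# Hypothetical sample shaped like the one the test inspects.
sample = {"icd_codes": ([0.0], [["E1010"]]), "label": 1}
times, sequences = get_code_feature(sample)
print(f"icd_codes: {len(sequences)} admission(s)")
```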

Copilot · 2025-12-24T19:17:11Z

tests/core/test_mimic4_dka.py

+
+        # Test custom initialization
+        print("\nTesting custom initialization...")
+        custom_task = DKAPredictionMIMIC4(dka_window_days=30, padding=5)


The test calls DKAPredictionMIMIC4 with the dka_window_days and padding parameters, but DKAPredictionMIMIC4.__init__ only accepts the 'padding' parameter. The 'dka_window_days' parameter is only available in T1DDKAPredictionMIMIC4. This will cause a TypeError when the test runs.
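
The failure mode can be reproduced without pyhealth. In this sketch `GeneralTask` is a hypothetical stand-in whose constructor accepts only `padding`, as the comment describes for DKAPredictionMIMIC4. Passing an extra keyword raises TypeError before any test assertion runs.

```python
class GeneralTask:
    """Hypothetical stand-in: the constructor accepts only `padding`."""

    def __init__(self, padding: int = 0):
        self.padding = padding


try:
    GeneralTask(dka_window_days=30, padding=5)
except TypeError as exc:
    error = str(exc)
    print(f"TypeError: {error}")

# The supported call passes only the keyword the constructor declares.
task = GeneralTask(padding=5)
print(task.padding)
```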

Copilot · 2025-12-24T19:17:11Z

tests/core/test_mimic4_dka.py

+        self.assertEqual(len(DKAPredictionMIMIC4.LAB_CATEGORY_ORDER), 6)
+        expected_categories = ["glucose", "bicarbonate", "anion_gap", "potassium", "sodium", "chloride"]


The test expects LAB_CATEGORY_ORDER to have 6 categories with lowercase names like 'glucose', 'bicarbonate', etc., but the actual DKAPredictionMIMIC4.LAB_CATEGORY_ORDER has 10 categories with capitalized names like 'Glucose', 'Bicarbonate', 'Sodium', 'Potassium', 'Chloride', 'Calcium', 'Magnesium', 'Anion Gap', 'Osmolality', 'Phosphate'. The test will fail on these assertions.

Suggested change

-        self.assertEqual(len(DKAPredictionMIMIC4.LAB_CATEGORY_ORDER), 6)
-        expected_categories = ["glucose", "bicarbonate", "anion_gap", "potassium", "sodium", "chloride"]
+        self.assertEqual(len(DKAPredictionMIMIC4.LAB_CATEGORY_ORDER), 10)
+        expected_categories = [
+            "Glucose",
+            "Bicarbonate",
+            "Sodium",
+            "Potassium",
+            "Chloride",
+            "Calcium",
+            "Magnesium",
+            "Anion Gap",
+            "Osmolality",
+            "Phosphate",
+        ]

Copilot · 2025-12-24T19:17:12Z

tests/core/test_mimic4_dka.py

+        expected_categories = ["glucose", "bicarbonate", "anion_gap", "potassium", "sodium", "chloride"]
+        self.assertEqual(DKAPredictionMIMIC4.LAB_CATEGORY_ORDER, expected_categories)
+        print(f"✓ Lab categories: {DKAPredictionMIMIC4.LAB_CATEGORY_ORDER}")
+        print(f"✓ Total lab item IDs: {len(DKAPredictionMIMIC4.ALL_LAB_ITEMIDS)}")


The test tries to access DKAPredictionMIMIC4.ALL_LAB_ITEMIDS but this attribute doesn't exist. The actual attribute name is 'LABITEMS' (line 81 in pyhealth/tasks/dka.py). This will cause an AttributeError.

Suggested change

-        print(f"✓ Total lab item IDs: {len(DKAPredictionMIMIC4.ALL_LAB_ITEMIDS)}")
+        print(f"✓ Total lab item IDs: {len(DKAPredictionMIMIC4.LABITEMS)}")

Copilot · 2025-12-24T19:17:12Z

tests/core/test_mimic4_dka.py

+    def test_dka_prediction_helper_methods(self):
+        """Test helper methods of DKAPredictionMIMIC4."""
+        print(f"\n{'='*60}")
+        print("TEST: test_dka_prediction_helper_methods()")
+        print(f"{'='*60}")
+
+        task = DKAPredictionMIMIC4()
+
+        # Test _normalize_icd
+        print("Testing _normalize_icd()...")
+        self.assertEqual(task._normalize_icd("E10.10"), "E1010")
+        self.assertEqual(task._normalize_icd("e10.10"), "E1010")
+        self.assertEqual(task._normalize_icd(None), "")
+        self.assertEqual(task._normalize_icd("  25001  "), "25001")
+        print("✓ _normalize_icd() works correctly")
+
+        # Test _is_t1dm_code
+        print("\nTesting _is_t1dm_code()...")
+        self.assertTrue(task._is_t1dm_code("E10.10", 10))
+        self.assertTrue(task._is_t1dm_code("E1010", "10"))
+        self.assertTrue(task._is_t1dm_code("25001", 9))
+        self.assertTrue(task._is_t1dm_code("25001", "9"))
+        self.assertFalse(task._is_t1dm_code("E11.0", 10))  # Type 2
+        self.assertFalse(task._is_t1dm_code("25000", 9))  # Type 2
+        self.assertFalse(task._is_t1dm_code(None, 10))
+        print("✓ _is_t1dm_code() works correctly")
+
+        # Test _is_dka_code
+        print("\nTesting _is_dka_code()...")
+        self.assertTrue(task._is_dka_code("E10.10", 10))
+        self.assertTrue(task._is_dka_code("E1011", "10"))
+        self.assertTrue(task._is_dka_code("25011", 9))
+        self.assertTrue(task._is_dka_code("25013", "9"))
+        self.assertFalse(task._is_dka_code("E10.65", 10))  # Not DKA
+        self.assertFalse(task._is_dka_code(None, 10))
+        print("✓ _is_dka_code() works correctly")


The test tries to access attributes and methods (DKA_ICD10_PREFIX, T1DM_ICD10_PREFIX, T1DM_ICD9_CODES) that don't exist in DKAPredictionMIMIC4. The DKAPredictionMIMIC4 class doesn't have T1DM-related attributes, only DKA_ICD9_CODES and DKA_ICD10_PREFIXES (note: plural). These tests appear to be written for T1DDKAPredictionMIMIC4 instead.
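
For reference, the behaviour these assertions describe can be sketched as stand-alone functions. This is not the code in pyhealth/tasks/dka.py: the function names are illustrative, and the constants `DKA_ICD10_PREFIX` and `DKA_ICD9_CODES` below carry assumed values chosen only so that the `_normalize_icd` and `_is_dka_code` assertions quoted above hold.

```python
from typing import Optional, Union

# Assumed values for illustration; the real constants live in pyhealth/tasks/dka.py.
DKA_ICD10_PREFIX = "E101"
DKA_ICD9_CODES = {"25011", "25013"}


def normalize_icd(code: Optional[str]) -> str:
    """Upper-case the code and strip whitespace and dots; None becomes ''."""
    if code is None:
        return ""
    return code.strip().upper().replace(".", "")


def is_dka_code(code: Optional[str], version: Union[int, str]) -> bool:
    """Return True if the code denotes DKA under the given ICD version."""
    normalized = normalize_icd(code)
    if not normalized:
        return False
    # Accept the version as either an int or a string, as the test does.
    if str(version) == "10":
        return normalized.startswith(DKA_ICD10_PREFIX)
    if str(version) == "9":
        return normalized in DKA_ICD9_CODES
    return False


print(normalize_icd("e10.10"))
print(is_dka_code("E10.65", 10))
```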

Copilot · 2025-12-24T19:17:15Z

pyhealth/tasks/dka.py

+                dischtime_str = getattr(admission, "dischtime", None)
+                if dischtime_str:
+                    admission_dischtime = datetime.strptime(
+                        dischtime_str, "%Y-%m-%d %H:%M:%S"
+                    )
+                else:
+                    admission_dischtime = None


This assignment to 'admission_dischtime' is unnecessary as it is redefined before this value is used.

Suggested change

-                dischtime_str = getattr(admission, "dischtime", None)
-                if dischtime_str:
-                    admission_dischtime = datetime.strptime(
-                        dischtime_str, "%Y-%m-%d %H:%M:%S"
-                    )
-                else:
-                    admission_dischtime = None

Copilot · 2025-12-24T19:17:15Z

pyhealth/tasks/dka.py

+            # Parse admission times
+            try:
+                admission_time = admission.timestamp
+                dischtime_str = getattr(admission, "dischtime", None)
+                if dischtime_str:
+                    admission_dischtime = datetime.strptime(
+                        dischtime_str, "%Y-%m-%d %H:%M:%S"
+                    )
+                else:
+                    admission_dischtime = None


This assignment to 'admission_dischtime' is unnecessary as it is redefined before this value is used.

Suggested change

-            # Parse admission times
-            try:
-                admission_time = admission.timestamp
-                dischtime_str = getattr(admission, "dischtime", None)
-                if dischtime_str:
-                    admission_dischtime = datetime.strptime(
-                        dischtime_str, "%Y-%m-%d %H:%M:%S"
-                    )
-                else:
-                    admission_dischtime = None
+            # Parse admission time; skip admission if timestamp is invalid
+            try:
+                admission_time = admission.timestamp
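
If the discharge time is still needed elsewhere, the parsing could be pulled into one small helper so the value is assigned exactly once. This is a minimal sketch, assuming timestamps arrive as strings in the format used in the snippet above; `parse_mimic_time` is a hypothetical name, not a pyhealth function.

```python
from datetime import datetime
from typing import Optional

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_mimic_time(value: Optional[str]) -> Optional[datetime]:
    """Parse a timestamp string; return None if it is missing or malformed."""
    if not value:
        return None
    try:
        return datetime.strptime(value, TIME_FORMAT)
    except ValueError:
        return None


print(parse_mimic_time("2150-03-01 14:30:00"))
print(parse_mimic_time(None))
```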

Copilot · 2025-12-24T19:17:15Z

examples/benchmark_perf/benchmark_workers_12.py

+
+    # Apply memory limit if enabled
+    if enable_memory_limit:
+        set_memory_limit(max_memory_gb)


This statement is unreachable.

Copilot · 2025-12-24T19:17:15Z

examples/benchmark_perf/benchmark_workers_12.py

+    print("=" * 80)
+    print(f"BENCHMARK: num_workers=4, dev={dev}")
+    if enable_memory_limit:
+        print(f"Memory Limit: {max_memory_gb} GB (ENFORCED)")


This statement is unreachable.

Copilot · 2025-12-24T19:17:15Z

examples/benchmark_perf/benchmark_workers_12.py

+    # Define cache directories based on dev mode
+    cache_root = "/shared/rsaas/pyhealth/"
+    if dev:
+        cache_root += "_dev"


This statement is unreachable.

jhnwu3 added 4 commits December 19, 2025 14:46

init commit, will need to debug and clean later

81ae4b0

make example simpelr

d1a572c

new updates to general population dka prediction

1a99bf7

more commits for improving the robustness of task processing

9190f95

jhnwu3 requested review from LogicFan and Copilot December 24, 2025 19:11

Copilot started reviewing on behalf of jhnwu3 December 24, 2025 19:11

Copilot AI reviewed Dec 24, 2025


