Math2130/MIMIC-III-Mortality-Analysis

MIMIC3 ICU Mortality Analysis

Project Overview

This data science project analyzes the MIMIC-III (Medical Information Mart for Intensive Care III) clinical database to investigate factors affecting mortality rates of ICU patients at Beth Israel Deaconess Medical Center from 2001 to 2012. The analysis explores temporal patterns, medication effects, and demographic factors that may influence patient outcomes.

Research Questions

The primary research question addressed in this project is:

"What factors affected the mortality rate of ICU patients hospitalized at Beth Israel Deaconess Medical Center from 2001 to 2012?"

Sub-questions investigated:

  1. Temporal Analysis: Are there seasonal or monthly patterns in ICU mortality rates?
  2. Medication Analysis: Do prescribed drugs and their duration correlate with mortality rates?
  3. Demographics Analysis: Can demographic factors (ethnicity, language, religion) predict patient mortality?

Dataset

The project uses the MIMIC-III Clinical Database v1.4, which contains de-identified health data from approximately 60,000 ICU stays. Key datasets analyzed include:

  • PATIENTS.csv - Patient demographics and basic information
  • ADMISSIONS.csv - Hospital admission details
  • ICUSTAYS.csv - ICU stay information
  • PRESCRIPTIONS.csv - Medication prescriptions
  • NOTEEVENTS.csv - Clinical notes
  • D_ICD_DIAGNOSES.csv - ICD diagnosis codes

Installation & Setup

Prerequisites

  • Python 3.7+
  • Access to MIMIC-III dataset

Required Python Packages

pip install pandas numpy matplotlib seaborn scikit-learn statsmodels scipy

Data Setup

  1. Obtain credentialed access to the MIMIC-III dataset through PhysioNet
  2. Place the dataset files in the mimic-iii-clinical-database-1.4/ directory
  3. Run CleanedData.ipynb to preprocess the raw data

Key Findings

Temporal Analysis

  • Monthly variation exists: April showed the highest mortality rate, June the lowest
  • Statistical significance: A two-sample proportion z-test revealed significant differences in mortality rate between months (z > 3)
  • Limited predictability: Multi-class classification models achieved only ~7% accuracy in predicting death month from diagnosis variables
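
The monthly comparison above relies on a two-sample proportion z-test. The following is a minimal sketch of that test using only the standard library; the function name and the counts are hypothetical and are not taken from the notebooks or from MIMIC-III. The `statsmodels` function `proportions_ztest` offers a similar test and may be what the notebooks actually use.

```python
import math

def two_proportion_z_test(deaths_a, n_a, deaths_b, n_b):
    """Return (z, two-sided p-value) for H0: the two mortality rates are equal."""
    p_a = deaths_a / n_a
    p_b = deaths_b / n_b
    # Pooled proportion under the null hypothesis of equal rates.
    p_pool = (deaths_a + deaths_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up counts for illustration only (not MIMIC-III results).
z, p = two_proportion_z_test(deaths_a=120, n_a=1000, deaths_b=80, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With twelve months there are 66 possible pairwise comparisons, so a multiple-comparison correction may be worth considering when interpreting individual z values.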

Medication Analysis

  • Weak correlations: Associations between drug categories (ATC levels) and mortality were statistically significant but weak, as measured by Cramér's V (V = 0.0516-0.1448)
  • Drug duration: No strong correlation was found between prescription duration and mortality
  • Feature engineering: External ATC (Anatomical Therapeutic Chemical) dataset was integrated for drug classification
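
Cramér's V rescales the chi-square statistic of a contingency table to the range 0 to 1. Below is a rough sketch of the uncorrected formula, V = sqrt(chi2 / (n * (min(rows, cols) - 1))), in plain Python. The function name and the example counts are hypothetical, and the sketch assumes every row and column total is nonzero. In practice the chi-square statistic could also come from `scipy.stats.chi2_contingency`.

```python
import math

def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table (list of rows of counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

# Made-up counts: rows are drug categories, columns are (survived, died).
example = [[90, 10], [80, 20], [85, 15]]
print(f"V = {cramers_v(example):.3f}")
```

A value near 0 means the drug category carries little information about the outcome, even when a large sample makes the chi-square test significant.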

Demographics Analysis

  • Strong predictive power: Support Vector Machine models achieved ~90% accuracy in predicting mortality from demographic variables
  • Key factors: Religion, ethnicity, and language were all individually predictive of mortality
  • Statistical significance: Chi-square tests revealed significant associations between all demographic categories and mortality outcomes

Methodology

Data Preprocessing

  • Quality control: Removed invalid entries, handled missing values, normalized categorical variables
  • Date consistency: Converted temporal variables to pandas datetime format
  • Feature engineering: Created derived variables including monthly mortality rates, prescription durations, and one-hot encoded categorical variables
  • Data validation: Dropped inconsistent demographic entries and invalid date ranges
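
The preprocessing steps above might look roughly like the following in pandas. This is a sketch under assumptions, not the code in CleanedData.ipynb: it uses a tiny made-up frame in place of ADMISSIONS.csv, and the choice of which columns to validate and encode is illustrative.

```python
import pandas as pd

# Made-up rows standing in for ADMISSIONS.csv (values are not real data).
admissions = pd.DataFrame({
    "ADMITTIME": ["2130-01-05 10:00:00", "2140-06-01 08:30:00",
                  "2150-03-10 12:00:00", "2160-09-21 07:15:00"],
    "DISCHTIME": ["2130-01-09 14:00:00", "2140-05-30 09:00:00",
                  "2150-03-15 16:00:00", "2160-09-25 11:45:00"],
    "ETHNICITY": ["WHITE", "ASIAN", None, "BLACK/AFRICAN AMERICAN"],
    "HOSPITAL_EXPIRE_FLAG": [0, 1, 0, 1],
})

# Date consistency: parse timestamps, turning unparseable values into NaT.
for col in ["ADMITTIME", "DISCHTIME"]:
    admissions[col] = pd.to_datetime(admissions[col], errors="coerce")

# Data validation: drop rows whose discharge precedes admission.
valid = admissions[admissions["DISCHTIME"] >= admissions["ADMITTIME"]]

# Quality control: drop rows with a missing demographic value.
cleaned = valid.dropna(subset=["ETHNICITY"])

# Feature engineering: one-hot encode the categorical column.
encoded = pd.get_dummies(cleaned, columns=["ETHNICITY"])
print(encoded.columns.tolist())
```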

Statistical Analysis

  • Temporal patterns: Created monthly mortality rate calculations and seasonal analysis
  • Association testing: Used Cramér's V for categorical associations and chi-square tests for independence
  • Hypothesis testing: Two-sample proportion tests for monthly mortality differences
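
As an illustration of the monthly mortality rate calculation, the sketch below groups admissions by calendar month and averages a death flag. It assumes the rate is computed per admission month from `HOSPITAL_EXPIRE_FLAG`; the notebooks may define the rate differently (for example by month of death), and the rows shown are made up.

```python
import pandas as pd

# Made-up admissions; dates are in the shifted MIMIC-III style.
admissions = pd.DataFrame({
    "ADMITTIME": pd.to_datetime([
        "2130-04-02", "2130-04-20", "2131-04-11", "2130-06-03", "2131-06-15",
    ]),
    "HOSPITAL_EXPIRE_FLAG": [1, 0, 1, 0, 0],
})

# Group by calendar month (1-12), pooling all years together.
monthly = (
    admissions
    .groupby(admissions["ADMITTIME"].dt.month)["HOSPITAL_EXPIRE_FLAG"]
    .agg(deaths="sum", admissions="count", mortality_rate="mean")
)
monthly.index.name = "month"
print(monthly)
```

The `deaths` and `admissions` columns are kept alongside the rate because a two-sample proportion test needs the raw counts, not just the rates.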

Machine Learning Models

  • Multi-class classification: Logistic regression for predicting month of death
  • Binary classification: Support Vector Machine (SVM) for mortality prediction
  • Cross-validation: 100-iteration cross-validation for model reliability
  • Sampling strategy: Used 20% of the data for training and 5% for testing; the remaining 75% was discarded for computational efficiency
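
One way to read the sampling strategy and the 100 iterations together is as repeated random holdout: each iteration draws a fresh 20% training sample and a disjoint 5% test sample, and ignores the rest. The sketch below shows only the index splitting, using the standard library. The function name, the per-iteration seeding, and this interpretation of "cross-validation" are assumptions rather than details confirmed by the notebooks.

```python
import random

def sample_split(n_rows, train_frac=0.20, test_frac=0.05, seed=None):
    """Return disjoint (train_idx, test_idx) lists; remaining rows are unused."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_train = int(n_rows * train_frac)
    n_test = int(n_rows * test_frac)
    train_idx = indices[:n_train]
    test_idx = indices[n_train:n_train + n_test]
    return train_idx, test_idx

# 100 independent splits of a hypothetical 1,000-row table.
splits = [sample_split(1000, seed=i) for i in range(100)]
train_idx, test_idx = splits[0]
print(len(train_idx), len(test_idx))
```

A classifier would then be fit on the rows in `train_idx` and scored on `test_idx` for each split, with the scores averaged across iterations.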

Results Summary

Demographics (ethnicity, language, religion) are strong predictors of ICU mortality at this hospital (~90% accuracy), while temporal factors show statistically significant differences but limited practical predictability. Medication factors show only weak associations with outcomes.

Model Performance

  • Demographics SVM: 90% average accuracy (88-93% range)
  • Temporal multi-class: 7% average accuracy (5-12% range)
  • Medication correlation: Weak but significant associations (Cramér's V < 0.15)

Limitations & Considerations

  • Single institution: Results specific to Beth Israel Deaconess Medical Center
  • Date shifting: MIMIC-III dates are shifted to protect privacy, though temporal relationships are preserved
  • Sample bias: Demographics findings may not generalize to other hospitals or populations
  • Model constraints: Limited independent variables available for temporal analysis

Future Work

  • Investigate more detailed medication interactions and dosage effects
  • Expand temporal analysis with additional clinical variables
  • Validate demographic findings across multiple healthcare institutions
  • Explore potential biases in healthcare delivery indicated by demographic predictors

Usage

  1. Data Cleaning: Start with CleanedData.ipynb to process raw MIMIC-III files
  2. Temporal Analysis: Run VisualizationTime.ipynb for mortality pattern analysis
  3. Medication Analysis: Execute VisualizationMedication.ipynb for drug correlation studies
  4. Demographics Analysis: Use VisualizationDemographics.ipynb for demographic mortality prediction

References

  • Johnson, A., Pollard, T., & Mark, R. (2019). MIMIC-III Clinical Database Demo (version 1.4). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/C2HM2Q
  • Johnson, A. E. W., Pollard, T. J., Shen, L., Lehman, L. H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L. A., & Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data, 3, 160035.
  • Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23), e215–e220. RRID:SCR_007345.
  • MIMIC-III Documentation: https://mimic.physionet.org/
