nnMIL: A generalizable multiple instance learning framework for computational pathology
- No-new MIL training strategies yield benefits in the foundation model era.
- Unified pipeline for classification, regression, and survival MIL tasks
- Plan-driven training that inspects slide features, builds patient-level splits, and recommends hyperparameters
- Consistent inference utilities for official or k-fold evaluation settings
Looking for the full step-by-step walkthrough? Jump to TUTORIAL.md.
```
nnMIL/
├── data/                    # Dataset abstractions
├── network_architecture/    # Model factory + implementations
├── preprocessing/           # Experiment planner & helpers
├── run/                     # Python entry points (plan/train/predict)
├── scripts/                 # Shell wrappers for complete workflows
├── training/                # Trainers, losses, samplers, callbacks
└── utilities/               # Shared utils (logging, configs, etc.)
```
Two external directories are expected beside the repo:
- `nnMIL_raw_data/TaskXXX_*` holds each task's `dataset.json`, `dataset.csv`, the generated `dataset_plan.json`, and HDF5 feature files.
- `nnMIL_results/TaskXXX_*` receives logs, checkpoints, predictions, and metrics (in `official_split/` or `fold_*` subfolders).
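
An illustrative layout, using the example task from the quick start below (file placement is a sketch; the actual feature-file location is whatever `dataset.json` records as the feature path):

```
nnMIL_raw_data/Task001_CRC_DSS/
├── dataset.json        # task metadata: labels, metrics, feature path
├── dataset.csv         # slide/patient metadata, labels or survival fields
├── dataset_plan.json   # generated by the experiment planner
└── <slide_id>.h5       # slide-level feature files, one per slide
nnMIL_results/Task001_CRC_DSS/
├── official_split/     # logs, checkpoints, predictions, metrics
└── fold_*/             # per-fold outputs for k-fold evaluation
```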
Reference bundles:
- Datasets & plan files (
nnMIL_raw_datasnapshot) - Experiment outputs (
nnMIL_resultssnapshot) - Extracted patch-level embeddings (TCGA/EBRAINS using Virchow2 and UNI):
To set up the environment:

- Install system packages for HDF5/BLAS as required by your platform.
- Create and activate the project environment (example using conda):

  ```bash
  conda env create -f environment.yml
  conda activate pyrad
  ```

- Install PyTorch 2.9 nightly and the matching torchvision build (CUDA 12.6 example):

  ```bash
  pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/nightly/cu126
  pip install --no-cache-dir torchvision --index-url https://download.pytorch.org/whl/nightly/cu126
  ```

Then run the nnMIL workflow:

1. Prepare data under `nnMIL_raw_data/<Task_ID>/` with:
   - `dataset.json` (task metadata: labels, metrics, feature path)
   - `dataset.csv` (slide/patient metadata, labels or survival fields)
   - slide-level feature files (`<slide_id>.h5`)

   A short sanity-check sketch for these files follows the quick-start steps.
2. Plan the experiment:

   ```bash
   python nnMIL/run/nnMIL_plan_experiment.py -d nnMIL_raw_data/Task001_CRC_DSS
   ```

   This produces `dataset_plan.json` with recommended hyper-parameters and patient splits.
3. Train:

   ```bash
   python nnMIL/run/nnMIL_run_training.py nnMIL_raw_data/Task001_CRC_DSS simple_mil all
   ```

   or use `bash nnMIL/scripts/run_classification.sh nnMIL_raw_data/Task001_CRC_DSS simple_mil 0 auto`.
4. Predict with `nnMIL/run/nnMIL_predict.py`, pointing to each checkpoint directory.
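
Before planning and training, you can sanity-check the inputs prepared in step 1. The sketch below is illustrative only: the `dataset.csv` columns and the HDF5 key holding the patch embeddings depend on your feature extractor, and none of the names here are fixed by nnMIL.

```python
# Minimal sanity check of a prepared task folder (illustrative; key and
# column names are assumptions, not part of the nnMIL API).
import json
from pathlib import Path

import h5py
import pandas as pd

task_dir = Path("nnMIL_raw_data/Task001_CRC_DSS")

# Task metadata: labels, metrics, feature path.
meta = json.loads((task_dir / "dataset.json").read_text())
print("dataset.json keys:", sorted(meta))

# Slide/patient metadata with labels or survival fields.
table = pd.read_csv(task_dir / "dataset.csv")
print("dataset.csv columns:", list(table.columns))
print("rows (slides):", len(table))

# Peek at one slide-level feature file; the dataset key that stores the
# patch embeddings (e.g. "features") varies by extractor.
h5_files = sorted(task_dir.glob("*.h5"))
if h5_files:
    with h5py.File(h5_files[0], "r") as f:
        for key, obj in f.items():
            shape = obj.shape if isinstance(obj, h5py.Dataset) else "(group)"
            print(h5_files[0].name, key, shape)
```

If the feature files live outside the task folder, point the glob at the feature path recorded in `dataset.json` instead.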
For detailed guidance (including how to adapt `dataset.json`/`dataset.csv` to your own data), consult TUTORIAL.md.
The shell wrappers under `nnMIL/scripts/` cover complete workflows:

- `scripts/run_classification.sh <DATASET_DIR> <MODEL> <CUDA_DEVICE> [split]`: automates planning → training → prediction. `split` accepts `auto` (default), `all`, `None`, or a fold index.
- `scripts/run_survival.sh <DATASET_DIR> <MODEL> <CUDA_DEVICE>`: equivalent wrapper tailored for survival experiments.
- `scripts/run_plco_crc.sh`: end-to-end recipe for the PLCO CRC cohort.

All scripts assume the project root is the parent of `nnMIL/` and write outputs into `nnMIL_results/`.
We gratefully acknowledge prior work that inspired nnMIL:
- MIL_BASELINE for its comprehensive collection of MIL models.
- nnUNet for the self-configuring design principles that guided our training planner and workflow automation.
This project focuses mainly on simple yet generalizable MIL training. For feature extraction, we highly recommend using the excellent projects CLAM or STAMP.
If you use this codebase in your research, please cite the following works:
```bibtex
@misc{luo2025nnmil,
      title={nnMIL: A generalizable multiple instance learning framework for computational pathology},
      author={Xiangde Luo and Jinxi Xiang and Yuanfeng Ji and Ruijiang Li},
      year={2025},
      eprint={2511.14907},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.14907}
}
```
nnMIL is actively evolving—expect iterative updates to the planner, trainers, and evaluation scripts. Feedback and contributions are welcome. Reach out at luoxd96 at stanford dot edu.
👉 A comprehensive tutorial (classification + survival, custom dataset adaptation, shell scripts) is maintained in TUTORIAL.md and updated alongside code changes.