nnMIL: No-New Multiple Instance Learning

nnMIL: A generalizable multiple instance learning framework for computational pathology

No-new MIL training strategies yield benefits in the foundation model era.
Unified pipeline for classification, regression, and survival MIL tasks
Plan-driven training that inspects slide features, builds patient-level splits, and recommends hyperparameters
Consistent inference utilities for official or k-fold evaluation settings

Looking for the full step-by-step walkthrough? Jump to TUTORIAL.md.

Repository Layout

nnMIL/
├── data/                    # Dataset abstractions
├── network_architecture/    # Model factory + implementations
├── preprocessing/           # Experiment planner & helpers
├── run/                     # Python entry points (plan/train/predict)
├── scripts/                 # Shell wrappers for complete workflows
├── training/                # Trainers, losses, samplers, callbacks
└── utilities/               # Shared utils (logging, configs, etc.)

Two external directories are expected beside the repo:

nnMIL_raw_data/TaskXXX_* holds each task’s dataset.json, dataset.csv, generated dataset_plan.json, and HDF5 feature files.
nnMIL_results/TaskXXX_* receives logs, checkpoints, predictions, and metrics (official_split/ or fold_* subfolders).
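
As a quick sanity check of the layout described above, the following minimal sketch verifies that a task folder contains the expected files before planning or training. The helper names `check_task_layout` and `results_dir` are hypothetical and not part of nnMIL; only the directory and file names come from this README. HDF5 feature files are not checked because their location is configured in dataset.json.

```python
from pathlib import Path

# File names taken from this README; the helpers themselves are hypothetical.
EXPECTED_FILES = ("dataset.json", "dataset.csv")


def check_task_layout(project_root, task_name):
    """Return a list of human-readable problems found for one task folder."""
    raw_dir = Path(project_root) / "nnMIL_raw_data" / task_name
    if not raw_dir.is_dir():
        return [f"missing task directory: {raw_dir}"]
    problems = []
    for name in EXPECTED_FILES:
        if not (raw_dir / name).is_file():
            problems.append(f"missing {name}")
    # dataset_plan.json is generated by the planner, so its absence is only a hint.
    if not (raw_dir / "dataset_plan.json").is_file():
        problems.append("dataset_plan.json not generated yet (run the planner)")
    return problems


def results_dir(project_root, task_name, split="official_split"):
    """Build the output folder path, e.g. official_split/ or fold_0/."""
    return Path(project_root) / "nnMIL_results" / task_name / split


if __name__ == "__main__":
    for problem in check_task_layout(".", "Task001_CRC_DSS"):
        print(problem)
    print(results_dir(".", "Task001_CRC_DSS", "fold_0"))
```

An empty list from `check_task_layout` means the task folder has its metadata and a generated plan.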

Reference bundles:

Datasets & plan files (nnMIL_raw_data snapshot)
Experiment outputs (nnMIL_results snapshot)
Extracted patch-level embeddings (TCGA/EBRAINS using Virchow2 and UNI)

Environment & Dependencies

Install system packages for HDF5/BLAS as required by your platform.
Create and activate the project environment (example using conda):

   conda env create -f environment.yml
   conda activate pyrad

Install PyTorch 2.9 nightly and the matching torchvision build (CUDA 12.6 example):

   pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/nightly/cu126
   pip install --no-cache-dir torchvision --index-url https://download.pytorch.org/whl/nightly/cu126

Quick Start (High Level)

1. Prepare data under nnMIL_raw_data/<Task_ID>/ with:
   - dataset.json (task metadata: labels, metrics, feature path)
   - dataset.csv (slide/patient metadata, labels or survival fields)
   - slide-level feature files (<slide_id>.h5)
2. Plan the experiment:

   python nnMIL/run/nnMIL_plan_experiment.py -d nnMIL_raw_data/Task001_CRC_DSS

   This produces dataset_plan.json with recommended hyperparameters and patient splits.
3. Train:

   python nnMIL/run/nnMIL_run_training.py nnMIL_raw_data/Task001_CRC_DSS simple_mil all

   or use bash nnMIL/scripts/run_classification.sh nnMIL_raw_data/Task001_CRC_DSS simple_mil 0 auto.
4. Predict with nnMIL/run/nnMIL_predict.py, pointing to each checkpoint directory.

For detailed guidance (including how to adapt dataset.json/dataset.csv to your own data), consult TUTORIAL.md.
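
To illustrate what a patient-level split means in the planning step, here is a minimal sketch that assigns whole patients to folds so that slides from one patient never land in different folds. It is not the planner's actual algorithm (which may, for example, stratify by label); the function name `patient_level_folds`, the slide-to-patient mapping, and the fold count are assumptions made for illustration.

```python
import random
from collections import defaultdict


def patient_level_folds(slide_to_patient, n_folds=5, seed=0):
    """Assign whole patients to folds so a patient's slides never straddle folds."""
    patients = sorted(set(slide_to_patient.values()))
    rng = random.Random(seed)
    rng.shuffle(patients)
    # Round-robin assignment keeps fold sizes within one patient of each other.
    patient_fold = {p: i % n_folds for i, p in enumerate(patients)}
    folds = defaultdict(list)
    for slide, patient in sorted(slide_to_patient.items()):
        folds[patient_fold[patient]].append(slide)
    return {k: folds[k] for k in range(n_folds)}


# Hypothetical toy cohort: six slides from four patients.
slides = {"s1": "p1", "s2": "p1", "s3": "p2", "s4": "p3", "s5": "p3", "s6": "p4"}
folds = patient_level_folds(slides, n_folds=2)
for k, members in folds.items():
    print(k, members)
```

Splitting by patient rather than by slide avoids leakage: slides from the same patient are highly correlated, so placing them in both training and validation folds would inflate validation metrics.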

Workflow Scripts

scripts/run_classification.sh <DATASET_DIR> <MODEL> <CUDA_DEVICE> [split]
Automates planning → training → prediction. split accepts auto (default), all, None, or a fold index.
scripts/run_survival.sh <DATASET_DIR> <MODEL> <CUDA_DEVICE>
Equivalent wrapper tailored for survival experiments.
scripts/run_plco_crc.sh
End-to-end recipe for the PLCO CRC cohort.

All scripts assume the project root is the parent of nnMIL/ and write outputs into nnMIL_results/.
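
Because split also accepts a fold index, running every fold amounts to calling the wrapper once per index. The sketch below only builds the command lines, following the argument order shown above; the helper name `classification_commands` and the choice of five folds are assumptions, and the real fold count should be taken from the generated plan.

```python
import shlex


def classification_commands(dataset_dir, model, cuda_device, splits=("auto",)):
    """Build one run_classification.sh command line per requested split."""
    script = "nnMIL/scripts/run_classification.sh"
    return [
        ["bash", script, dataset_dir, model, str(cuda_device), str(split)]
        for split in splits
    ]


# Assumption: five folds indexed 0..4; the real fold count comes from the plan.
commands = classification_commands(
    "nnMIL_raw_data/Task001_CRC_DSS", "simple_mil", 0, splits=range(5)
)
for argv in commands:
    # To execute, pass argv to subprocess.run(argv, check=True) from the project root.
    print(shlex.join(argv))
```

Building argument lists instead of shell strings keeps paths with spaces intact when they are handed to `subprocess.run`.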

Acknowledgements

We gratefully acknowledge prior work that inspired nnMIL:

MIL_BASELINE for its comprehensive collection of MIL models.
nnUNet for the self-configuring design principles that guided our training planner and workflow automation.

This project focuses mainly on simple yet generalizable MIL training. For feature extraction, we highly recommend using the excellent projects CLAM or STAMP.

If you use this codebase in your research, please cite the following work:

   @misc{luo2025nnmil,
       title={nnMIL: A generalizable multiple instance learning framework for computational pathology},
       author={Xiangde Luo and Jinxi Xiang and Yuanfeng Ji and Ruijiang Li},
       year={2025},
       eprint={2511.14907},
       archivePrefix={arXiv},
       primaryClass={cs.CV},
       url={https://arxiv.org/abs/2511.14907}
   }

Status & Contact

nnMIL is actively evolving; expect iterative updates to the planner, trainers, and evaluation scripts. Feedback and contributions are welcome. Reach out at luoxd96 at stanford dot edu.

👉 A comprehensive tutorial (classification + survival, custom dataset adaptation, shell scripts) is maintained in TUTORIAL.md and updated alongside code changes.
