Histo-Miner: Tissue Features Extraction With Deep Learning from H&E Images of Squamous Cell Carcinoma Skin Cancer
Histo-Miner presentation • Project Structure • Visualization • Installation • Usage • Example • Datasets • Checkpoints • Q&A • Citation
This repository contains the code for the paper "Histo-Miner: Tissue Features Extraction With Deep Learning from H&E Images of Squamous Cell Carcinoma Skin Cancer".
Histo-Miner employs convolutional neural network and vision transformer models for nucleus segmentation and classification, as well as tumor region segmentation (a), (b), (c). From these predictions, it generates a compact feature vector summarizing tissue morphology and cellular interactions (d). We used these generated features to classify cSCC patient response to immunotherapy.
Here is an explanation of the project structure:
├── configs # All configs file with explanations
│ ├── models # Configs for both models inference
│ ├── classification_training # Configs for classifier training
│ ├── histo_miner_pipeline # Configs for the core code of histo-miner
├── docs # Images and Videos files
├── example # End to end example to run histo-miner
├── scripts # Main code for users to run histo-miner
├── src # Functions used for scripts
│ ├── histo-miner # All functions from the core code
│ ├── models # Submodules of models for inference and training
│ │ ├── hover-net # hover-net submodule
│ │ ├── mmsegmentation # segmenter submodule
├── supplements # Mathematica notebook for the calculation of the probability of distance overestimation
├── visualization # Python and Groovy scripts to reproduce paper figures or visualize model inference with QuPath
Note: Use the slider to fully read the comments for each section.
(step (c) from figure above)
The full pipeline requires three environments to work, one for each git submodule and one for the core histo-miner code. We provide the hovernet git submodule containing the code of the SCC Hovernet model and the mmsegmentation git submodule containing the code of the SCC Segmenter model. The reasons why git submodules are used in this project are detailed in the Q&A section.
First clone the repository including its submodules:
git clone --recurse-submodules git@github.com:bozeklab/histo-miner.git

To use histo-miner you need hardware with CUDA-enabled GPUs (NVIDIA GPUs). CPU-only environments are not supported yet.
cd histo-miner
# histo-miner env
conda env create -f histo-miner-env.yml
conda activate histo-miner-env
pip install --no-dependencies mrmr-selection==0.2.5
# mmsegmentation submodule env
conda create --name mmsegmentation_submodule python=3.8 -y
conda activate mmsegmentation_submodule
pip install torch==1.13.0+cu116 torchvision==0.14.0+cu116 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu116
pip install click==8.2.1
pip install -U openmim
mim install mmengine
mim install mmcv==2.2.0
mim install mmcv-full==1.7.2
pip install mmsegmentation==1.2.2
# hovernet submodule env
conda env create -f hovernet_submodule.yml
conda activate hovernet_submodule
pip install torch==1.10.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
conda deactivate

You can create the environments from scratch instead of using the yaml files.
Alternative commands:
cd histo-miner
# histo-miner env
conda create -n histo-miner-env-nomrmr python=3.10 -y
conda activate histo-miner-env-nomrmr
conda install -c conda-forge openslide=3.4.1
pip install -r ./requirements.txt
pip install --no-dependencies mrmr-selection==0.2.5
# mmsegmentation submodule env
conda create --name mmsegmentation_submodule python=3.8 -y
conda activate mmsegmentation_submodule
pip install torch==1.13.0+cu116 torchvision==0.14.0+cu116 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu116
pip install click==8.2.1
pip install -U openmim
mim install mmengine
mim install mmcv==2.2.0
mim install mmcv-full==1.7.2
pip install mmsegmentation==1.2.2
# hovernet submodule env
conda create -n hovernet_submodule python==3.6.12 -y
conda activate hovernet_submodule
pip install torch==1.10.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
chmod +x src/models/hover_net/setup_condainstall.sh
yes | ./src/models/hover_net/setup_condainstall.sh
pip install -r src/models/hover_net/requirements.txt
conda deactivate

The installation can take some time, especially for the `conda install -c conda-forge openslide`, `pip install torch`, and `yes | ./src/models/hover_net/setup_condainstall.sh` commands.
If you face problems installing pytorch, check the next section. It is also possible to install the no-GPU pytorch versions.
Further details are available in the README files of each git submodule. These files are:
- For hovernet submodule: `/src/models/hover_net/README.md`
- For mmsegmentation submodule: `/src/models/mmsegmentation/README.md`
This section explains how to use the Histo-Miner code. A complete end-to-end example is also included to help you get started.
Remark: If you have issues filling in the configs, you can check how the config files are filled in `example/example-configs/`.
This step performs nucleus segmentation and classification from your input WSIs — corresponding to steps (a), (b), (c) in the figure above.
- Download the SCC Segmenter and SCC Hovernet trained weights (see Datasets for manual download):

  conda activate histo-miner-env
  python src/histo_miner/download_weights.py

  They are now in `/data/checkpoints/`.

- Configure the files `scc_hovernet.yml` and `scc_segmenter.yml`:
  - Set the input/output paths
  - Set the number of GPUs
  - Set the checkpoints paths (in `./data/checkpoints/` if automatic download)
  - Choose a cache folder
- Run the inference (it includes pre-processing of the slides):
cd scripts
bash -l main1_hovernet_inference.sh
bash -l main2_segmenter_inference.sh
cd ..
- Combine the outputs:
- Copy both outputs in a folder
- Add the path to this folder to the `inferences_postproc_main` field in the `histo_miner_pipeline.yml` config
- Run post-processing to correct tumor nuclei classification and reformat files for visualization:
conda activate histo-miner-env
python scripts/main3_inferences_postproc.py
Output: One JSON file with segmented and classified nuclei for each input WSI.
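As a quick sanity check, the per-WSI JSON can be inspected with a short script. Note the `nuc`/`type` layout below follows the HoverNet output convention and is an assumption; the schema produced by the post-processing step may differ:

```python
import json
from collections import Counter

def count_nuclei_by_type(json_path):
    """Tally detected nuclei per predicted class in one inference JSON.

    Assumes a HoverNet-style layout where nuclei live under a "nuc"
    mapping and each entry carries an integer "type" label; adapt the
    keys to the actual schema of your output files.
    """
    with open(json_path) as f:
        data = json.load(f)
    return Counter(nucleus["type"] for nucleus in data["nuc"].values())
```

Running this over all output files gives a quick check that every expected nucleus class is represented before moving on to visualization.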
Visualize the nucleus segmentation and classification as shown in the Visualization section.
- Put the JSON output of "Models inference" step and the corresponding input WSI in the same folder (you can use symbolic links if needed).
- Ensure both files have the same basename (excluding extension).
- Open QuPath and open the input WSI inside QuPath. To download QuPath go to: QuPath website.
- In QuPath:
  - Go to the `Automate` menu → `Script Editor`
  - Load and run the script: visualization/qupath_scripts/open_annotations_SCC_Classes.groovy
- (Optional) Run the conversion script convert_annotation_to_detection.groovy. This helps improve navigation, as detection objects are lighter than annotation objects in QuPath.
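The pairing step above (JSON and WSI in the same folder, with matching basenames) can be scripted with symbolic links; the file names used here are only illustrative:

```python
import os
from pathlib import Path

def link_pair(wsi_path, json_path, target_dir):
    """Symlink a WSI and its inference JSON into one folder.

    QuPath pairs the two files by identical basenames (extension
    excluded), so the JSON link reuses the WSI stem.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    wsi_path, json_path = Path(wsi_path), Path(json_path)
    # Link the WSI under its own name, the JSON under the WSI stem.
    os.symlink(wsi_path.resolve(), target / wsi_path.name)
    os.symlink(json_path.resolve(), target / (wsi_path.stem + ".json"))
```

For example, `link_pair("slides/sample_001.svs", "inference/sample_001_out.json", "qupath_view")` produces `sample_001.svs` and `sample_001.json` side by side, ready to open in QuPath.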
This step computes tissue-relevant features based on previously obtained nucleus segmentations — corresponding to step (d) in the figure above.
- Complete the "Models inference" step.
- Update the following paths in `histo_miner_pipeline.yml`:
  - `tissue_analyser_main`: folder containing the inference output JSON files
  - `tissue_analyser_output`: path to the saving folder
- Choose which features to compute using boolean flags in `histo_miner_pipeline.yml`:
  - `calculate_morphologies`: whether to compute morphology-related features
  - `calculate_vicinity`: whether to compute features specifically for cells in the tumor vicinity
  - `calculate_distances`: whether to compute distance-related features (False by default)
- Run:
conda activate histo-miner-env
python scripts/main4_tissue_analyser.py
Output: Structured JSON files with the computed features.
This step classifies WSIs with tumor regions into responder vs. non-responder for CPI treatment using features selected in the original Histo-Miner paper.
1. Complete the "Models inference" and "Tissue Analyser" steps.

2. Download the `Ranking_of_features.json` file (here for manual download):

   conda activate histo-miner-env
   python src/histo_miner/download_rank.py

3. Update the following paths in the `histo_miner_pipeline.yml` config:
   - `tissue_analyser_output`: folder containing the tissue analyser output JSONs with correct naming (see 4.)
   - `featarray_folder`: folder for the feature matrix output

4. Ensure the training json file names contain the "no_response" or "response" characters (depending on the file class). For instance `sample_1_response_analysed.json`.

5. To generate the combined feature matrix and class vectors, run:

   conda activate histo-miner-env
   python scripts/usecase1_collect_features_consistently.py

6. Update the following parameters in the `classification.yml` config:
   - `predefined_feature_selection` must be set to True
   - `feature_selection_file`: path to the `Ranking_of_features.json` file (in `./data/feature_rank/` if automatic download)
   - `folders.save_trained_model`: folder to save the model
   - `names.trained_model`: name chosen for the model

   Ensure that in the `histo_miner_pipeline.yml` config `nbr_keptfeat` is set to its default value: 19.

7. Run:

   python scripts/training/training_classifier.py

8. Update the following parameter in the `classification.yml` config:
   - `inference_input`: path to the folder containing the WSIs to classify

9. Run:

   python scripts/usecase2_classification_inference.py
Output: Prediction of responder vs non-responder class for each WSI displayed in terminal.
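The naming convention for the training JSON files can be verified programmatically before collecting features. A minimal sketch (the actual parsing inside the collection script may differ; note "no_response" must be tested first, since it contains "response" as a substring):

```python
def label_from_filename(filename):
    """Infer the class label from a tissue-analyser JSON file name.

    Mirrors the naming convention above; "no_response" is checked
    first because "response" is a substring of it.
    """
    if "no_response" in filename:
        return "no_response"
    if "response" in filename:
        return "response"
    raise ValueError(f"no class keyword found in {filename!r}")
```

For example, `label_from_filename("sample_1_response_analysed.json")` returns `"response"`; a file missing both keywords raises an error, flagging a misnamed sample early.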
This version performs classification using a new feature selection tailored to your dataset.
1. Complete the "Models inference" and "Tissue Analyser" steps.

2. Update the following paths in `histo_miner_pipeline.yml`:
   - `tissue_analyser_output`: folder containing the tissue analyser output JSONs with correct naming (see next point)
   - `featarray_folder`: folder for the feature matrix output

3. Ensure the training json file names contain the "no_response" or "response" characters (depending on the file class). For instance `sample_1_response_analysed.json`.

4. To generate the combined feature matrix and class vectors, run:

   conda activate histo-miner-env
   python scripts/usecase1_collect_features_consistently.py

5. Update the `histo_miner_pipeline.yml` config:
   - `classification_evaluation`: path to the folder for the cross-validation evaluation output
   - `eval_folder`: name of the folder

   Optionally update `classification.yml` for custom parameters.

6. Choose a feature selection method from `scripts/cross_validation/`. We recommend `featsel_mrmr_std_crossval_samesplits.py`.

7. Run the selected feature selection method.

8. Update the following parameters in the `classification.yml` config:
   - `predefined_feature_selection` must be set to False
   - `feature_selection_file`: path to the feature selection numpy file generated in 7.
   - `folders.save_trained_model`: folder to save the model
   - `names.trained_model`: name chosen for the model

   Importantly, update `nbr_keptfeat` in the `histo_miner_pipeline.yml` config to the new number of kept features (see the info files generated in 7. if needed).

9. Run:

   python scripts/training/training_classifier.py

10. Update the following parameter in the `classification.yml` config:
    - `inference_input`: path to the folder containing the WSIs to classify

11. Run:

    python scripts/usecase2_classification_inference.py

Output: Prediction of responder vs non-responder class for each WSI displayed in terminal.
An end-to-end example of how to run the code on one provided WSI is available in the example folder.
🔸 Contrary to the NucSeg, TumSeg, SCC Hovernet and SCC Segmenter weights, the CPI dataset remains restricted until the paper is published in a journal. The dataset was publicly available for a few days after publication of the preprint but was unfortunately made private again after discussion and agreement with co-authors.
Why does this repository use git submodules and need 3 conda envs?
Git submodules allow SCC Hovernet (hovernet submodule) and SCC Segmenter (mmsegmentation submodule) to be developed outside of their initial repositories. This allows developing them in parallel, unaffected by any changes made in the original repositories. Also, the code in the submodules contains only the scripts and functions necessary to run histo-miner, which improves readability and reduces repository size. These modules require python and package versions that are not compatible with the histo-miner core code, so a single environment cannot serve the whole repository.
In short, git submodules allow treating the SCC Hovernet and SCC Segmenter code as projects separate from the histo-miner core code while still allowing them to be used together. More about git submodules can be found here.
Why is the pytorch installation not included in the env.yml files of the hovernet and mmsegmentation submodules?
Pytorch versioning depends on the GPUs of your machine. Excluding the pytorch installation from the yml files lets users pick the pytorch version most compatible with their setup.
@misc{sancéré2025histominerdeeplearningbased,
title={Histo-Miner: Deep Learning based Tissue Features Extraction Pipeline from H&E Whole Slide Images of Cutaneous Squamous Cell Carcinoma},
author={Lucas Sancéré and Carina Lorenz and Doris Helbig and Oana-Diana Persa and Sonja Dengler and Alexander Kreuter and Martim Laimer and Anne Fröhlich and Jennifer Landsberg and Johannes Brägelmann and Katarzyna Bozek},
year={2025},
eprint={2505.04672},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2505.04672},
}
If you use this code or the datasets links please also consider starring the repo to increase its visibility! Thanks 💫



