An integrated framework for cell segmentation and cell type annotation using CellSAM and DeepCell Types.
This repository provides reproducible pipelines for processing, segmenting, and classifying cells in multi-channel microscopy datasets.
- Overview
- Prerequisites
- Environment Setup
- API Configuration
- Workflow
- Expected Outputs
- Troubleshooting
- References
This repository implements an end-to-end workflow for microscopy image analysis:
- Segmentation: Identify individual cells using CellSAM
- Annotation: Predict cell types using DeepCell Types API
- Reporting: Generate per-cell CSVs and visual summaries
- Python 3.12+ (required)
- Conda or pip/venv for environment management
- DeepCell API token (for cell type annotation)
- Globus account (for HuBMAP data access)
- GPU recommended (for faster inference)
# Clone the repository
git clone https://github.com/cns-iu/hra-deepcell-experiments.git
cd hra-deepcell-experiments
# Create and activate conda environment
conda create -n hra-deepcell python=3.12
conda activate hra-deepcell
# Install dependencies
pip install -r requirements.txt
pip install git+https://github.com/vanvalenlab/cellSAM.git
pip install git+https://github.com/vanvalenlab/deepcell-types@master
# Create and activate virtual environment
python3.12 -m venv hra-deepcell
# Activate environment
source hra-deepcell/bin/activate # macOS/Linux
hra-deepcell\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
pip install git+https://github.com/vanvalenlab/cellSAM.git
pip install git+https://github.com/vanvalenlab/deepcell-types@master
Step 1: Obtain your API token
- Visit DeepCell API Key Management
- Log in and generate a new API key
- Copy the token
Step 2: Configure the token (choose one method)
A. For Conda users (persistent):
conda env config vars set -n hra-deepcell DEEPCELL_ACCESS_TOKEN=<your-token>
conda deactivate && conda activate hra-deepcell
B. For system environment (persistent):
macOS/Linux:
export DEEPCELL_ACCESS_TOKEN=<your-token>
# Add to ~/.bashrc or ~/.zshrc for persistence
Windows PowerShell:
setx DEEPCELL_ACCESS_TOKEN "<your-token>"
Windows CMD (current session only; use setx to persist):
set DEEPCELL_ACCESS_TOKEN=<your-token>
C. In Python (temporary):
import os
os.environ["DEEPCELL_ACCESS_TOKEN"] = "<your-token>"
Note: Restart your terminal or IDE after setting environment variables.
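Whichever method you choose, it can help to confirm the token is visible to Python before starting a long run. The following is a minimal sketch, not part of this repository: `get_deepcell_token` is a hypothetical helper name, and the only thing taken from the steps above is the `DEEPCELL_ACCESS_TOKEN` variable name.

```python
import os


def get_deepcell_token(env=None):
    """Return the DeepCell token from the environment, or raise if it is unset."""
    env = os.environ if env is None else env
    token = env.get("DEEPCELL_ACCESS_TOKEN", "").strip()
    # An empty value or a leftover "<your-token>" placeholder means the
    # variable was never configured properly.
    if not token or token.startswith("<"):
        raise RuntimeError(
            "DEEPCELL_ACCESS_TOKEN is not set; configure it and restart "
            "your terminal or IDE."
        )
    return token


if __name__ == "__main__":
    try:
        get_deepcell_token()
        print("DEEPCELL_ACCESS_TOKEN is set.")
    except RuntimeError as err:
        print(err)
```

The check only confirms that a value is present; it does not verify that the DeepCell API accepts the token.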
pip install atlas-consortia-clt
globus login --no-local-server
Follow the authentication link in your browser and authorize access.
Verify authentication:
globus whoami --verbose
Download and install Globus Connect Personal:
wget https://downloads.globus.org/globus-connect-personal/linux/stable/globusconnectpersonal-latest.tgz
tar -xzf globusconnectpersonal-latest.tgz
cd globusconnectpersonal-*
./globusconnectpersonal -start &
Follow the setup prompts. For detailed instructions, see Globus Connect Personal for Linux.
Create a manifest.txt file following the HuBMAP Manifest File Documentation.
hubmap-clt transfer manifest.txt -d /path/to/destination/data-original/
Tip: Use screen or tmux to run long transfers in the background:
screen -S hubmap-transfer
hubmap-clt transfer manifest.txt -d /path/to/destination/
# Press Ctrl+A, then D to detach
If you see "Session reauthentication required", run:
globus session update <SESSION-ID>
Follow the prompts to re-authenticate.
Navigate to the scripts/ folder corresponding to your dataset (e.g., scripts/thymus/).
Extract descendant HuBMAP IDs from parent IDs:
python3 00_hubmap-id_desc.py
Input: CSV file with parent HuBMAP IDs
Output: CSV with descendant IDs
Note: Some parent IDs may not have descendants; this is expected and handled automatically.
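The parent-to-descendant step can be pictured as follows. This is a sketch under stated assumptions rather than the code in `00_hubmap-id_desc.py`: `collect_descendants` is a hypothetical name, and `fetch_descendants` is a stand-in callable for whatever HuBMAP lookup the script performs.

```python
def collect_descendants(parent_ids, fetch_descendants):
    """Return (rows, parents_without_descendants).

    rows is a list of (parent_id, descendant_id) pairs. fetch_descendants is a
    hypothetical callable that returns a list of descendant IDs for one parent.
    """
    rows, without = [], []
    for parent_id in parent_ids:
        descendants = fetch_descendants(parent_id) or []
        if not descendants:
            # Parents with no descendants are recorded and skipped, not treated as errors.
            without.append(parent_id)
            continue
        rows.extend((parent_id, descendant_id) for descendant_id in descendants)
    return rows, without
```

Keeping the lookup as a parameter makes the skip logic easy to test without network access or a token.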
To use the API:
- Log in to the HuBMAP Portal
- Navigate to the CCF-EUI Portal
- Right-click → Inspect → Network tab
- Find a request and copy the token from token=...& in the URL
- Paste the token in 00_hubmap-id_desc.py
Note: Tokens expire periodically. Repeat this process if authentication fails.
Generate manifest.txt files for each dataset:
python3 01_manifest_creation.py
Datasets without descendants are automatically skipped.
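For orientation, a manifest can also be written by hand or with a few lines of Python. The sketch below assumes the one-line-per-dataset layout `<dataset-id> <path>`, with `/` requesting the whole dataset; check this against the HuBMAP Manifest File Documentation before relying on it. `write_manifest` is a hypothetical helper, not the logic of `01_manifest_creation.py`.

```python
from pathlib import Path


def write_manifest(dataset_ids, manifest_path, resource_path="/"):
    """Write one '<dataset-id> <path>' line per dataset and return the line count."""
    # Drop blanks and duplicates while keeping the original order.
    cleaned = (dataset_id.strip() for dataset_id in dataset_ids if dataset_id)
    unique_ids = list(dict.fromkeys(i for i in cleaned if i))
    lines = [f"{dataset_id} {resource_path}" for dataset_id in unique_ids]
    Path(manifest_path).write_text("".join(line + "\n" for line in lines))
    return len(lines)
```

Calling `write_manifest(ids, "manifest.txt")` with a list of descendant IDs would produce a file that can then be passed to `hubmap-clt transfer`.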
Transfer data using Globus:
hubmap-clt transfer manifest.txt -d /path/to/data-original/
Tip: Run in a screen session for long downloads.
Create config.yaml files with channel information:
python3 02_hubmap-config.py
This script:
- Extracts nucleus_channel and cell_channel from pipeline metadata
- Restructures file names in input-data/
- Generates configuration files for inference
Navigate to the src/ directory and run:
python run_inference_pipeline.py \
--input_root /path/to/input-data/ \
--output_root /path/to/output-data/
What happens:
- Segmentation: CellSAM identifies cell boundaries
- Annotation: DeepCell Types predicts cell types
- Output generation: Masks, CSVs, and summaries are saved
Note: Datasets missing nucleus_channel will be automatically skipped.
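The skip rule can be illustrated with a small pre-flight check that tells you which datasets will be processed before you start a run. This is only a sketch: it assumes each dataset's configuration has already been parsed into a dictionary with a `nucleus_channel` key, and `split_by_nucleus_channel` is a hypothetical name, not a function in the pipeline.

```python
def split_by_nucleus_channel(configs):
    """Split {dataset_id: config_dict} into (runnable, skipped) lists of IDs."""
    runnable, skipped = [], []
    for dataset_id, config in configs.items():
        # A missing key, None, or an empty string all count as "no nucleus channel".
        # Compare explicitly so that a channel index of 0 is still accepted.
        if config.get("nucleus_channel") in (None, ""):
            skipped.append(dataset_id)
        else:
            runnable.append(dataset_id)
    return runnable, skipped
```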
To utilize multiple GPUs simultaneously:
# Terminal 1 (GPU 0)
CUDA_VISIBLE_DEVICES=0 python run_inference_pipeline.py \
--input_root /path/to/input-data-1/ \
--output_root /path/to/output-data-1/ &
# Terminal 2 (GPU 1)
CUDA_VISIBLE_DEVICES=1 python run_inference_pipeline.py \
--input_root /path/to/input-data-2/ \
--output_root /path/to/output-data-2/
Tip: Run each command in a separate screen session.
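If you prefer to start the per-GPU runs from one script instead of separate terminals, the same idea can be sketched in Python. The flags and the `CUDA_VISIBLE_DEVICES` variable come from the commands above; `build_gpu_jobs` and `launch` are hypothetical helpers, and the sketch assumes it is run from the src/ directory with one input/output pair per GPU.

```python
import os
import subprocess


def build_gpu_jobs(root_pairs, gpu_ids):
    """Pair each (input_root, output_root) with one GPU; return (env, argv) tuples."""
    if len(root_pairs) > len(gpu_ids):
        raise ValueError("more input/output pairs than GPUs")
    jobs = []
    for gpu_id, (input_root, output_root) in zip(gpu_ids, root_pairs):
        # Each process only sees the single GPU assigned to it.
        env = {"CUDA_VISIBLE_DEVICES": str(gpu_id)}
        argv = [
            "python", "run_inference_pipeline.py",
            "--input_root", input_root,
            "--output_root", output_root,
        ]
        jobs.append((env, argv))
    return jobs


def launch(jobs):
    """Start every job in parallel and wait for all of them; return exit codes."""
    procs = [
        subprocess.Popen(argv, env={**os.environ, **env}) for env, argv in jobs
    ]
    return [proc.wait() for proc in procs]
```

For example, `launch(build_gpu_jobs([("/path/to/input-data-1/", "/path/to/output-data-1/"), ("/path/to/input-data-2/", "/path/to/output-data-2/")], [0, 1]))` mirrors the two terminal commands shown above.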
After a successful run, each dataset will produce:
output-data/
├── <dataset-id>/
│   ├── mask.tif              # Segmentation masks
│   ├── cell_populations.csv  # Morphological metrics
│   └── cell_types.csv        # Cell type annotations
File descriptions:
- mask.tif: Labeled instance segmentation masks
- cell_populations.csv: Per-cell morphological features (area, perimeter, etc.)
- cell_types.csv: Cell type predictions with confidence scores
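After a batch run, a quick way to spot incomplete datasets is to check each output folder for the three files listed above. The following is a minimal sketch using only the standard library; `find_incomplete_datasets` is a hypothetical helper, and it assumes every subdirectory of the output root is a dataset folder.

```python
from pathlib import Path

# File names taken from the expected output layout.
EXPECTED_FILES = ("mask.tif", "cell_populations.csv", "cell_types.csv")


def find_incomplete_datasets(output_root):
    """Map each dataset directory name to the expected files it is missing."""
    missing = {}
    for dataset_dir in sorted(Path(output_root).iterdir()):
        if not dataset_dir.is_dir():
            continue
        absent = [
            name for name in EXPECTED_FILES if not (dataset_dir / name).is_file()
        ]
        if absent:
            missing[dataset_dir.name] = absent
    return missing
```

An empty dictionary means every dataset folder contains all three files. Datasets that were skipped may show up with all three files missing, or not at all, depending on whether the pipeline creates a folder for them.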
Issue: ModuleNotFoundError: No module named 'cellSAM'
Solution: Ensure you installed CellSAM from source:
pip install git+https://github.com/vanvalenlab/cellSAM.git
Issue: DeepCell API authentication failed
Solution: Verify your token is set correctly:
echo $DEEPCELL_ACCESS_TOKEN
Issue: CUDA out of memory
Solution: Reduce batch size or process fewer images at once. Use CUDA_VISIBLE_DEVICES to assign specific GPUs.
Issue: Globus transfer stalls
Solution: Ensure Globus Connect Personal is running:
./globusconnectpersonal -status
Issue: Missing nucleus channel
Solution: This is expected for some datasets. The pipeline will automatically skip them and continue.
Results Comparison with hubmap-mirror-data-api
Use the two scripts from the scripts folder:
- getsample.py: Reads HuBMAP Zarr archives from S3, extracts the cell_types/predictions attribute (hubmap_dct) from each dataset, converts it to a DataFrame with cell_type and CL_id columns, and writes one CSV per dataset to ross_results/. Skips datasets missing the required attribute and logs errors.
- preprocess.py: Processes each CSV in ross_results/, counts cells per cell_type, calculates percentages, sorts by count descending, and outputs summary CSVs to processed_ross/ with cleaned filenames (e.g., *_deepcell_population.csv).
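The population summary described for preprocess.py comes down to counting, computing percentages, and sorting. The sketch below shows that calculation with the standard library only; it is not the script's actual code (which works on DataFrames), and `summarize_population` is a hypothetical name.

```python
from collections import Counter


def summarize_population(cell_types):
    """Return (cell_type, count, percentage) rows sorted by count, descending."""
    counts = Counter(cell_types)
    total = sum(counts.values())
    # most_common() orders by count descending; ties keep first-seen order.
    return [
        (cell_type, count, round(100 * count / total, 2))
        for cell_type, count in counts.most_common()
    ]
```

Feeding it the cell_type column of one CSV from ross_results/ would yield the rows of the corresponding summary file; an empty input returns an empty list.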
- CellSAM: Documentation | GitHub
- DeepCell Types: Tutorial | GitHub
- DeepCell API: Setup Guide
- HuBMAP: API Reference | CLI Guide
- Globus: Connect Personal Installation