🧬 HRA DeepCell Experiments

An integrated framework for cell segmentation and cell type annotation using CellSAM and DeepCell Types.

This repository provides reproducible pipelines for processing, segmenting, and classifying cells in multi-channel microscopy datasets.

📑 Table of Contents

Overview
Prerequisites
Environment Setup
API Configuration
- DeepCell API
- HuBMAP Globus API
Workflow
Expected Outputs
Troubleshooting
Results Comparison with hubmap-mirror-data-api
References

🚀 Overview

This repository implements an end-to-end workflow for microscopy image analysis:

Segmentation — Identify individual cells using CellSAM
Annotation — Predict cell types using DeepCell Types API
Reporting — Generate per-cell CSVs and visual summaries

📋 Prerequisites

Python 3.12+ (required)
Conda or pip/venv for environment management
DeepCell API token (for cell type annotation)
Globus account (for HuBMAP data access)
GPU recommended (for faster inference)
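
Before installing anything, the Python and GPU prerequisites can be checked with a short script. This is a minimal sketch, not part of the repository: the helper names are hypothetical, and the GPU check assumes the GPU is reached through PyTorch, reporting "unknown" when `torch` cannot be imported.

```python
import sys

MIN_PYTHON = (3, 12)  # minimum version stated in the prerequisites


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= minimum


def gpu_status():
    """Return 'available', 'not available', or 'unknown' (torch not installed).

    Assumption: the GPU is accessed through PyTorch.
    """
    try:
        import torch  # imported lazily so the check works before installation
    except ImportError:
        return "unknown"
    return "available" if torch.cuda.is_available() else "not available"


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}: {'OK' if python_ok() else 'too old'}")
    print(f"GPU: {gpu_status()}")
```

A "not available" result does not block the pipeline, since the GPU is only recommended, but inference will be slower.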

⚙️ Environment Setup

Option 1: Using Conda (Recommended)

# Clone the repository
git clone https://github.com/cns-iu/hra-deepcell-experiments.git
cd hra-deepcell-experiments

# Create and activate conda environment
conda create -n hra-deepcell python=3.12
conda activate hra-deepcell

# Install dependencies
pip install -r requirements.txt
pip install git+https://github.com/vanvalenlab/cellSAM.git
pip install git+https://github.com/vanvalenlab/deepcell-types@master

Option 2: Using pip/venv

# Create and activate virtual environment
python3.12 -m venv hra-deepcell

# Activate environment (run only the line for your platform)
source hra-deepcell/bin/activate       # macOS/Linux
hra-deepcell\Scripts\activate          # Windows

# Install dependencies
pip install -r requirements.txt
pip install git+https://github.com/vanvalenlab/cellSAM.git
pip install git+https://github.com/vanvalenlab/deepcell-types@master

🔑 API Configuration

DeepCell API

Step 1: Obtain your API token

Visit DeepCell API Key Management
Log in and generate a new API key
Copy the token

Step 2: Configure the token (choose one method)

A. For Conda users (persistent):

conda env config vars set -n hra-deepcell DEEPCELL_ACCESS_TOKEN=<your-token>
conda deactivate && conda activate hra-deepcell

B. For system environment:

macOS/Linux:

export DEEPCELL_ACCESS_TOKEN=<your-token>
# Applies to the current shell only; add the line to ~/.bashrc or ~/.zshrc for persistence

Windows PowerShell (persistent; takes effect in new terminals, not the current one):

setx DEEPCELL_ACCESS_TOKEN "<your-token>"

Windows CMD (current session only):

set DEEPCELL_ACCESS_TOKEN=<your-token>

C. In Python (temporary):

import os
os.environ["DEEPCELL_ACCESS_TOKEN"] = "<your-token>"

💡 Note: Restart your terminal or IDE after setting environment variables.
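
Whichever method is used, it can help to confirm from Python that the variable is visible to the interpreter. The sketch below is illustrative only (the `token_status` helper is hypothetical); it reports the token length rather than the token itself so the secret does not end up in logs.

```python
import os


def token_status(env=None, name="DEEPCELL_ACCESS_TOKEN"):
    """Describe whether the token is set, without revealing it."""
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        return f"{name} is not set"
    # Show only the length so the secret never reaches logs or screenshots.
    return f"{name} is set ({len(token)} characters)"


if __name__ == "__main__":
    print(token_status())
```

If this prints "is not set" after following method A or B, the terminal or IDE was most likely not restarted.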

HuBMAP Globus API

1. Install Atlas Consortia CLI Tools

pip install atlas-consortia-clt

2. Authenticate with Globus

globus login --no-local-server

Follow the authentication link in your browser and authorize access.

Verify authentication:

globus whoami --verbose

3. Install Globus Connect Personal

Download and install Globus Connect Personal (Linux commands shown):

wget https://downloads.globus.org/globus-connect-personal/linux/stable/globusconnectpersonal-latest.tgz
tar -xzf globusconnectpersonal-latest.tgz
cd globusconnectpersonal-*
./globusconnectpersonal -start &

Follow the setup prompts. For detailed instructions, see Globus Connect Personal for Linux.

4. Create Manifest File

Create a manifest.txt file following the HuBMAP Manifest File Documentation.

⚠️ Important: Ensure the manifest file contains no comments, as they may cause parsing errors.
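
If a manifest was annotated by hand, comments can be stripped before the transfer. The following is a minimal sketch under the assumption that comments are whole lines starting with `#`; the function names are hypothetical and the script is not part of the repository.

```python
def clean_manifest_lines(lines):
    """Drop blank lines and whole-line '#' comments; strip whitespace."""
    cleaned = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        cleaned.append(stripped)
    return cleaned


def clean_manifest_file(src, dst):
    """Write a cleaned copy of `src` to `dst` and return the kept line count."""
    with open(src, encoding="utf-8") as fh:
        kept = clean_manifest_lines(fh)
    with open(dst, "w", encoding="utf-8") as fh:
        fh.write("\n".join(kept) + ("\n" if kept else ""))
    return len(kept)
```

Writing to a separate `dst` keeps the annotated original intact; pass the cleaned file to `hubmap-clt transfer`.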

5. Transfer Data

hubmap-clt transfer manifest.txt -d /path/to/destination/data-original/

💡 Tip: Use screen or tmux to run long transfers in the background:
screen -S hubmap-transfer
hubmap-clt transfer manifest.txt -d /path/to/destination/
# Press Ctrl+A, then D to detach

6. Troubleshooting: Session Re-authentication

If you see "Session reauthentication required", run:

globus session update <SESSION-ID>

Follow the prompts to re-authenticate.

🔄 Workflow

Navigate to the scripts/ folder corresponding to your dataset (e.g., scripts/thymus/).

Step 1: Generate Descendant IDs

Extract descendant HuBMAP IDs from parent IDs:

python3 00_hubmap-id_desc.py

Input: CSV file with parent HuBMAP IDs
Output: CSV with descendant IDs

⚠️ Note: Some parent IDs may not have descendants — this is expected and handled automatically.
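
The skip-on-empty behaviour described above can be sketched as follows. This is not the code of `00_hubmap-id_desc.py`: `fetch_descendants` is a hypothetical callable standing in for whatever API request the script makes, and the CSV column names are assumptions.

```python
import csv


def collect_descendants(parent_ids, fetch_descendants):
    """Pair each parent ID with its descendant IDs, skipping empty results.

    `fetch_descendants` is a hypothetical callable (parent_id -> list of IDs)
    standing in for the API request made by the real script.
    """
    rows, skipped = [], []
    for parent_id in parent_ids:
        descendants = fetch_descendants(parent_id)
        if not descendants:
            skipped.append(parent_id)  # expected for some parents
            continue
        rows.extend((parent_id, d) for d in descendants)
    return rows, skipped


def write_rows(path, rows):
    """Write (parent_id, descendant_id) pairs to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["parent_id", "descendant_id"])  # assumed column names
        writer.writerows(rows)
```

Returning the skipped parents separately makes it easy to log them instead of treating them as failures.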

Obtaining HuBMAP API Token

To use the API:

Log in to the HuBMAP Portal
Navigate to the CCF-EUI Portal
Right-click the page, choose Inspect, and open the Network tab
Find a request whose URL contains token=...& and copy the token value
Paste the token into 00_hubmap-id_desc.py

⚠️ Note: Tokens expire periodically. Repeat this process if authentication fails.

Step 2: Create Manifest Files

Generate manifest.txt files for each dataset:

python3 01_manifest_creation.py

Datasets without descendants are automatically skipped.
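
For orientation, rendering a manifest from a list of descendant IDs might look like the sketch below. It assumes each manifest line is a dataset ID followed by a path, with `/` selecting the whole dataset; check this against the HuBMAP Manifest File Documentation before relying on it. The function name is hypothetical and this is not the code of `01_manifest_creation.py`.

```python
def manifest_text(dataset_ids, remote_path="/"):
    """Render manifest lines as '<dataset-id> <path>', one per dataset.

    Assumption: this line layout matches the HuBMAP manifest documentation;
    remote_path="/" is meant to select the whole dataset directory.
    """
    # dict.fromkeys removes duplicates while preserving the input order.
    unique_ids = list(dict.fromkeys(i.strip() for i in dataset_ids if i.strip()))
    return "".join(f"{dataset_id} {remote_path}\n" for dataset_id in unique_ids)
```

An empty ID list yields an empty string, which mirrors skipping datasets that have no descendants.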

Step 3: Download Data

Transfer data using Globus:

hubmap-clt transfer manifest.txt -d /path/to/data-original/

💡 Tip: Run in a screen session for long downloads.

Step 4: Generate Configuration Files

Create config.yaml files with channel information:

python3 02_hubmap-config.py

This script:

Extracts nucleus_channel and cell_channel from pipeline metadata
Restructures file names in input-data/
Generates configuration files for inference
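
As an illustration of the last step, a config file holding the two channel names could be rendered as below. The flat two-key layout is an assumption, since the actual schema written by `02_hubmap-config.py` is not shown here, and `render_config` is a hypothetical helper.

```python
import json


def render_config(nucleus_channel=None, cell_channel=None):
    """Render channel settings as YAML text (hypothetical flat layout).

    Keys with no value are omitted, so a dataset without a nucleus channel
    produces a config that lacks nucleus_channel.
    """
    entries = {}
    if nucleus_channel:
        entries["nucleus_channel"] = nucleus_channel
    if cell_channel:
        entries["cell_channel"] = cell_channel
    # json.dumps yields a double-quoted scalar, which YAML also accepts.
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in entries.items())
```

Quoting the values guards against channel names that contain characters with special meaning in YAML, such as colons.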

Step 5: Run Inference Pipeline

Navigate to the src/ directory and run:

python run_inference_pipeline.py \
    --input_root /path/to/input-data/ \
    --output_root /path/to/output-data/

What happens:

Segmentation — CellSAM identifies cell boundaries
Annotation — DeepCell Types predicts cell types
Output generation — Masks, CSVs, and summaries are saved

⚠️ Note: Datasets missing nucleus_channel will be automatically skipped.
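
To see in advance which datasets would be skipped, the input directory can be scanned before launching inference. This sketch assumes one subdirectory per dataset containing a `config.yaml` with a top-level `nucleus_channel` key; both the layout and the helper names are assumptions, and the text match is a simplification of real YAML parsing.

```python
import re
from pathlib import Path

# Matches a top-level 'nucleus_channel: <value>' line with a non-empty value.
_NUCLEUS_RE = re.compile(r"^nucleus_channel:[ \t]*(\S.*)$", re.MULTILINE)


def has_nucleus_channel(config_text):
    """Return True if the config text sets a non-empty nucleus_channel."""
    match = _NUCLEUS_RE.search(config_text)
    if match is None:
        return False
    value = match.group(1).strip().strip("'\"")
    return value not in ("", "null", "~")


def split_datasets(input_root):
    """Return (runnable, skipped) dataset directory names under input_root."""
    runnable, skipped = [], []
    for dataset_dir in sorted(p for p in Path(input_root).iterdir() if p.is_dir()):
        config = dataset_dir / "config.yaml"
        text = config.read_text(encoding="utf-8") if config.is_file() else ""
        (runnable if has_nucleus_channel(text) else skipped).append(dataset_dir.name)
    return runnable, skipped
```

Calling `split_datasets("/path/to/input-data/")` returns two lists of directory names, which can be compared with the pipeline's own skip messages.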

Running Multiple GPU Processes

To utilize multiple GPUs simultaneously:

# Terminal 1 (GPU 0)
CUDA_VISIBLE_DEVICES=0 python run_inference_pipeline.py \
    --input_root /path/to/input-data-1/ \
    --output_root /path/to/output-data-1/

# Terminal 2 (GPU 1)
CUDA_VISIBLE_DEVICES=1 python run_inference_pipeline.py \
    --input_root /path/to/input-data-2/ \
    --output_root /path/to/output-data-2/

💡 Tip: Run each command in a separate screen session.
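
The commands above need one input directory per GPU. One way to decide which datasets go where is a round-robin split, sketched here with hypothetical helper names; moving or linking the datasets into per-GPU input directories is left to the reader.

```python
import os


def shard_round_robin(items, n_shards):
    """Split items into n_shards lists, dealing them out in turn."""
    if n_shards < 1:
        raise ValueError("n_shards must be at least 1")
    shards = [[] for _ in range(n_shards)]
    for index, item in enumerate(items):
        shards[index % n_shards].append(item)
    return shards


def gpu_env(gpu_index, base_env=None):
    """Return a copy of the environment that exposes only one GPU."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env
```

When launching the pipeline from Python instead of a shell, the dictionary from `gpu_env(i)` can be passed as the `env` argument of `subprocess.Popen`, which has the same effect as the `CUDA_VISIBLE_DEVICES=i` prefix.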

📊 Expected Outputs

After a successful run, each dataset will produce:

output-data/
├── <dataset-id>/
│   ├── mask.tif                    # Segmentation masks
│   ├── cell_populations.csv        # Morphological metrics
│   ├── cell_types.csv              # Cell type annotations

File descriptions:

mask.tif — Labeled instance segmentation masks
cell_populations.csv — Per-cell morphological features (area, perimeter, etc.)
cell_types.csv — Cell type predictions with confidence scores
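
After a batch run, a quick completeness check over the output directory shows which datasets are missing any of the three files. This is a minimal sketch using only the file names listed above; the function names are hypothetical.

```python
from pathlib import Path

EXPECTED_FILES = ("mask.tif", "cell_populations.csv", "cell_types.csv")


def missing_files(present_names, expected=EXPECTED_FILES):
    """Return the expected file names that are absent, in listing order."""
    present = set(present_names)
    return [name for name in expected if name not in present]


def check_output_root(output_root):
    """Map each dataset directory name to its list of missing files."""
    report = {}
    for dataset_dir in sorted(p for p in Path(output_root).iterdir() if p.is_dir()):
        names = [p.name for p in dataset_dir.iterdir() if p.is_file()]
        report[dataset_dir.name] = missing_files(names)
    return report
```

Datasets with an empty list in the report are complete; a dataset missing all three files was probably skipped, for example because it had no nucleus channel.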

🛠️ Troubleshooting

Common Issues

Issue: ModuleNotFoundError: No module named 'cellSAM'
Solution: Ensure you installed CellSAM from source:

pip install git+https://github.com/vanvalenlab/cellSAM.git

Issue: DeepCell API authentication failed
Solution: Verify your token is set correctly:

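echo $DEEPCELL_ACCESS_TOKEN          # macOS/Linux
echo $env:DEEPCELL_ACCESS_TOKEN      # Windows PowerShell
echo %DEEPCELL_ACCESS_TOKEN%         # Windows CMD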

Issue: CUDA out of memory
Solution: Reduce batch size or process fewer images at once. Use CUDA_VISIBLE_DEVICES to assign specific GPUs.

Issue: Globus transfer stalls
Solution: Ensure Globus Connect Personal is running:

./globusconnectpersonal -status

Issue: Missing nucleus channel
Solution: This is expected for some datasets. The pipeline will automatically skip them and continue.

📌 Results Comparison with hubmap-mirror-data-api

Use the two scripts from the scripts folder:

getsample.py – Reads HuBMAP Zarr archives from S3, extracts the cell_types/predictions attribute (hubmap_dct) from each dataset, converts it to a DataFrame with cell_type and CL_id columns, and writes one CSV per dataset to ross_results/. Skips datasets missing the required attribute and logs errors.
preprocess.py – Processes each CSV in ross_results/, counts cells per cell_type, calculates percentages, sorts by count descending, and outputs summary CSVs to processed_ross/ with cleaned filenames (e.g., *_deepcell_population.csv).
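
The summary step described for preprocess.py (count per cell type, percentage, sort by count descending) can be sketched with the standard library alone. This is an illustration of the calculation, not the script's code; the function name and the rounding to two decimals are assumptions.

```python
from collections import Counter


def summarize_cell_types(cell_types):
    """Return (cell_type, count, percentage) rows, most frequent first."""
    counts = Counter(cell_types)
    total = sum(counts.values())
    rows = []
    # most_common() sorts by count descending; ties keep first-seen order.
    for cell_type, count in counts.most_common():
        rows.append((cell_type, count, round(100 * count / total, 2)))
    return rows
```

The input is the `cell_type` column of one per-dataset CSV read as a list of strings; an empty list returns no rows.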

🧠 References

CellSAM: Documentation | GitHub
DeepCell Types: Tutorial | GitHub
DeepCell API: Setup Guide
HuBMAP: API Reference | CLI Guide
Globus: Connect Personal Installation
