CellProfiling/CAR-TCell-MPX-Analysis
Molecular Pixelation CAR T-Cell Analysis Pipeline

This repository contains a Python pipeline for processing Molecular Pixelation data, running QC and clustering workflows, and performing polarization/colocalization analyses with supporting visualizations. The entry point is main.py, which orchestrates data loading, preprocessing, dimensionality reduction, differential expression, and optional downstream analyses.

What this pipeline does

  • Data aggregation: Loads multiple datasets and aggregates them into a single PXL object for reuse.
  • Quality control: Computes QC metrics, plots control marker distributions, and tracks edge-count thresholds.
  • Normalization: Applies CLR transformations (with and without control markers) and scaling.
  • Clustering & embedding: Runs PCA, UMAP, and Louvain clustering for exploratory analysis.
  • Differential expression: Performs DE on clusters and saves summary plots.
  • Polarization analysis: Computes Moran’s I-based polarization metrics (optional step).
  • Colocalization analysis: Computes Pearson correlation-based colocalization metrics and volcano plots (enabled by default in run_workflow).
  • Spatial visualization: Optional spatial plots for curated marker sets.
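The CLR normalization mentioned above can be illustrated with a generic sketch. The pipeline's actual implementation lives under src/; `clr_transform` below is an illustrative stand-in for the transform itself, not the repository's function:

```python
import numpy as np

def clr_transform(counts, pseudocount=1.0):
    """Centered log-ratio: log of each count over the row-wise geometric
    mean, computed in log space for numerical stability."""
    logged = np.log(counts + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)

counts = np.array([[10.0, 100.0, 1000.0],
                   [5.0, 5.0, 5.0]])
clr = clr_transform(counts)
print(np.allclose(clr.sum(axis=1), 0.0))  # True: rows sum to zero by construction
```

Running the transform with and without the control markers in the matrix (as the QC step does) changes each row's geometric mean, which is why the pipeline plots both variants for comparison.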

Repository structure

.
├── main.py                     # CLI entry point
├── src/
│   ├── analysis.py             # Core analysis steps (filters, PCA/UMAP, DE)
│   ├── colocalization/         # Colocalization analysis + plots
│   ├── data_loader.py          # Dataset I/O
│   ├── polarization_analysis.py
│   ├── processor.py            # Aggregation + PXL utilities
│   ├── runner.py               # Pipeline orchestration
│   ├── stats.py                # Shared stats utilities
│   ├── util.py
│   └── visualization/          # QC, DE, spatial, and embedding plots
├── nb/                         # Notebooks
└── tests/

Quick start

1. Configure data paths

Update data locations and labels in src/constants.py:

  • DATA_PATHS / AGGREGATE_LABELS
  • DATA_PATHS_PILOT / AGGREGATE_LABELS_PILOT
  • CONTROL_MARKERS and CUTOFF
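The entries in src/constants.py might look something like the following sketch. All paths, labels, marker names, and values here are placeholders, not the repository's actual configuration:

```python
# src/constants.py -- placeholder values for illustration only.
DATA_PATHS = [
    "/data/mpx/runA/sample1.dataset.pxl",
    "/data/mpx/runA/sample2.dataset.pxl",
]
AGGREGATE_LABELS = ["sample1", "sample2"]

DATA_PATHS_PILOT = ["/data/mpx/pilot/pilot1.dataset.pxl"]
AGGREGATE_LABELS_PILOT = ["pilot1"]

# Isotype control antibodies to track separately during QC (assumed names).
CONTROL_MARKERS = ["mIgG1", "mIgG2a", "mIgG2b"]
CUTOFF = 300  # example minimum edge count per component
```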

2. Run the pipeline

python main.py

To force a full re-aggregation of data:

python main.py --remake
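A flag like --remake is typically wired up with argparse; the sketch below shows the pattern, though the actual argument handling in main.py may differ:

```python
import argparse

def build_parser():
    """Build a CLI parser with a boolean --remake flag (illustrative)."""
    parser = argparse.ArgumentParser(
        description="Run the Molecular Pixelation analysis pipeline.")
    parser.add_argument("--remake", action="store_true",
                        help="Ignore cached aggregates and rebuild from raw datasets.")
    return parser

args = build_parser().parse_args(["--remake"])
print(args.remake)  # True
```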

What gets produced

The pipeline creates an output directory tree (see src/visualization/dirs.py) with:

  • QC plots: control marker distributions, CLR comparisons, edge-rank plots, etc.
  • Dimensionality reduction plots: PCA/UMAP colored by donor/sample/condition.
  • Differential expression outputs: DE tables and volcano/clustermap visualizations.
  • Colocalization outputs: marker pair analyses and volcano plots.
  • Logs: timestamped run logs from src/runner.py.

Aggregated data and intermediate PXL objects are cached in the output directories for faster reruns.
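Caching of this kind usually follows a load-or-compute pattern; a generic sketch (function and path names are illustrative, not the repository's API):

```python
import pickle
from pathlib import Path

def load_or_build(cache_path, build_fn, remake=False):
    """Return the cached object if present; otherwise build it, cache it,
    and return it. `remake=True` bypasses and overwrites the cache."""
    cache = Path(cache_path)
    if cache.exists() and not remake:
        with cache.open("rb") as fh:
            return pickle.load(fh)
    result = build_fn()
    cache.parent.mkdir(parents=True, exist_ok=True)
    with cache.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

Passing remake=True here mirrors the --remake flag: the expensive aggregation runs again and the cache is refreshed.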

Notes on pipeline stages

The default Runner.run_workflow() in src/runner.py executes:

  1. Load/aggregate data
  2. QC + preprocessing
  3. PCA/UMAP + Louvain + DE
  4. Colocalization analysis

Polarization analysis, spatial visualization, and export helpers are implemented but commented out by default. You can enable them in run_workflow() as needed.
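For reference, the Moran's I statistic behind the optional polarization step can be sketched in its textbook form. This is a generic implementation of the statistic, not the repository's, which operates on per-component MPX graphs:

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I spatial autocorrelation of `values` over a graph whose
    symmetric adjacency/weight matrix is `weights`."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    return len(x) / w.sum() * (w * np.outer(z, z)).sum() / (z ** 2).sum()

# Path graph 0-1-2-3 with values clustered at one end.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(morans_i([1, 1, 5, 5], w))  # 1/3: positive, i.e. spatially "polarized"
```

Values near +1 indicate a marker concentrated on one side of the component, values near 0 a random spread, and negative values an alternating pattern.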

Development tip

  • Use the Runner class for programmatic access to each stage if you want to run steps independently.
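Stage-by-stage use might look like the toy skeleton below. The method names are illustrative stand-ins; consult src/runner.py for the real interface and the state the actual Runner carries between stages:

```python
class WorkflowRunner:
    """Toy skeleton of a stage-by-stage pipeline runner (not the real Runner)."""

    def __init__(self):
        self.completed = []

    def load_data(self):
        self.completed.append("load")

    def quality_control(self):
        self.completed.append("qc")

    def cluster_and_embed(self):
        self.completed.append("cluster")

    def colocalization(self):
        self.completed.append("coloc")

    def polarization(self):
        self.completed.append("polar")

    def run_workflow(self, with_polarization=False):
        self.load_data()
        self.quality_control()
        self.cluster_and_embed()
        self.colocalization()        # enabled by default
        if with_polarization:        # optional, like the commented-out steps
            self.polarization()

runner = WorkflowRunner()
runner.run_workflow()
print(runner.completed)  # ['load', 'qc', 'cluster', 'coloc']
```

Calling the stage methods individually, rather than run_workflow(), is the pattern the tip above refers to.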

License

MIT License
