This repository contains a Python pipeline for processing Molecular Pixelation data, running QC and clustering workflows, and performing polarization/colocalization analyses with supporting visualizations. The entry point is main.py, which orchestrates data loading, preprocessing, dimensionality reduction, differential expression, and optional downstream analyses.
- Data aggregation: Loads multiple datasets and aggregates them into a single PXL object for reuse.
- Quality control: Computes QC metrics, plots control marker distributions, and tracks edge-count thresholds.
- Normalization: Applies CLR transformations (with and without control markers) and scaling.
- Clustering & embedding: Runs PCA, UMAP, and Louvain clustering for exploratory analysis.
- Differential expression: Performs DE on clusters and saves summary plots.
- Polarization analysis: Computes Moran’s I-based polarization metrics (optional step).
- Colocalization analysis: Computes Pearson correlation-based colocalization metrics and volcano plots (enabled by default in run_workflow).
- Spatial visualization: Optional spatial plots for curated marker sets.
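The normalization, polarization, and colocalization bullets above name three standard statistics. As a rough illustration only, the sketch below shows textbook forms of each in plain Python. The function names `clr`, `morans_i`, and `pearson` are hypothetical and are not taken from this repository; the pipeline's own implementations may differ in pseudocounts, graph weighting, and vectorization.

```python
import math


def clr(counts):
    """Centered log-ratio of one cell's marker counts, using log(1 + x) to tolerate zeros."""
    logs = [math.log1p(c) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def morans_i(values, edges):
    """Moran's I for one marker on an undirected graph given as (i, j) edges with unit weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    denom = sum(d * d for d in z)
    if denom == 0 or not edges:
        return float("nan")  # undefined for constant values or an empty graph
    # Each undirected edge appears twice in the symmetric weight matrix.
    cross = 2 * sum(z[i] * z[j] for i, j in edges)
    total_weight = 2 * len(edges)
    return (n / total_weight) * cross / denom


def pearson(x, y):
    """Pearson correlation between two equally long sequences of marker values."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Toy inputs (made up for illustration, not real Molecular Pixelation data).
normalized = clr([10, 0, 5])               # CLR values for one cell sum to ~0
path_edges = [(0, 1), (1, 2), (2, 3)]      # path graph 0-1-2-3
clustered = morans_i([1, 1, 0, 0], path_edges)    # positive: like values are adjacent
alternating = morans_i([1, 0, 1, 0], path_edges)  # approximately -1: neighbors always differ
colocalized = pearson([1, 2, 3], [2, 4, 6])       # approximately 1: markers rise together
```

A positive Moran's I indicates that a marker's signal is spatially clustered on the cell graph (polarized), while a high Pearson correlation between two markers across the same neighborhoods suggests they colocalize.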
.
├── main.py # CLI entry point
├── src/
│ ├── analysis.py # Core analysis steps (filters, PCA/UMAP, DE)
│ ├── colocalization/ # Colocalization analysis + plots
│ ├── data_loader.py # Dataset I/O
│ ├── polarization_analysis.py
│ ├── processor.py # Aggregation + PXL utilities
│ ├── runner.py # Pipeline orchestration
│ ├── stats.py # Shared stats utilities
│ ├── util.py
│ └── visualization/ # QC, DE, spatial, and embedding plots
├── nb/ # Notebooks
└── tests/
Update data locations and labels in src/constants.py:
- DATA_PATHS / AGGREGATE_LABELS
- DATA_PATHS_PILOT / AGGREGATE_LABELS_PILOT
- CONTROL_MARKERS and CUTOFF
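For orientation, here is a hypothetical sketch of how these settings might be laid out. Only the constant names come from this README. The types, paths, marker names, and the meaning of CUTOFF are assumptions, so check src/constants.py for the actual definitions before editing.

```python
# Hypothetical layout; every value below is a placeholder, not the repository's real setting.
from pathlib import Path

# Assumed: one label per dataset path, matched by position.
DATA_PATHS = [Path("data/sample_a.pxl"), Path("data/sample_b.pxl")]
AGGREGATE_LABELS = ["sample_a", "sample_b"]

DATA_PATHS_PILOT = [Path("data/pilot/sample_p1.pxl")]
AGGREGATE_LABELS_PILOT = ["pilot_1"]

# Assumed: names of isotype control markers used in the QC plots.
CONTROL_MARKERS = ["mIgG1", "mIgG2a", "mIgG2b"]

# Assumed: a numeric QC threshold; its real meaning and units are defined in src/constants.py.
CUTOFF = 1000


def check_paired(paths, labels):
    """Fail early if paths and labels cannot be matched one-to-one (helper is hypothetical)."""
    if len(paths) != len(labels):
        raise ValueError(f"{len(paths)} paths but {len(labels)} labels")
    if len(set(labels)) != len(labels):
        raise ValueError("labels must be unique")


check_paired(DATA_PATHS, AGGREGATE_LABELS)
check_paired(DATA_PATHS_PILOT, AGGREGATE_LABELS_PILOT)
```

A check like `check_paired` is not part of the repository as far as this README states; it only illustrates why each path list and its label list should be edited together.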
Run the pipeline:

python main.py

To force a full re-aggregation of data:

python main.py --remake

The pipeline creates an output directory tree (see src/visualization/dirs.py) with:
- QC plots: control marker distributions, CLR comparisons, edge-rank plots, etc.
- Dimensionality reduction plots: PCA/UMAP colored by donor/sample/condition.
- Differential expression outputs: DE tables and volcano/clustermap visualizations.
- Colocalization outputs: marker pair analyses and volcano plots.
- Logs: timestamped run logs from src/runner.py.
Aggregated data and intermediate PXL objects are cached in the output directories for faster reruns.
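The interaction between caching and the --remake flag can be pictured with the minimal sketch below. It assumes a pickle file as the cache and uses hypothetical helper names (`build_parser`, `load_or_build`). The actual pipeline may store PXL objects in a different format and wire the flag differently.

```python
import argparse
import pickle
from pathlib import Path


def build_parser():
    """CLI with a single boolean flag, mirroring `python main.py --remake`."""
    parser = argparse.ArgumentParser(description="Sketch of a cache-aware entry point")
    parser.add_argument(
        "--remake",
        action="store_true",
        help="ignore cached aggregates and rebuild them",
    )
    return parser


def load_or_build(cache_file, build, remake=False):
    """Return the cached object if present, otherwise call build() and cache its result."""
    cache_file = Path(cache_file)
    if cache_file.exists() and not remake:
        with cache_file.open("rb") as handle:
            return pickle.load(handle)
    result = build()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as handle:
        pickle.dump(result, handle)
    return result
```

In this sketch, an entry point would call `args = build_parser().parse_args()` and pass `remake=args.remake` to `load_or_build`, so a plain rerun reuses the cached aggregate and `--remake` rebuilds and overwrites it.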
The default Runner.run_workflow() in src/runner.py executes:
- Load/aggregate data
- QC + preprocessing
- PCA/UMAP + Louvain + DE
- Colocalization analysis
Polarization analysis, spatial visualization, and export helpers are implemented but commented out by default. You can enable them in run_workflow() as needed.
- Use the Runner class for programmatic access to each stage if you want to run steps independently.
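The stage ordering described above, and the idea of calling stages one at a time, can be pictured with the stand-in class below. `SketchRunner` and all of its method names are hypothetical; they are not the real Runner API in src/runner.py. The sketch also uses keyword flags for the optional stages, whereas the repository enables them by uncommenting lines in run_workflow(), and the position of the optional stages in the sequence is an assumption.

```python
class SketchRunner:
    """Hypothetical stand-in for Runner; each stage only records that it ran."""

    def __init__(self):
        self.completed = []

    def load_data(self):
        self.completed.append("load_data")

    def qc_and_preprocess(self):
        self.completed.append("qc_and_preprocess")

    def embed_cluster_and_de(self):
        self.completed.append("embed_cluster_and_de")

    def colocalization(self):
        self.completed.append("colocalization")

    def polarization(self):
        self.completed.append("polarization")

    def spatial_plots(self):
        self.completed.append("spatial_plots")

    def run_workflow(self, polarization=False, spatial=False):
        """Run the four default stages in order, then any optional stages requested."""
        self.load_data()
        self.qc_and_preprocess()
        self.embed_cluster_and_de()
        self.colocalization()
        if polarization:
            self.polarization()
        if spatial:
            self.spatial_plots()
        return self.completed


# Default workflow: the four stages listed in this README.
default_stages = SketchRunner().run_workflow()

# Running selected stages independently, for example to stop after QC.
partial = SketchRunner()
partial.load_data()
partial.qc_and_preprocess()
```

Check the method names and their expected call order in src/runner.py before scripting against the real class, since later stages presumably depend on the outputs of earlier ones.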
MIT License