Gene Expression Decomposition and Integration
A scverse-compliant Python package for single-cell RNA-seq batch correction and dimensionality reduction using the GEDI algorithm.
The original R implementation, gedi2, is available at:
https://github.com/csglab/gedi2
gedi2py implements a latent variable model for integrating single-cell RNA sequencing data across multiple samples and batches. It learns shared gene expression patterns while correcting for technical batch effects, producing batch-corrected cell embeddings suitable for downstream analysis.
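To make "shared patterns plus batch effects" concrete, here is a deliberately simplified NumPy toy. It is not GEDI's model or optimizer. It assumes each sample differs only by a per-gene offset, removes that offset by per-sample centering, and then takes a truncated SVD as the shared embedding. All variable names are illustrative.

```python
import numpy as np

# Toy illustration only, NOT the GEDI model: each sample's expression is
# shared gene loadings times cell embeddings, plus a per-sample gene offset
# (the "batch effect"), plus noise. Matrices are genes x cells.
rng = np.random.default_rng(0)
n_genes, n_latent, cells_per_sample = 50, 3, 40
samples = ["s1", "s2"]

Z = rng.normal(size=(n_genes, n_latent))  # loadings shared by all samples
blocks, labels = [], []
for s in samples:
    B = rng.normal(size=(n_latent, cells_per_sample))  # per-cell latent coordinates
    offset = rng.normal(scale=2.0, size=(n_genes, 1))  # sample-specific shift per gene
    noise = rng.normal(scale=0.1, size=(n_genes, cells_per_sample))
    blocks.append(Z @ B + offset + noise)
    labels += [s] * cells_per_sample
Y = np.hstack(blocks)
labels = np.array(labels)

# Naive batch correction: subtract each sample's per-gene mean.
Y_corr = Y.copy()
for s in samples:
    idx = labels == s
    Y_corr[:, idx] -= Y_corr[:, idx].mean(axis=1, keepdims=True)

# Shared embedding: top n_latent right singular vectors, scaled by singular values.
U, S, Vt = np.linalg.svd(Y_corr, full_matrices=False)
embedding = (S[:n_latent, None] * Vt[:n_latent]).T  # cells x n_latent
```

Real batch effects are rarely a pure per-gene shift, which is one reason a model-based approach such as GEDI is used instead of simple centering.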
Install from PyPI:

```bash
pip install gedi2py
```

Or install from source:

```bash
git clone https://github.com/csglab/gedi2py.git
cd gedi2py
pip install -e .
```

Requirements:

- Python >= 3.10
- C++14 compiler
- Eigen3 >= 3.3.0
- CMake >= 3.15
See the Installation Guide for detailed instructions.
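Before a source install, a quick check of the build prerequisites can save a failed compile. The script below is a hypothetical helper, not part of gedi2py. It checks only the Python version and the CMake version found on PATH. The C++14 compiler and Eigen3 still need to be verified separately, for example through your system package manager.

```python
import re
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 10)
MIN_CMAKE = (3, 15)


def parse_version(text):
    """Extract the first 'X.Y' or 'X.Y.Z' version number in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def check_build_tools():
    """Return a list of problems found; an empty list means these basic checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python >= 3.10 required, found {sys.version.split()[0]}")
    cmake = shutil.which("cmake")
    if cmake is None:
        problems.append("cmake not found on PATH")
    else:
        out = subprocess.run([cmake, "--version"], capture_output=True, text=True).stdout
        version = parse_version(out)
        if version is None or version < MIN_CMAKE:
            first_line = out.splitlines()[0] if out else "unknown"
            problems.append(f"CMake >= 3.15 required, found: {first_line}")
    return problems


if __name__ == "__main__":
    for problem in check_build_tools():
        print("WARNING:", problem)
```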
Quick start:

```python
import gedi2py as gd
import scanpy as sc

# Load data ("sample" and "cell_type" are expected as columns in adata.obs)
adata = sc.read_h5ad("data.h5ad")

# Preprocess
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)

# Run GEDI batch correction
gd.tl.gedi(adata, batch_key="sample", n_latent=10)

# Visualize
gd.tl.umap(adata)
gd.pl.embedding(adata, color=["sample", "cell_type"])
```

Features:

- Memory-efficient: C++ backend keeps large matrices in native memory
- Fast: OpenMP parallelization for multi-threaded optimization
- scverse-compliant: Works directly with AnnData objects and the scanpy workflow
- Flexible: Supports counts, log-transformed data, paired data (e.g., CITE-seq), and binary indicators
- Comprehensive: Includes projections, embeddings, imputation, and differential analysis
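The quick-start preprocessing rescales every cell to a common library size and then log-transforms it. For readers who want to see the arithmetic, the NumPy function below is intended to mirror the defaults of `sc.pp.normalize_total` (with `target_sum=None`) followed by `sc.pp.log1p` on a dense cells x genes array. It is an illustrative sketch, not a replacement for scanpy, and it ignores options such as `exclude_highly_expressed`. The function name is hypothetical.

```python
import numpy as np


def normalize_total_log1p(counts, target_sum=None):
    """Scale each cell (row) to a common total count, then apply the natural log1p.

    Intended to mirror sc.pp.normalize_total (default target_sum=None) followed by
    sc.pp.log1p for a dense cells x genes array. With target_sum=None, cells are
    scaled to the median total of the cells that have nonzero counts.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)  # per-cell library size, shape (n_cells, 1)
    if target_sum is None:
        positive = totals[totals > 0]
        target_sum = np.median(positive) if positive.size else 1.0
    # Cells with zero total counts are left as all zeros instead of dividing by zero.
    scale = np.divide(target_sum, totals, out=np.zeros_like(totals), where=totals > 0)
    return np.log1p(counts * scale)
```

For example, cells with totals 2 and 6 are both rescaled to the median total of 4 before the log transform.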
gedi2py supports paired count data stored in two AnnData layers, useful for:
- CITE-seq (ADT vs RNA)
- Dual-modality assays
- Ratio-based analyses
```python
# Two layers: 'm1' (numerator counts) and 'm2' (denominator counts)
# GEDI models: Yi = log((M1+1)/(M2+1))
gd.tl.gedi(
    adata,
    batch_key="sample",
    layer="m1",   # First count matrix
    layer2="m2",  # Second count matrix
    n_latent=10,
)
```

Full documentation is available at csglab.github.io/gedi2py.
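The paired-data example earlier states that GEDI models Yi = log((M1+1)/(M2+1)). The small NumPy helper below computes that elementwise log-ratio for two dense count matrices so its behavior is easy to inspect. It is illustrative only. The name `paired_log_ratio` is hypothetical, and it makes no claim about how gedi2py computes the quantity internally or handles sparse layers.

```python
import numpy as np


def paired_log_ratio(m1, m2):
    """Elementwise log((M1 + 1) / (M2 + 1)) for two dense count matrices of equal shape."""
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    if m1.shape != m2.shape:
        raise ValueError(f"layers must have the same shape, got {m1.shape} and {m2.shape}")
    # log1p(a) - log1p(b) equals log((a + 1) / (b + 1)); the +1 pseudocount keeps
    # entries finite when either layer has a zero count.
    return np.log1p(m1) - np.log1p(m2)


# Example: one cell, three features.
y = paired_log_ratio([[0, 3, 0]], [[0, 1, 5]])
```

Here `y` holds 0 where both counts are zero, log(2) where M1 = 3 and M2 = 1, and log(1/6) where only M2 is nonzero.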
gedi2py follows the scanpy convention with submodules:
| Module | Description |
|---|---|
| gd.tl | Tools: model training, projections, embeddings, imputation, differential analysis |
| gd.pl | Plotting: embeddings, convergence, features |
| gd.io | I/O: H5AD, 10X formats, model persistence |
```python
import gedi2py as gd

# I/O: load data
adata = gd.read_h5ad("data.h5ad")

# Tools
gd.tl.gedi(adata, batch_key="sample")
gd.tl.umap(adata)

# Plotting
gd.pl.embedding(adata, color="cell_type")
gd.pl.convergence(adata)

# I/O: save the trained model
gd.io.save_model(adata, "model.h5")
```

MIT License - see LICENSE for details.