gedi2py

Gene Expression Decomposition and Integration

A scverse-compliant Python package for single-cell RNA-seq batch correction and dimensionality reduction using the GEDI algorithm.

The original implementation, gedi2, is an R library available at https://github.com/csglab/gedi2

Overview

gedi2py implements a latent variable model for integrating single-cell RNA sequencing data across multiple samples and batches. It learns shared gene expression patterns while correcting for technical batch effects, producing batch-corrected cell embeddings suitable for downstream analysis.
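To make the idea of "shared patterns plus batch effects" concrete, here is a deliberately simplified stand-in. It is not GEDI's model or optimizer. Synthetic data is built from shared gene programs plus a per-batch additive shift. The shift is removed by centering each gene within each batch, and a truncated SVD then recovers a shared low-dimensional embedding. GEDI itself learns sample-specific transformations jointly, and all function names below are hypothetical.

```python
import numpy as np

def simulate_batches(n_genes=50, n_cells_per_batch=100, n_batches=3, n_latent=5, seed=0):
    """Hypothetical simulator: shared gene programs plus a per-batch additive offset."""
    rng = np.random.default_rng(seed)
    loadings = rng.normal(size=(n_genes, n_latent))  # gene programs shared by all batches
    blocks, labels = [], []
    for b in range(n_batches):
        factors = rng.normal(size=(n_latent, n_cells_per_batch))  # per-cell states
        offset = rng.normal(scale=3.0, size=(n_genes, 1))  # batch effect, one value per gene
        noise = rng.normal(scale=0.1, size=(n_genes, n_cells_per_batch))
        blocks.append(loadings @ factors + offset + noise)
        labels.extend([b] * n_cells_per_batch)
    return np.hstack(blocks), np.array(labels)  # genes x cells, batch label per cell

def center_within_batches(Y, labels):
    """Subtract each batch's per-gene mean (a crude additive batch correction)."""
    Yc = Y.copy()
    for b in np.unique(labels):
        cols = labels == b
        Yc[:, cols] -= Yc[:, cols].mean(axis=1, keepdims=True)
    return Yc

def shared_embedding(Y, n_latent):
    """Truncated SVD of a genes x cells matrix; returns a cells x n_latent embedding."""
    U, S, Vt = np.linalg.svd(Y, full_matrices=False)
    return (S[:n_latent, None] * Vt[:n_latent]).T

Y, batch = simulate_batches()
Z = shared_embedding(center_within_batches(Y, batch), n_latent=5)
print(Z.shape)  # (300, 5)
```

Because every gene is centered within each batch, the per-batch mean of each embedding coordinate is zero, so a purely additive batch shift cannot separate batches in `Z`. Real batch effects are rarely this simple, which is why a dedicated model is needed.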

Installation

pip (recommended)

pip install gedi2py

From source

git clone https://github.com/csglab/gedi2py.git
cd gedi2py
pip install -e .

Requirements

Python >= 3.10
C++14 compiler
Eigen3 >= 3.3.0
CMake >= 3.15
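A quick pre-flight check before building from source can catch a missing tool early. The following standard-library sketch uses hypothetical function names that are not part of gedi2py. It verifies the Python version, looks for a C++ compiler and `cmake` on `PATH`, and parses `cmake --version`. It does not check C++14 support or the Eigen3 version.

```python
import re
import shutil
import subprocess
import sys

def parse_version(text):
    """Extract the first 'X.Y[.Z]' version in a string as an int tuple (missing patch -> 0)."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups("0"))

def cmake_version():
    """Return the CMake version found on PATH, or None if cmake is not installed."""
    exe = shutil.which("cmake")
    if exe is None:
        return None
    out = subprocess.run([exe, "--version"], capture_output=True, text=True).stdout
    return parse_version(out)

def check_build_requirements():
    """Map each checked prerequisite to True (found) or False (missing/too old)."""
    cmake = cmake_version()
    return {
        "Python >= 3.10": sys.version_info >= (3, 10),
        "C++ compiler on PATH": any(shutil.which(c) for c in ("c++", "g++", "clang++")),
        "CMake >= 3.15": cmake is not None and cmake >= (3, 15, 0),
    }

if __name__ == "__main__":
    for name, ok in check_build_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```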

See the Installation Guide for detailed instructions.

Quick Start

import gedi2py as gd
import scanpy as sc

# Load data
adata = sc.read_h5ad("data.h5ad")

# Preprocess
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)

# Run GEDI batch correction
gd.tl.gedi(adata, batch_key="sample", n_latent=10)

# Visualize
gd.tl.umap(adata)
gd.pl.embedding(adata, color=["sample", "cell_type"])
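The following NumPy sketch shows what the two preprocessing calls compute on a dense cells × genes matrix, mirroring scanpy's documented defaults. `normalize_total` with no `target_sum` rescales each cell so that its total equals the median of the per-cell totals. `log1p` applies the natural log of 1 + x. The sketch is an illustration only, not scanpy's implementation, which also handles sparse input and more options. The helper names are hypothetical, and cells with zero total are simply left at zero here.

```python
import numpy as np

def normalize_total_dense(X, target_sum=None):
    """Scale each cell (row) to sum to target_sum; default is the median nonzero cell total."""
    X = np.asarray(X, dtype=float)
    totals = X.sum(axis=1)
    if target_sum is None:
        target_sum = np.median(totals[totals > 0])
    # Avoid dividing by zero: all-zero cells keep a scale factor of 0.
    scale = np.divide(target_sum, totals, out=np.zeros_like(totals), where=totals > 0)
    return X * scale[:, None]

def log1p_dense(X):
    """Element-wise natural log of (1 + x)."""
    return np.log1p(X)

counts = np.array([[1, 3, 0],
                   [2, 2, 4],
                   [0, 5, 5]])
normed = normalize_total_dense(counts)  # cell totals 4, 8, 10 -> median 8
print(normed.sum(axis=1))  # [8. 8. 8.]
logged = log1p_dense(normed)
```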

Features

Memory-efficient: C++ backend keeps large matrices in native memory
Fast: OpenMP parallelization for multi-threaded optimization
scverse-compliant: Works seamlessly with AnnData and scanpy
Flexible: Supports counts, log-transformed data, paired data (e.g., CITE-seq), and binary indicators
Comprehensive: Includes projections, embeddings, imputation, and differential analysis
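OpenMP runtimes conventionally read the `OMP_NUM_THREADS` environment variable when they initialize, so one way to cap the thread count is to set it before the compiled backend is loaded. This is a sketch that assumes gedi2py's backend honors the standard OpenMP environment settings. Whether gedi2py also exposes its own thread-count argument is not stated here, so check the API documentation. The helper name is hypothetical.

```python
import os

def set_openmp_threads(n_threads):
    """Set OMP_NUM_THREADS; call this before importing any OpenMP-backed extension."""
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    os.environ["OMP_NUM_THREADS"] = str(n_threads)
    return os.environ["OMP_NUM_THREADS"]

set_openmp_threads(4)
# import gedi2py as gd  # import only after the variable is set
```

Setting the variable after the OpenMP runtime has already started usually has no effect, which is why the import comes last.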

Paired Data Mode (M_paired)

gedi2py supports paired count data stored in two AnnData layers, useful for:

CITE-seq (ADT vs RNA)
Dual-modality assays
Ratio-based analyses

# Two layers: 'm1' (numerator counts) and 'm2' (denominator counts)
# GEDI models: Yi = log((M1+1)/(M2+1))
gd.tl.gedi(
    adata,
    batch_key="sample",
    layer="m1",      # First count matrix
    layer2="m2",     # Second count matrix
    n_latent=10
)
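To inspect the transform in the comment above outside of GEDI, you can reproduce it directly. The NumPy sketch below assumes two count matrices of the same shape, for example the `m1` and `m2` layers, and uses the natural logarithm. The README does not state the log base. GEDI computes this internally, so the hypothetical helper is only for checking your inputs.

```python
import numpy as np

def paired_log_ratio(m1, m2):
    """Element-wise log((M1 + 1) / (M2 + 1)) for two same-shaped count matrices."""
    # Densify scipy.sparse inputs (e.g. AnnData layers) if needed.
    m1 = np.asarray(m1.toarray() if hasattr(m1, "toarray") else m1, dtype=float)
    m2 = np.asarray(m2.toarray() if hasattr(m2, "toarray") else m2, dtype=float)
    if m1.shape != m2.shape:
        raise ValueError(f"shape mismatch: {m1.shape} vs {m2.shape}")
    # log1p(a) - log1p(b) equals log((a + 1) / (b + 1)).
    return np.log1p(m1) - np.log1p(m2)

m1 = np.array([[0, 3], [9, 1]])
m2 = np.array([[0, 1], [4, 1]])
Y = paired_log_ratio(m1, m2)
# Y[0, 1] = log(4 / 2) = log(2); Y[1, 0] = log(10 / 5) = log(2); equal counts give 0.
# With AnnData: Y = paired_log_ratio(adata.layers["m1"], adata.layers["m2"])
```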

Documentation

Full documentation is available at csglab.github.io/gedi2py.

API Overview

gedi2py follows the scanpy convention with submodules:

Module	Description
`gd.tl`	Tools: model training, projections, embeddings, imputation, differential
`gd.pl`	Plotting: embeddings, convergence, features
`gd.io`	I/O: H5AD, 10X formats, model persistence

import gedi2py as gd

# Tools
gd.tl.gedi(adata, batch_key="sample")
gd.tl.umap(adata)

# Plotting
gd.pl.embedding(adata, color="cell_type")
gd.pl.convergence(adata)

# I/O
adata = gd.read_h5ad("data.h5ad")
gd.io.save_model(adata, "model.h5")

License

MIT License - see LICENSE for details.
