Skip to content

berlogo/scHi-C

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 

Repository files navigation

scHi-C

A Python toolkit for single-cell Hi-C data normalization and analysis

Python

scHi-C is a study project developed at the Bioinformatics Institute for exploring single-cell Hi-C data in Python. It provides a lightweight codebase and an example notebook to illustrate basic workflows in loading, processing and visualizing sparse single-cell Hi-C contact matrices.

Table of Contents

About Single-Cell Hi-C Data

Single-cell Hi-C (scHi-C) captures chromosome conformation in individual cells, revealing cell-to-cell variability in 3D genome organization. Unlike bulk Hi-C, scHi-C data presents unique analytical challenges:

  • Ultra-sparse contact matrices (>99.9% zeros)
  • Variable sequencing depth across cells
  • High technical noise requiring specialized normalization
  • Limited statistical power per individual cell

This toolkit implements normalization methods adapted for these challenges, based on established algorithms from the Hi-C analysis ecosystem.

Features

Normalization Methods:

  • ICE - Iterative Correction and Eigenvector decomposition (for high-coverage cells)
  • VC - Vanilla Coverage normalization
  • Coverage-based - Simple coverage normalization for sparse data
  • SCALE - Total count scaling (optimal for ultra-sparse data)

Data Processing:

  • Automatic method selection based on data sparsity
  • Parallel processing for large cell collections
  • Memory-efficient sparse matrix operations
  • Integration with cooler/cooltools ecosystem

Quality Control:

  • Sparsity pattern diagnosis
  • Coverage analysis and filtering
  • Cell quality assessment
  • Normalization validation

Visualization:

  • Contact map plotting
  • Quality control plots
  • Before/after normalization comparisons
  • Group-based analysis

Installation

Requirements

  • Python 3.8 or higher
  • 16GB+ RAM recommended for large datasets

Install from Source

git clone https://github.com/berlogo/scHi-C
cd scHi-C
pip install -e .

Install Dependencies

pip install -r requirements.txt

Verify Installation

# Test basic imports
from schic_normalization import scHiCNormConfig
from schic_class import scHiC
from scHiC_vis import plot_hic_maps
print("scHiCTools imported successfully!")

Quick start

  1. Clone the repository

    git clone https://github.com/berlogo/scHi-C.git
    cd scHi-C
  2. Open the example notebook

Launch Jupyter and open Example.ipynb to walk through basic data loading, metadata annotation, grouping and visualization steps:

jupyter lab Example.ipynb

Normalization Methods

Method Best for Description
SCALE Ultra-sparse (<10K contacts) Simple total count scaling
Coverage Sparse (10K-100K contacts) Coverage-based normalization
VC Medium-sparse (100K-1M contacts) Vanilla Coverage (square root)
ICE Dense (>1M contacts) Iterative correction (computationally intensive)

The toolkit automatically selects the optimal method based on your data's sparsity patterns.

Data Sources

This toolkit has been tested on single-cell Hi-C datasets from:

Public Datasets

  • GEO: GSE117876 - Human prefrontal cortex (Ramani et al., 2017)

Supported Data Formats

  • Cool files (.cool) - Primary format, created by cooler

References

Key Publications

Single-cell Hi-C Methods:

  • Tian, D. et al. (2023). Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature Genetics. DOI: 10.1038/s41588-023-01389-1
  • Ramani, V. et al. (2017). Massively multiplex single-cell Hi-C. Nature Methods, 14, 263–266. DOI: 10.1038/nmeth.4155
  • Nagano, T. et al. (2017). Cell-cycle dynamics of chromosomal organization at single-cell resolution. Nature, 547, 61–67. DOI: 10.1038/nature23001

Hi-C Analysis Tools:

Normalization Methods:

  • Imakaev, M. et al. (2012). Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nature Methods, 9, 999–1003. DOI: 10.1038/nmeth.2148
  • Lieberman-Aiden, E. et al. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289-293. DOI: 10.1126/science.1181369
  • Cournac, A. et al. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13, 436. DOI: 10.1186/1471-2164-13-436

Troubleshooting

Common Issues

Import errors

# Make sure your Python files are in the correct location
# and adjust import paths as needed
import sys
sys.path.append('/path/to/your/code/')
from schic_normalization import scHiCNormConfig

"No cells were successfully normalized"

# Try ultra-sparse configuration
config = scHiCNormConfig.for_ultra_sparse_data()
config.method = "SCALE"
# Then run normalize_schic with this config

Memory errors with large datasets

# Process fewer cells at once
config = scHiCNormConfig()
config.chunk_size = 50  # Reduce from default
config.use_float32 = True  # Use less memory

ICE convergence issues

# Use simpler methods for sparse data
config = scHiCNormConfig(method="coverage")  # Instead of ICE

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •