Single cell RNA-seq analysis Snakemake

This repository contains a modular Snakemake pipeline for single-cell RNA sequencing (scRNA-seq) data analysis. It automates quality control, integration, cell type annotation (using CellTypist), and marker gene identification, ensuring reproducibility and scalability for large datasets.

Pipeline Overview

The workflow performs the following steps:

1. Quality control (QC) per sample
   - Generates cleaned .h5ad files
   - Creates UMAP plots
   - Exports per-cluster marker genes
2. Integration of all cleaned samples into a single dataset using Harmony
3. Cell type annotation using CellTypist
4. Marker gene ranking for integrated clusters
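
The ordering of these steps can be pictured as a small dependency graph: one QC job per sample, all feeding a single integration job, followed by annotation and marker ranking. The sketch below is only an illustration using Python's standard library. The job labels (`qc:<sample_id>`, `integration`, `celltyping`, `markers`) are hypothetical and are not the rule names in the Snakefile, and the assumption that marker ranking runs after annotation is inferred from the script numbering.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_job_graph(sample_ids):
    """Map each job to the set of jobs it has to wait for.

    Job labels are hypothetical; they only mirror the steps listed above.
    """
    qc_jobs = {f"qc:{sample_id}" for sample_id in sample_ids}
    graph = {job: set() for job in qc_jobs}   # QC jobs have no prerequisites
    graph["integration"] = set(qc_jobs)       # waits for every per-sample QC job
    graph["celltyping"] = {"integration"}     # annotation runs on the integrated data
    graph["markers"] = {"celltyping"}         # assumed to run after annotation
    return graph


def execution_order(graph):
    """Return one valid order in which the jobs could run."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(build_job_graph(["S01", "S02"]))
print(order)
```

The per-sample QC jobs are independent of each other, which is why giving Snakemake more cores can speed up that stage, while the later stages run once on the combined data.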

Directory Structure

Expected project layout:

project/
├─ Snakefile
├─ samples.csv # metadata with sample IDs and paths
├─ scripts/ # analysis scripts called by Snakemake
│ ├─ 02_qc.py
│ ├─ 03_integration.py
│ ├─ 04_celltyping.py
│ └─ 05_markers.py
├─ environments/ # conda environments
│ └─ sc.yml
├─ data/ # optional: raw input files (.h5ad)
├─ 02_output/ # cleaned per-sample data
├─ 03_output/ # integrated dataset
├─ 04_output/ # annotated data + markers
└─ qc/ # QC plots and markers per sample
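
Before running the workflow, it can help to confirm that the input side of this layout is in place. The following is a minimal sketch, assuming the file names shown in the tree above. The helper `missing_paths` is hypothetical and not part of the repository, and the output directories are left out because the pipeline is expected to produce them.

```python
from pathlib import Path

# Files the layout above lists as inputs to the workflow.
REQUIRED_PATHS = [
    "Snakefile",
    "samples.csv",
    "scripts/02_qc.py",
    "scripts/03_integration.py",
    "scripts/04_celltyping.py",
    "scripts/05_markers.py",
    "environments/sc.yml",
]


def missing_paths(project_dir):
    """Return the required paths (relative to project_dir) that do not exist."""
    root = Path(project_dir)
    return [rel for rel in REQUIRED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing from project directory:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("Project layout looks complete.")
```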

Input

The pipeline requires a samples.csv file with at least two columns:

sample_id → short name for the sample (used in file naming)
filtered_matrix_path → path to the raw/filtered .h5 file (output from Cell Ranger)

Example:

sample_id,filtered_matrix_path
S01,/path/to/S01_filtered.h5
S02,/data/run_A/S02_filtered.h5
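
A sample sheet in this format can be checked with a few lines of Python before the pipeline is started. This is a sketch under stated assumptions, not the parsing code used by the Snakefile: `read_samples` is a hypothetical helper, and the requirement that sample IDs be unique is an assumption based on their use in file naming.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample_id", "filtered_matrix_path")


def read_samples(handle):
    """Parse a samples.csv file object into {sample_id: filtered_matrix_path}."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        raise ValueError(f"samples.csv is missing columns: {', '.join(missing)}")

    samples = {}
    for row in reader:
        sample_id = (row.get("sample_id") or "").strip()
        path = (row.get("filtered_matrix_path") or "").strip()
        if not sample_id or not path:
            raise ValueError(f"line {reader.line_num}: empty sample_id or path")
        if sample_id in samples:
            # Assumption: IDs must be unique because they are used in file names.
            raise ValueError(f"line {reader.line_num}: duplicate sample_id {sample_id!r}")
        samples[sample_id] = path
    return samples


# The example sheet from above, read from an in-memory string.
example = io.StringIO(
    "sample_id,filtered_matrix_path\n"
    "S01,/path/to/S01_filtered.h5\n"
    "S02,/data/run_A/S02_filtered.h5\n"
)
samples = read_samples(example)
print(samples)
```

For a real project, the same function can be called on an opened file, for example `with open("samples.csv", newline="") as handle: read_samples(handle)`.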

1. Set up a conda environment and install Snakemake

conda install bioconda::snakemake

2. Change to the directory containing the Snakefile and run the pipeline from a terminal (adjust --cores to the number of cores available)

snakemake --use-conda --cores 4
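
When the pipeline is launched from a script rather than typed by hand, the same command can be assembled in Python. This is a small sketch, not part of the repository: `snakemake_command` is a hypothetical helper, and the optional `-n` flag asks Snakemake for a dry run that lists the planned jobs without executing them.

```python
import os
import shlex


def snakemake_command(cores=None, dry_run=False):
    """Build the argument list for the run command shown above."""
    if cores is None:
        cores = os.cpu_count() or 1  # fall back to one core if the count is unknown
    command = ["snakemake", "--use-conda", "--cores", str(cores)]
    if dry_run:
        command.append("-n")  # dry run: show what would be executed
    return command


print(shlex.join(snakemake_command(cores=4, dry_run=True)))
```

The returned list can be passed to `subprocess.run(command, check=True)` from the directory that contains the Snakefile.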

📚 Acknowledgements

Snakemake for workflow management
Harmony for integration
CellTypist for cell type annotation
Developed as part of a project at the Kuppe Lab of Quantitative Cell Dynamics and Translational Systems Biology, UKAachen

Dana162001/scRNAseq_analysis_Snakemake
