Dana162001/scRNAseq_analysis_Snakemake

Single cell RNA-seq analysis Snakemake

This repository contains a modular Snakemake pipeline for single-cell RNA sequencing (scRNA-seq) data analysis. It automates quality control, integration, cell type annotation (using CellTypist), and marker gene identification, ensuring reproducibility and scalability for large datasets.


Pipeline Overview

The workflow performs the following steps:

  1. Quality control (QC) per sample

    • Generates cleaned .h5ad files
    • Creates UMAP plots
    • Exports per-cluster marker genes
  2. Integration of all cleaned samples into a single dataset using Harmony

  3. Cell type annotation using CellTypist

  4. Marker gene ranking for integrated clusters
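The fan-out and fan-in of these steps can be sketched in plain Python: QC runs once per sample, while integration, annotation, and marker ranking each run once on the combined data. This is only an illustration, not the repository's Snakefile. The directory names come from the project layout in this README, but the function name `plan_outputs` and every file name below are hypothetical.

```python
def plan_outputs(sample_ids):
    """Map each pipeline stage to the files it might produce.

    Directory names follow the README's project layout; all file names
    are hypothetical placeholders chosen for this sketch.
    """
    return {
        # Step 1: one cleaned file per sample (fan-out).
        "qc": [f"02_output/{s}_cleaned.h5ad" for s in sample_ids],
        # Step 2: all cleaned samples merged into one dataset (fan-in).
        "integration": ["03_output/integrated.h5ad"],
        # Step 3: annotation of the integrated dataset.
        "celltyping": ["04_output/annotated.h5ad"],
        # Step 4: marker ranking for the integrated clusters.
        "markers": ["04_output/markers.csv"],
    }


plan = plan_outputs(["S01", "S02"])
for stage, files in plan.items():
    print(stage, files)
```

Adding a sample only lengthens the `qc` list; the later stages still produce a single output each, which is why the integration step depends on every per-sample QC job finishing first.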


Directory Structure

Expected project layout:

project/
├─ Snakefile
├─ samples.csv # metadata with sample IDs and paths
├─ scripts/ # analysis scripts called by Snakemake
│ ├─ 02_qc.py
│ ├─ 03_integration.py
│ ├─ 04_celltyping.py
│ └─ 05_markers.py
├─ environments/ # conda environments
│ └─ sc.yml
├─ data/ # optional: raw input files (.h5ad)
├─ 02_output/ # cleaned per-sample data
├─ 03_output/ # integrated dataset
├─ 04_output/ # annotated data + markers
└─ qc/ # QC plots and markers per sample
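Before starting a run, it can help to confirm that the files in this layout are present. The following is a minimal sketch and is not part of the repository: the helper name `missing_inputs` is hypothetical, and treating every listed file as required is an assumption. Output directories are left out because Snakemake normally creates the directories of declared outputs itself.

```python
from pathlib import Path

# Paths taken from the layout above; treating all of them as required
# before a run is an assumption of this sketch.
REQUIRED_FILES = [
    "Snakefile",
    "samples.csv",
    "scripts/02_qc.py",
    "scripts/03_integration.py",
    "scripts/04_celltyping.py",
    "scripts/05_markers.py",
    "environments/sc.yml",
]


def missing_inputs(project_dir):
    """Return the required files that are absent under project_dir."""
    root = Path(project_dir)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


# Check the current directory and report anything that is missing.
for rel in missing_inputs("."):
    print(f"missing: {rel}")
```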

Input

The pipeline requires a samples.csv file with at least two columns:

  • sample_id → short name for the sample (used in file naming)
  • filtered_matrix_path → path to the raw/filtered .h5 file (output from Cell Ranger)

Example:

sample_id,filtered_matrix_path
S01,/path/to/S01_filtered.h5
S02,/data/run_A/S02_filtered.h5
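A malformed samples.csv is easier to catch before the workflow starts. The sketch below uses only the Python standard library to check the two columns named above. The function name `load_samples` and the specific checks (non-empty fields, unique sample IDs) are assumptions made for illustration; the pipeline may read the file differently.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample_id", "filtered_matrix_path")


def load_samples(handle):
    """Parse an open samples.csv into a {sample_id: path} mapping.

    Raises ValueError on missing columns, empty fields, or duplicate IDs.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"samples.csv is missing columns: {missing}")

    samples = {}
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        sample_id = (row["sample_id"] or "").strip()
        path = (row["filtered_matrix_path"] or "").strip()
        if not sample_id or not path:
            raise ValueError(f"line {line_no}: empty sample_id or path")
        if sample_id in samples:
            raise ValueError(f"line {line_no}: duplicate sample_id {sample_id!r}")
        samples[sample_id] = path
    return samples


# Usage with the example rows from this README.
example = io.StringIO(
    "sample_id,filtered_matrix_path\n"
    "S01,/path/to/S01_filtered.h5\n"
    "S02,/data/run_A/S02_filtered.h5\n"
)
samples = load_samples(example)
print(sorted(samples))
```

Extra columns are ignored, which matches the "at least two columns" requirement. For a file on disk, pass `open("samples.csv", newline="")` instead of the `StringIO` object.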

1. Set up a conda environment and install Snakemake

conda install bioconda::snakemake

2. Change to the directory containing the Snakefile and run the pipeline from a terminal (adjust --cores to the number of cores available)

snakemake --use-conda --cores 4

📚 Acknowledgements

  • Snakemake for workflow management
  • Harmony for integration
  • CellTypist for cell type annotation
  • Developed as part of a project at the Kuppe Lab of Quantitative Cell Dynamics and Translational Systems Biology, UKAachen
