This repository contains the code to reproduce all analyses in the following manuscript:
by Anita Térmeg, Johanna Geuder, Vladyslav Storozhuk, Zane Kliesmete, Fiona C. Edenhofer, Beate Vieth, Philipp Janssen, Ines Hellmann
CroCoNet – the framework underlying these analyses – is available as an R package, accompanied by detailed documentation and a step-by-step tutorial.
The raw data necessary to reproduce these analyses can be found on ArrayExpress and GEO:
| Accession | Dataset |
|---|---|
| E-MTAB-15695 | scRNA-seq data |
| E-MTAB-13373 & E-MTAB-15654 | ATAC-seq data |
| GSE298717 (currently private) | single-cell CRISPRi data |
Processed data files have been deposited to Zenodo:
To run all analyses smoothly, please follow these steps:
- Start a new R project in a new directory
- Clone this GitHub repository to a subdirectory named scripts
- Download the linked Zenodo repository to a subdirectory named data
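The layout described in the steps above can be sanity-checked before running any script. The following is a minimal sketch, not part of the repository: it assumes only the two subdirectory names given above (scripts and data), and the function name `missing_subdirs` is hypothetical.

```python
from pathlib import Path

# Subdirectory names taken from the setup steps; everything else is illustrative.
EXPECTED_SUBDIRS = ("scripts", "data")


def missing_subdirs(project_dir):
    """Return the expected subdirectories that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Run from the root of the R project directory.
    missing = missing_subdirs(".")
    if missing:
        print("Missing subdirectories:", ", ".join(missing))
    else:
        print("Project layout looks complete.")
```

Running this from the project root reports which of the two subdirectories still needs to be created.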
The main example dataset throughout the paper is scRNA-seq data obtained during the early neural differentiation of human, gorilla and cynomolgus macaque induced pluripotent stem (iPS) cell lines. The following scripts analyze this dataset:
Mapping & QC
- Download the reference genomes
- Create non-human primate annotations using Liftoff
- Filter Liftoff annotations
- Remove small contigs from the gorGor6 genome for quicker mapping
- Create STAR indices
- Download FASTQ files of the scRNA-seq data
- Trim poly-A tails
- Map reads using the zUMIs pipeline
- Download the cell type annotation reference dataset
- QC, filtering, normalization, cell type annotation and pseudotime inference
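To illustrate one of the steps listed above, removing small contigs from a genome before mapping, here is a minimal sketch of length-based FASTA filtering. It is an assumption-laden illustration and not the script used in this repository: the length cutoff, the function names `read_fasta` and `drop_small_contigs`, and the toy sequences are all hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_small_contigs(lines, min_length):
    """Keep only records whose sequence has at least min_length bases."""
    return [(h, s) for h, s in read_fasta(lines) if len(s) >= min_length]


# Toy input; the cutoff of 8 bases is purely illustrative.
example = [">chr1", "ACGTACGT", "ACGT", ">contig_small", "ACG", ">chr2", "GGGGCCCC"]
kept = drop_small_contigs(example, min_length=8)
print([h for h, _ in kept])
```

For a real genome, the same logic would be applied to lines streamed from the FASTA file, with a cutoff chosen to suit the assembly.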
Network inference
Start here to skip computationally intensive steps and jump to core analysis
CroCoNet analysis
To validate the results of the CroCoNet analysis on the neural differentiation dataset, we used various additional datasets and databases. These lines of analysis can be reproduced using the following scripts:
Regulator interactions and module overlaps
Pathway enrichment analysis
Binding site enrichment and divergence
- Download FASTQ files of the ATAC-seq data
- Trim adapters
- Create bwa-mem2 indices
- Map ATAC-seq reads using bwa-mem2
- Name-sort BAM files
- Call peaks using Genrich without blacklists
- Generate blacklists from broad and high-intensity peaks
- Call peaks using Genrich with blacklists
- Infer gorilla NPC peaks by liftOver of the human NPC peaks
- Download FASTQ files of the Nanopore data
- Create minimap2 indices
- Map Nanopore reads using minimap2
- Merge Nanopore BAMs per species and cell type
- Reconstruct Nanopore transcripts using pinfish
- Annotate Nanopore transcripts
- Identify active transcriptional start sites
- Associate peaks to genes based on distance
- Retrieve motifs of the central regulators from the JASPAR and IMAGE databases
- Score motifs of each regulator in the peaks associated to their module members using Cluster-Buster
- Summarize motif scores per gene
Start here to skip computationally intensive steps and jump to core analysis
Sequence divergence
Expression pattern divergence
Analysis of POU5F1 ChIP-seq data
- Download FASTQ files of the ChIP-seq data
- Create bowtie indices
- Map ChIP-seq reads using bowtie
- Name-sort BAM files
- Call peaks using Genrich
- Calculate ChIP-seq coverage
- Check enrichment around transcriptional start sites
Start here to skip computationally intensive steps and jump to core analysis
Enrichment of LTR7 elements near POU5F1 module members
Analysis of POU5F1 single-cell CRISPRi data
- Annotate dCas9 cassette
- Create reference genome sequences and annotations with the dCas9 cassette
- Download FASTQ files of the Perturb-seq data
- Map reads using CellRanger
- Species demultiplexing
- Identify the cells of each species
- Individual demultiplexing
- QC & filtering
Start here to skip computationally intensive steps and jump to core analysis
As a second, more complex example dataset, we used published snRNA-seq data of brain samples from five primate species (Jorstad et al. 2023). This study sampled the middle temporal gyrus of several human, chimpanzee, gorilla, rhesus macaque and marmoset donors. The following scripts contain the code to run the CroCoNet analysis on this dataset:
Data preparation
Network inference
Start here to skip computationally intensive steps and jump to core analysis
CroCoNet analysis
This directory contains all scripts required to reproduce the figures and tables featured in the manuscript.
