SNU-Seq Data Processing and Analysis

These scripts are used in the following study:

Gerlevik, U., Lorenz, P., Lamstaes, A., Fischl, H., Xi, S., Saukko-Paavola, A., Murray, S., Brown, T., Welch, A., George, C., Angel, A., Furger, A., Mellor, J. (2025). "Single Nucleotide Resolution 4sU Sequencing (SNU-Seq) reveals the transcriptional responsiveness of an epigenetically primed human genome". bioRxiv.

Scripts in order

1. "scripts/HEK293/1_genome_and_annotations"

Build STAR genome with hg38 for mapping
Find genomic-A stretches and merge with ENCODE blacklist
Prepare GENCODE v46 annotations for metagenes and counting transcription start site (TSS), gene body, gene end and readthrough regions
Prepare housekeeping genes from RSeQC for counting in quality control steps

2. "scripts/HEK293/2_preprocessing_and_qualityControl"

Read quality control via FastQC
Adapter and quality trimming using bbduk.sh
Map the reads to the built STAR genome
Report quality control via MultiQC
Principal component analysis via deepTools

3. "scripts/HEK293/3_filtering_normalisation_and_merging"

Filter out blacklisted regions, genomic-A stretches and the top/bottom 0.5% of the signal, treated as "outliers" that skew the data, using bedtools and awk
Count the spike-in reads via featureCounts
Estimate size factors from spike-ins using DESeq2
Merge replicates of the same libraries to enhance the signal via bedtools and awk
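
The spike-in normalisation above can be illustrated with a minimal sketch of the median-of-ratios idea that DESeq2 uses for size factors. This is not the pipeline's code (the scripts call DESeq2 in R); the function names `size_factors` and `scale_signal` and the input layout are assumptions made for illustration only.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of spike-in counts,
    with the same feature order in every sample (hypothetical layout).
    """
    samples = list(counts)
    n_features = len(counts[samples[0]])

    # Log geometric mean per feature; features with a zero count in any
    # sample are skipped because their geometric mean is zero.
    log_geo_means = []
    for i in range(n_features):
        values = [counts[s][i] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    # Size factor of a sample = median ratio of its counts to the geometric means.
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][i]) - lg
            for i, lg in enumerate(log_geo_means)
            if lg is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors


def scale_signal(values, factor):
    """Divide a signal track by the sample's size factor."""
    return [v / factor for v in values]
```

A sample sequenced twice as deeply as another receives a size factor twice as large, so dividing by the factor brings both tracks onto a comparable scale.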

4. "scripts/HEK293/4_subtacting_noPAP_from_bPAP"

Count the reads in the 3' end region using bedtools to normalise between the +bPAP and no bPAP libraries. The two library types differ in complexity, but their profiles of polyadenylated transcripts at the 3' end are expected to be fairly similar
Estimate size factors from 3' end regions using DESeq2 and apply them via bedtools and awk
Subtract no bPAP from +bPAP at single nucleotide resolution using deepTools
Remove the negative values that arise where regions do not match between +bPAP and no bPAP, and quantify them using awk, ggplot2 and ggpubr
Generate bedgraphs to view on Integrative Genomics Viewer (IGV) with negative values for the reverse strand using awk
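
As a rough illustration of the subtraction steps above, the sketch below subtracts one single-nucleotide track from another, drops the negative positions while recording how much signal they carried, and writes bedGraph lines with negated values for the reverse strand. The pipeline itself does this with deepTools and awk; the function names and the `(chrom, pos) -> value` dictionary layout are assumptions for illustration.

```python
def subtract_tracks(plus_bpap, no_bpap):
    """Per-position subtraction of two tracks.

    Each track is a dict mapping (chrom, 0-based position) -> signal.
    Positions missing from one track are treated as zero.
    """
    positions = set(plus_bpap) | set(no_bpap)
    return {p: plus_bpap.get(p, 0.0) - no_bpap.get(p, 0.0) for p in positions}


def drop_negatives(track):
    """Keep positive positions and report the total negative signal removed.

    Zero-valued positions carry no signal and are dropped as well
    (an assumption; the scripts may handle zeros differently).
    """
    kept = {p: v for p, v in track.items() if v > 0}
    removed = sum(-v for v in track.values() if v < 0)
    return kept, removed


def to_bedgraph_lines(track, reverse_strand=False):
    """Format a track as bedGraph lines (chrom, start, end, value).

    Reverse-strand values are negated so that they plot below the axis in IGV.
    """
    sign = -1.0 if reverse_strand else 1.0
    return [
        f"{chrom}\t{pos}\t{pos + 1}\t{sign * value:g}"
        for (chrom, pos), value in sorted(track.items())
    ]
```

Recording the removed signal alongside the kept track makes it easy to check that the negative positions are a small fraction of the total.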

5. "scripts/HEK293/5_metagene_analysis"

Compute and plot metagenes using deepTools

6. "scripts/HEK293/6_spliceJunction_analysis"

Calculate splicing efficiency using SPLICE-q
Calculate mean splicing efficiency per sample and visualise the distributions using ggplot2 and ggpubr

7. "scripts/HEK293/7_synthesis_decay_pausing_rate_analysis"

Count the reads at TSS, gene body, gene end and readthrough regions using bedtools map by summing the signal in the processed bedgraph files
Prepare a counts table covering all regions and all samples, and normalise the summed signal by dividing by the region width
Calculate synthesis and decay rates, pausing index and termination efficiency. Cluster the genes by synthesis rate using k-means and visualise the clusters using ggpubr
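
The width normalisation and the pausing index can be sketched as follows. The formula used here, TSS-region density divided by gene-body density, is a commonly used definition and is an assumption; the scripts may define the index differently. The function names are hypothetical.

```python
def density(summed_signal, start, end):
    """Summed signal of a region divided by its width in nucleotides."""
    width = end - start
    if width <= 0:
        raise ValueError("region end must be greater than region start")
    return summed_signal / width


def pausing_index(tss_density, body_density):
    """Ratio of TSS-region density to gene-body density.

    Returns None when the gene body has no signal, because the ratio
    is undefined in that case.
    """
    if body_density == 0:
        return None
    return tss_density / body_density
```

For example, a gene with 100 units of signal over a 50 nt TSS region and 500 units over a 1000 nt gene body has densities of 2.0 and 0.5, giving a pausing index of 4.0.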

8. "scripts/HEK293/8_TSS_enrichment_in_bPAP_over_noPAP"

Calculate the median fold change of the signal in +bPAP over no bPAP at the TSS region
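
One way to read this step is as the median of per-gene ratios of TSS signal. The sketch below follows that interpretation; the pseudocount parameter and the function name are assumptions added to avoid division by zero and are not taken from the scripts.

```python
from statistics import median


def median_fold_change(plus_bpap, no_bpap, pseudocount=1.0):
    """Median of per-gene +bPAP / no bPAP ratios at the TSS region.

    plus_bpap, no_bpap: TSS-region signal per gene, in the same gene order.
    pseudocount: hypothetical constant added to both values of each pair.
    """
    if len(plus_bpap) != len(no_bpap):
        raise ValueError("both inputs must cover the same genes")
    ratios = [
        (p + pseudocount) / (n + pseudocount)
        for p, n in zip(plus_bpap, no_bpap)
    ]
    return median(ratios)
```

Using the median rather than the mean keeps a few very highly expressed genes from dominating the summary.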

9. "scripts/HEK293/9_comparison_with_TTseq_PROseq_NETseq"

Bring Phil's TT-Seq (GSM5452296) and the publicly available TT-Seq (GSM4730176), PRO-Seq (GSM4730174) and mNET-Seq (GSM7990390) data from HEK293 cells onto the same scale
Compute and plot metagenes of SNU-Seq, TT-Seq, PRO-Seq and mNET-Seq using deepTools
Generate a bedgraph with negative values for the reverse strand to visualise Phil's TT-Seq data on IGV

10. "scripts/HEP3B/1_prepare_ATACseqPeaks_and_FANTOM5_forMetagenes"

Prepare Anna's ATAC-Seq GSE172053 peaks and FANTOM5 enhancers data in a similar way to the genome annotation preparation in HEK293 cells to get clean/non-intersecting regions for a reliable metagene representation

11. "scripts/HEP3B/2_prepare_SNUseq_ChIPseq_data"

Prepare Anna's SNU-Seq and ChIP-Seq of H3K27ac and H3K4me3 data from GSE172053 by file type conversions, concatenation and log2 transformation
Calculate H3K27ac to H3K4me3 ratio using deepTools and summarise the resulting ratios
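
The ratio and its summary can be sketched per region as below. The pipeline computes the ratio with deepTools; whether it uses a log2 ratio and a pseudocount is an assumption here, as are the function names.

```python
import math
from statistics import mean, median


def log2_ratio(h3k27ac, h3k4me3, pseudocount=1.0):
    """Per-region log2 ratio of H3K27ac to H3K4me3 signal.

    pseudocount: hypothetical constant that keeps the ratio defined
    when either mark has zero signal.
    """
    if len(h3k27ac) != len(h3k4me3):
        raise ValueError("both inputs must cover the same regions")
    return [
        math.log2((a + pseudocount) / (b + pseudocount))
        for a, b in zip(h3k27ac, h3k4me3)
    ]


def summarise(ratios):
    """Basic summary statistics of the ratios."""
    return {
        "n": len(ratios),
        "mean": mean(ratios),
        "median": median(ratios),
        "min": min(ratios),
        "max": max(ratios),
    }
```

Positive values mark regions where H3K27ac exceeds H3K4me3, and negative values mark the opposite.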

12. "scripts/HEP3B/3_signal_presence_in_ATACpeak_and_FANTOM5_regions"

Determine the presence of SNU-Seq, H3K27ac and H3K4me3 signal and quantify its distribution across the ATAC-Seq peaks and FANTOM5 regions

13. "scripts/HEP3B/4_metagene_analysis"

Sort the ATAC-Seq peaks and FANTOM5 annotations accordingly, and compute and plot metagenes using deepTools

Dependencies & Environments

This pipeline utilizes multiple Conda environments to manage dependencies for different stages of the analysis. All environment configuration files are located in the envs/ directory.

Environment List

| Analysis Stage | Environment File | Key Tools |
| --- | --- | --- |
| deepTools | `envs/deeptools_env.yml` | deepTools, Python 3.8 |
| QC aggregation | `envs/multiqc_env.yml` | MultiQC |
| Genomic-A flagging | `envs/py27.yml` | Python 2.7 scripts |
| Splicing analysis | `envs/spliceQ_env.yml` | SPLICE-q |

Installation

To replicate a specific environment, use the following command structure:

# Example: create the deepTools environment ("mamba" can be replaced directly with "conda")
mamba env create -f envs/deeptools_env.yml

# Activate the environment ("mamba" can be replaced directly with "conda")
mamba activate deeptools_env

R Dependencies

R packages are included in the respective conda YAML files where possible. For the versions of R and of most packages used in the scripts, see envs/R_versions.txt.

References

manschmi/MexNab_3seq
manschmi/MS_Metagene_Tools
nf-core/rnaseq
See the scripts and the envs/ directory for the R packages and other tools used.
