Tutorial

Harry Li edited this page Mar 14, 2025 · 14 revisions

Demo Data source

For this tutorial, we'll use the RNA-Seq data generated by Yu et al. (2021). In that work, the authors report that messenger RNA 5′ NAD+ capping is a dynamic regulatory epitranscriptome mark required for a proper response to abscisic acid in Arabidopsis. A graphical abstract is shown below:

Yu et al 2021 article graphic abstract

Pulling HAMRLNC Docker Image

To run HAMRLNC, first pull the pipeline's Docker image to your computer if you haven't already done so. This should take a few minutes, depending on your internet speed.

docker pull chosenobih/hamrlnc:v0.04

After pulling the image, run the command below to confirm that it is now on your computer:

docker image ls

Your output should be similar to the image below:

Screenshot from 2024-09-03 16-24-48
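
If you would rather confirm the image from a script than by reading the listing, you can filter the listing for the exact tag. The sketch below is only an illustration and not part of HAMRLNC: `image_listed` is a hypothetical helper that reads `repository:tag` lines on standard input and succeeds when the requested one is present.

```shell
# Hypothetical helper (not part of HAMRLNC): succeeds only if the exact
# "repository:tag" string passed as $1 appears as a full line on stdin.
image_listed() {
  grep -Fxq -- "$1"
}

# Usage (needs a running Docker daemon, so it is left commented out here):
#   docker image ls --format '{{.Repository}}:{{.Tag}}' \
#     | image_listed chosenobih/hamrlnc:v0.04 && echo "image present"
```

Matching the whole line (`-x`) as a fixed string (`-F`) avoids a false positive from a different tag that merely shares a prefix with the one you pulled.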

Clone the HAMRLNC repo

git clone https://github.com/harrlol/HAMRLNC
cd HAMRLNC

Download the genome file for Arabidopsis thaliana from Ensembl

wget https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-57/fasta/arabidopsis_thaliana/dna/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz
gunzip Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz

Download the annotation file for Arabidopsis thaliana from Ensembl

wget https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-57/gff3/arabidopsis_thaliana/Arabidopsis_thaliana.TAIR10.57.gff3.gz
gunzip Arabidopsis_thaliana.TAIR10.57.gff3.gz
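
Before running the pipeline, it can be worth checking that the sequence names in the annotation also occur in the genome file. This is a general precaution when pairing a FASTA with a GFF3, not a documented HAMRLNC requirement. The helper names below (`fasta_seqids`, `gff3_seqids`, `check_seqids`) are hypothetical and introduced only for this sketch.

```shell
# Hypothetical helpers (not part of HAMRLNC).

# Print the sequence names in a FASTA file: the text after '>' up to the
# first whitespace on each header line.
fasta_seqids() {
  grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort -u
}

# Print the sequence names in column 1 of a GFF3 file, skipping comment
# and directive lines (those starting with '#').
gff3_seqids() {
  grep -v '^#' "$1" | awk -F'\t' 'NF >= 9 {print $1}' | sort -u
}

# Report GFF3 sequence names that are absent from the FASTA.
# Returns non-zero if any are missing.
check_seqids() {
  fa_ids=$(mktemp)
  gff_ids=$(mktemp)
  fasta_seqids "$1" > "$fa_ids"
  gff3_seqids "$2" > "$gff_ids"
  # comm -13 keeps only the lines that are unique to the second file.
  missing=$(comm -13 "$fa_ids" "$gff_ids")
  rm -f "$fa_ids" "$gff_ids"
  if [ -n "$missing" ]; then
    echo "Sequence names in the GFF3 but not in the FASTA:" >&2
    echo "$missing" >&2
    return 1
  fi
  echo "All GFF3 sequence names are present in the FASTA"
}
```

With the two files downloaded above, the check would be called as `check_seqids Arabidopsis_thaliana.TAIR10.dna.toplevel.fa Arabidopsis_thaliana.TAIR10.57.gff3`.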

Run HAMRLNC with SRA IDs and all three arms activated

docker run \
  --rm -v "$(pwd)":/working-dir \
  -w /working-dir chosenobih/hamrlnc:v0.04 \
  -o test_run \
  -c /demo/demo_filenames.csv \
  -g Arabidopsis_thaliana.TAIR10.dna.toplevel.fa \
  -i Arabidopsis_thaliana.TAIR10.57.gff3 \
  -l 50 -n 8 -k -p -u -r -t
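
The genome and annotation paths are relative, and the container's working directory is the mounted current directory. A file name that does not match what you downloaded therefore cannot be found inside the container. A small pre-flight check can catch this before the container starts. `require_files` below is a hypothetical helper written for this tutorial, not a HAMRLNC command.

```shell
# Hypothetical helper (not part of HAMRLNC): check that every named file
# exists and is non-empty; report each problem and return non-zero if any.
require_files() {
  status=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Run it from the directory you mount, for example `require_files Arabidopsis_thaliana.TAIR10.dna.toplevel.fa Arabidopsis_thaliana.TAIR10.57.gff3`. Launch `docker run` only if it succeeds.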

Output Interpretation

All HAMRLNC outputs are organized in subdirectories of the output directory. When run with all three core processing arms enabled, HAMRLNC produces ten subdirectories in the output directory. Three subdirectories contain key intermediates such as genome index files, trimmed FASTQ files, and BED files, which can be used in downstream processing of the user's choice. Three other subdirectories contain the raw output of each of the three core functionalities, and one last subdirectory contains the visualizations and post-HAMR analysis results.
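
To see which subdirectories a run actually produced, you can list the immediate subdirectories of the output directory (`test_run` in the command used in this tutorial). `list_subdirs` is a hypothetical helper for this sketch. It makes no assumption about the subdirectory names, which are not listed here.

```shell
# Hypothetical helper (not part of HAMRLNC): print the name of each
# immediate subdirectory of the directory given as $1, one per line.
list_subdirs() {
  for d in "$1"/*/; do
    # If nothing matches, the pattern stays literal; skip it.
    [ -d "$d" ] || continue
    basename "$d"
  done
}
```

After the tutorial run finishes, `list_subdirs test_run` prints one subdirectory name per line. Regular files in the output directory are ignored.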

hamrlnc_fig1

Fig. 1 (a-g) Bar plots of the total abundance of HAMR predicted modifications by sample groups in CDS, exon, 5’ UTR, gene, ncRNA, primary mRNA, 3’ UTR regions

hamrlnc_fig2

Fig. 2 (a-b) Bar plots of HAMR predicted modification abundance located in different ncRNA types and RNA subtypes

NOTE: Under -p mode, modifications found on predicted lncRNAs can be duplicated because all isoforms are kept (see lines 4011-4016 in the table below for one such example). For all other overlapped libraries, only the primary isoform is kept, so modification counts contain no duplicates.

screenshot_from_2024-08-24_15-34-17_720
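
If you need a count of distinct modified sites under -p mode, one option is to collapse rows that share the same position before counting. The sketch below assumes a purely hypothetical tab-separated layout: a header row, then chromosome, position, and strand in columns 1 to 3. The real HAMRLNC tables may be laid out differently, so adjust the column numbers to match your file. `count_unique_sites` is a hypothetical helper, not part of HAMRLNC.

```shell
# Hypothetical helper (not part of HAMRLNC). Assumed layout: tab-separated,
# one header row, with chromosome, position and strand in columns 1-3.
# Rows sharing the same (chromosome, position, strand) are counted once.
count_unique_sites() {
  awk -F'\t' 'NR > 1 && !seen[$1 FS $2 FS $3]++ { n++ } END { print n + 0 }' "$1"
}
```

Because only the first three columns form the key, rows that differ only in the isoform they are assigned to collapse into a single site.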

hamrlnc_fig3

Fig. 3 (a-g) Bar plots of the abundance of HAMR predicted modification classes by sample groups in CDS, exon, 5’ UTR, gene, ncRNA, primary mRNA, 3’ UTR regions. (h) Number of HAMR predicted modifications per gene region.

hamrlnc_fig4

Fig. 4 (a) Distribution of modification types in gene regions by sample groups. (b) Distribution of modification types in gene regions.

hamrlnc_fig5

Fig. 5 GO term heatmap and predicted enrichment landscape
