Skip to content

Raw signals were basecalled with Dorado and aligned to hg38.p13 using minimap2. Tumor fraction and CNVs were estimated with HMMcopy and ichorCNA. Motif analysis extracted the first four 5′ nucleotides per read, and NMF identified three dominant motif profiles capturing key variability among samples.

Notifications You must be signed in to change notification settings

DeCeccoLAB/cfDNA-ONT

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

34 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

FLARE Logo

Fragmenting the Future with FLARE:
A Comprehensive Fragmentomics Pipeline
Based on Long-read Nanopore Sequencing

After sequencing, the raw signal data were used for basecalling with the Dorado basecaller, and the resulting reads were aligned to the human reference genome (hg38) using minimap2. Tumor fraction and copy number variations (CNVs) were estimated using the HMMcopy and ichorCNA packages from Bioconductor/R.

For motif analysis, we extracted the first four nucleotides at the 5′ end of each aligned read using a method based on that described in Katsman et al., 2022. To reduce complexity and identify recurring patterns among samples, Non-negative Matrix Factorization (NMF) was applied to the matrix of motif frequencies. This approach allowed the extraction of three main profiles, each characterized by a specific set of dominant motifs, representing underlying patterns that explain the variability observed among the samples.

Basecalling and Demultiplexing

Command Basacaling

/path/to/dorado/bin/dorado basecaller hac \
  --modified-bases 5mCG_5hmCG \
  --emit-moves \
  --device cuda:all \
  --kit-name \
  --no-trim \
  --reference /path/to/reference/genome.fa \
  /path/to/input_pod5_directory \
  | samtools view -b -o /path/to/output/output_pass.bam

Command Demultiplexing

dorado demux \
  --output-dir ./bam_demuxed \
  --no-classify \
  output_pass.bam

Reference

Alignment and Filtering with minimap2

Before aligning reads with minimap2, you need to create an index (.mmi) from the reference genome FASTA file.

/path/to/minimap2 -d /path/to/reference_genome.mmi /path/to/reference_genome.fa
# Convert BAM to FASTQ
samtools fastq -@ 4 -O -n /path/to/input.bam | gzip > /path/to/output.fastq.gz

# Align FASTQ to reference genome with minimap2 and filter reads
/path/to/minimap2 -ax map-ont --MD -L /path/to/hg38.p13.mmi /path/to/output.fastq.gz \
  | samtools view -h -q 20 -F 0x4 -F 0x100 -F 0x800 \
  | awk '( $9 < 700 || $1 ~ /^@/ )' \
  | samtools view -bS -o /path/to/output.filtered.bam

Reference

Sorting and Indexing BAM Files

# Sort the filtered BAM file
samtools sort -@ 8 -o /path/to/output.sorted.bam /path/to/input.filtered.bam

# Index the sorted BAM file
samtools index /path/to/output.sorted.bam

Generating Read Counts with readCounter (HMMcopy)

/path/to/readCounter --window 1000000 --quality 20 \
  --chromosome "chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX,chrY" \
  /path/to/input.filtered.sorted.bam > /path/to/output.filtered.sorted.wig

Reference

Running ichorCNA for Tumor Fraction and CNV Analysis

# Create output directory
mkdir -p /path/to/ichorCNA_output

# Run ichorCNA
R_LIBS_USER=~/R/x86_64-pc-linux-gnu-library/4.5 \
Rscript /path/to/ichorCNA/scripts/runIchorCNA.R \
  --id SAMPLE_ID \
  --WIG /path/to/input.filtered.sorted.wig \
  --ploidy "c(2)" \
  --normal "c(0.95,0.99,0.995,0.999)" \
  --maxCN 3 \
  --gcWig /path/to/ichorCNA/extdata/gc_hg38_1000kb.wig \
  --mapWig /path/to/ichorCNA/extdata/map_hg38_1000kb.wig \
  --centromere /path/to/ichorCNA/extdata/GRCh38_centromere_acen.txt \
  --normalPanel /path/to/ichorCNA/extdata/HD_ULP_PoN_hg38_1Mb_median_normAutosome_median.rds \
  --includeHOMD False \
  --chrs "c(1:22, \"X\")" \
  --chrTrain "c(1:22)" \
  --estimateNormal True \
  --estimatePloidy True \
  --txnE 0.9999 \
  --txnStrength 10000 \
  --outDir /path/to/ichorCNA_output

Reference

🧬 Motif Analysis

For the analysis of 4-mer end motifs, a custom Python script was used: motif_analysis.py This script extracts end-motifs from Nanopore BAM files, generates frequency plots, pie charts of base composition, and outputs the top 20 motifs with counts and frequencies, along with a log file of filtering statistics.

The approach is based on the method described in [Katsman et al., 2022], adapted for genome-wide cfDNA analysis.

Usage example:

python3 /path/to/motif_analysis.py \
  --bam /path/to/sample.filtered.bam \
  --reference /path/to/reference_genome.fa \
  --chromosomes /path/to/chr_list.txt \
  --sample SAMPLE_NAME

To reduce complexity and identify recurring patterns among samples, Non-negative Matrix Factorization (NMF) was applied to the matrix of motif frequencies. This approach allowed the extraction of three main profiles, each characterized by a specific set of dominant motifs, representing underlying patterns that explain the variability observed among the samples.

Reference

📧 Contacts

If you have any questions or feedback, please contact us at:

About

Raw signals were basecalled with Dorado and aligned to hg38.p13 using minimap2. Tumor fraction and CNVs were estimated with HMMcopy and ichorCNA. Motif analysis extracted the first four 5′ nucleotides per read, and NMF identified three dominant motif profiles capturing key variability among samples.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages