Comprehensive genomic analysis of Salvelinus namaycush (lake trout) comparing two distinct ecotypes: lean and siscowet. This repository contains multiple integrated analyses including:
- RNAseq differential expression analysis using parasitized/non-parasitized liver tissue
- PacBio HiFi DNA methylation profiling and differential methylation analysis
- Presence-Absence Variation (PAV) analysis to identify structural genomic variations
- Interactive genome browser for visualizing genomic features
- Assembly: GCF_016432855.1 (SaNama_1.0)
- Species: Salvelinus namaycush (Lake Trout)
- Source: NCBI Genome
Analysis of liver RNAseq data from parasitized and non-parasitized samples (NCBI BioProject PRJNA316738) to identify:
- Differentially expressed genes (DEGs) between ecotypes
- Differentially expressed transcripts (DETs) and alternative isoforms
- Expression differences related to parasite status
Key Results:
- 202 differentially expressed transcripts (p < 0.05)
- Analysis performed using Ballgown
- See analyses/README.md for detailed results
Analysis Files:
- code/01-ballgown-analysis.Rmd - Primary differential expression analysis
- code/02-gene-explore.qmd - Gene-specific exploration
Whole-genome DNA methylation profiling using PacBio HiFi sequencing with 5mC modification calling:
- Sample-level methylation profiles for both ecotypes
- Differential methylation analysis between lean and siscowet
- Identification of Differentially Methylated Regions (DMRs)
Key Results:
- 540,040 CpG sites tested
- 4,440 significant differentially methylated cytosines (DMCs, p < 0.05)
- 302 Differentially Methylated Regions (DMRs)
  - 20 hypermethylated in siscowet
  - 282 hypomethylated in siscowet
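
As a rough illustration of how significant cytosines can be grouped into regions and labelled hyper- or hypomethylated, here is a minimal Python sketch. It is not the method in code/14-diff-meth.py: the function name `merge_dmcs`, the 300 bp gap, the three-site minimum, the contig names, and the convention that the difference is siscowet minus lean are all assumptions made for the example.

```python
from statistics import mean


def merge_dmcs(dmcs, max_gap=300, min_sites=3):
    """Group nearby DMCs into candidate DMRs.

    dmcs: iterable of (chrom, pos, meth_diff) tuples, where meth_diff is
    ASSUMED to be siscowet minus lean methylation.
    max_gap and min_sites are illustrative thresholds, not the repository's.
    """
    clusters = []
    for chrom, pos, diff in sorted(dmcs):
        last = clusters[-1] if clusters else None
        # Extend the current cluster only on the same contig and within max_gap.
        if last and last[-1][0] == chrom and pos - last[-1][1] <= max_gap:
            last.append((chrom, pos, diff))
        else:
            clusters.append([(chrom, pos, diff)])

    regions = []
    for cluster in clusters:
        if len(cluster) < min_sites:
            continue  # too few sites to call a region
        avg = mean(d for _, _, d in cluster)
        regions.append({
            "chrom": cluster[0][0],
            "start": cluster[0][1],
            "end": cluster[-1][1],
            "n_sites": len(cluster),
            "mean_diff": avg,
            "direction": "hyper" if avg > 0 else "hypo",
        })
    return regions


# Hypothetical input; contig names and values are made up.
example_dmcs = [
    ("contig_a", 100, 0.3), ("contig_a", 150, 0.4), ("contig_a", 220, 0.2),
    ("contig_a", 5000, -0.5),
    ("contig_b", 10, -0.3), ("contig_b", 40, -0.2), ("contig_b", 90, -0.4),
]
example_regions = merge_dmcs(example_dmcs)
```

With this input the isolated site at position 5000 is dropped, leaving one "hyper" region on `contig_a` and one "hypo" region on `contig_b`.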
Analysis Files:
- code/04-pacbio/ - PacBio workflow (alignment, QC, methylation calling)
- code/10-mCG-call.Rmd - Methylation calling
- code/14-diff-meth.Rmd - Differential methylation analysis
- code/14-diff-meth.py - Python implementation for DMR identification
Genome-wide structural variation analysis identifying insertions and deletions between ecotypes:
- Coverage-based detection of absent regions (deletions)
- CIGAR-based detection of novel insertions
- Ecotype-specific and shared structural variants
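
The two detection ideas above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in code/12-pav.py: the function names, the `min_len` thresholds, and the inputs (a CIGAR string with a 0-based reference start, and a per-base depth list for one contig) are hypothetical.

```python
import re

# SAM CIGAR operations: a length followed by a single operation character.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def insertions_from_cigar(ref_start, cigar, min_len=50):
    """Return (ref_pos, length) for each insertion of at least min_len bases.

    ref_start is the 0-based reference position where the alignment begins.
    min_len is an illustrative threshold.
    """
    pos = ref_start
    found = []
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "I" and length >= min_len:
            found.append((pos, length))
        if op in REF_CONSUMING:
            pos += length
    return found


def zero_coverage_runs(depths, min_len=50):
    """Return half-open (start, end) runs where per-base depth is zero.

    depths: list of read depths for one contig, index = 0-based position.
    """
    runs = []
    start = None
    for i, depth in enumerate(depths):
        if depth == 0:
            if start is None:
                start = i  # a zero-depth run begins here
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the contig.
    if start is not None and len(depths) - start >= min_len:
        runs.append((start, len(depths)))
    return runs


# Hypothetical inputs for illustration.
example_insertions = insertions_from_cigar(1000, "100M60I20M5I30D10M", min_len=50)
example_absent = zero_coverage_runs([5, 5, 0, 0, 0, 4, 0], min_len=3)
```

In practice the CIGAR strings and depths would come from aligned reads (for example via pysam, which the repository lists among its tools); the sketch keeps them as plain values so it runs without any input files.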
Key Results:
- Lean-specific: 996,228 variants (770,891 insertions + 225,337 deletions)
- Siscowet-specific: 1,332,705 variants (1,086,799 insertions + 245,906 deletions)
- Shared: 878,372 variants common to both ecotypes
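
One simple way to arrive at ecotype-specific and shared categories is set arithmetic over variant keys. The sketch below assumes each variant is represented as a `(contig, position, type, length)` tuple and that two variants match only when the tuples are identical; the actual matching rules in code/15-diff-pav.py may differ. It also checks that the insertion and deletion counts reported above sum to the stated totals.

```python
def classify_variants(lean, siscowet):
    """Split variants into lean-specific, siscowet-specific, and shared sets.

    Each variant is ASSUMED to be a hashable key such as
    (contig, position, type, length); matching is exact.
    """
    lean, siscowet = set(lean), set(siscowet)
    return {
        "lean_specific": lean - siscowet,
        "siscowet_specific": siscowet - lean,
        "shared": lean & siscowet,
    }


# Hypothetical variants for illustration.
lean_calls = [("contig_a", 100, "INS", 60), ("contig_a", 900, "DEL", 120)]
siscowet_calls = [("contig_a", 900, "DEL", 120), ("contig_b", 40, "INS", 75)]
example_classes = classify_variants(lean_calls, siscowet_calls)

# Consistency check on the counts reported in Key Results.
lean_total = 770_891 + 225_337        # insertions + deletions
siscowet_total = 1_086_799 + 245_906  # insertions + deletions
```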
Analysis Files:
- code/11-pav.Rmd - PAV identification analysis
- code/12-pav.py - Python implementation
- code/15-diff-pav.py - Differential PAV analysis
Web-based genome browser for exploring PAV and methylation data across the genome.
Features:
- Visualize ecotype-specific insertions and deletions
- View differential methylation tracks
- Gene annotations with interactive navigation
- Mobile-responsive design
Live Demo: https://sr320.github.io/project-lake-trout/genome-browser/
Documentation: genome-browser/README.md
project-lake-trout/
├── code/ # Analysis scripts and notebooks
│ ├── 01-ballgown-analysis.Rmd # RNAseq differential expression
│ ├── 02-gene-explore.qmd # Gene exploration
│ ├── 04-pacbio/ # PacBio HiFi analysis workflow
│ ├── 05-pacbio-align.Rmd # PacBio alignment
│ ├── 07-pacbio-QC.Rmd # PacBio quality control
│ ├── 10-mCG-call.Rmd # Methylation calling
│ ├── 11-pav.Rmd # PAV analysis
│ ├── 14-diff-meth.Rmd/py # Differential methylation
│ └── 15-diff-pav.py # Differential PAV
├── data/ # Raw data and metadata
│ ├── SraRunTable.csv # RNAseq sample information
│ ├── ballgown-metadata.csv # Ballgown metadata
│ └── *.bed # Gene annotations
├── analyses/ # Analysis outputs and results
│ ├── DEG-*.csv # Differentially expressed genes
│ ├── DET-*.csv # Differentially expressed transcripts
│ ├── 04-pacbio/ # PacBio analysis outputs
│ ├── 14-diff-meth/ # Methylation results
│ └── 15-diff-pav/ # PAV results
├── genome-browser/ # Interactive genome browser
│ ├── index.html # Browser interface
│ ├── prepare_data.py # Data preparation script
│ └── data/ # Browser data files
└── figures/ # Generated figures and plots
See README files in each subdirectory for detailed information about specific analyses.
- Lean Nonparasitized: NPLL32, NPLL34, NPLL44, NPLL46, NPLL56, NPLL61
- Lean Parasitized: PLL20, PLL31, PLL43, PLL55, PLL59, PLL62
- Siscowet Nonparasitized: NPSL15, NPSL24, NPSL29, NPSL36, NPSL50, NPSL58
- Siscowet Parasitized: PSL13, PSL16, PSL35, PSL49, PSL53, PSL63
See data/SraRunTable.csv for complete RNAseq sample metadata.
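
The sample IDs listed above appear to encode both factors: an NP or P prefix for parasite status, followed by LL (lean) or SL (siscowet) and a number. That reading is inferred from the lists, not from documented naming rules, so treat this small Python helper (`decode_sample` is a hypothetical name) as a sketch.

```python
import re

# Pattern inferred from the sample lists: (NP|P) + (LL|SL) + number.
SAMPLE_RE = re.compile(r"^(NP|P)(LL|SL)(\d+)$")


def decode_sample(sample_id):
    """Decode an RNAseq sample ID into ecotype and parasite status."""
    match = SAMPLE_RE.match(sample_id)
    if match is None:
        raise ValueError(f"Unrecognized sample ID: {sample_id!r}")
    status, ecotype, number = match.groups()
    return {
        "ecotype": "lean" if ecotype == "LL" else "siscowet",
        "parasitized": status == "P",
        "number": int(number),
    }


example_lean = decode_sample("NPLL32")
example_siscowet = decode_sample("PSL13")
```

A helper like this can be used to cross-check the grouping columns in a metadata table against the sample names.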
- Lean: bc2041, bc2068, bc2069, bc2070
- Siscowet: bc2071, bc2072, bc2073, bc2096
The following data processing steps were performed prior to analyses in this repository:
- R/RStudio: Statistical analysis and visualization
- Ballgown: Differential expression analysis
- tidyverse: Data manipulation
- Python: Data processing and analysis pipelines
- pysam, pandas, numpy: Data manipulation
- modbampy: Modified base parsing
- PacBio Tools: HiFi sequencing analysis
- pbmm2: Read alignment
- pb-CpG-tools: Methylation calling
- IGV.js: Interactive genome visualization
- Quarto/RMarkdown: Reproducible analysis notebooks
If you use data or methods from this repository, please cite:
- Lake Trout RNAseq data: NCBI BioProject PRJNA316738
- Reference genome: GCF_016432855.1 (SaNama_1.0)
Roberts Lab
School of Aquatic and Fishery Sciences
University of Washington
For questions or issues, please open a GitHub issue in this repository.