
RobertsLab/project-lake-trout


Lake Trout Genomics: Comparative Analysis of Lean and Siscowet Ecotypes

Overview

Comprehensive genomic analysis of Salvelinus namaycush (lake trout) comparing two distinct ecotypes: lean and siscowet. This repository contains multiple integrated analyses including:

  • RNAseq differential expression analysis of liver tissue from parasitized and non-parasitized fish
  • PacBio HiFi DNA methylation profiling and differential methylation analysis
  • Presence-Absence Variation (PAV) analysis to identify structural genomic variations
  • Interactive genome browser for visualizing genomic features

Reference Genome

  • Assembly: GCF_016432855.1 (SaNama_1.0)
  • Species: Salvelinus namaycush (Lake Trout)
  • Source: NCBI Genome

Key Analyses

1. RNAseq Differential Expression

Analysis of liver RNAseq data from parasitized and non-parasitized samples (NCBI BioProject PRJNA316738) to identify:

  • Differentially expressed genes (DEGs) between ecotypes
  • Differentially expressed transcripts (DETs) and alternative isoforms
  • Expression differences related to parasite status

Key Results:

  • 202 differentially expressed transcripts (p < 0.05)
  • Analysis performed using Ballgown
  • See analyses/README.md for detailed results
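Applying the p < 0.05 cutoff to an exported results table takes only a few lines. This is a minimal sketch that assumes a CSV with a header containing a `pval` column, like the `feature`, `id`, `fc`, `pval`, `qval` columns of a Ballgown `stattest()` result; the actual `DET-*.csv` files may use different column names, and `filter_significant` is a hypothetical helper.

```python
import csv
import io

def filter_significant(csv_text, p_cutoff=0.05, p_column="pval"):
    """Return rows whose p-value is strictly below p_cutoff.

    Assumes a header row containing a p-value column (default "pval").
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if float(row[p_column]) < p_cutoff]

# Toy table with made-up values, not results from this repository
example = """feature,id,fc,pval,qval
transcript,1,2.1,0.003,0.41
transcript,2,0.9,0.270,0.88
transcript,3,0.4,0.049,0.62
"""

hits = filter_significant(example)
print([row["id"] for row in hits])  # ['1', '3']
```

Passing `p_column="qval"` applies a false-discovery-rate cutoff instead of the nominal p-value threshold.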

Analysis Files:

2. PacBio HiFi DNA Methylation Analysis

Whole-genome DNA methylation profiling using PacBio HiFi sequencing with 5mC modification calling:

  • Sample-level methylation profiles for both ecotypes
  • Differential methylation analysis between lean and siscowet
  • Identification of Differentially Methylated Regions (DMRs)
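One common way to test a single CpG for differential methylation is Fisher's exact test on methylated versus unmethylated read counts pooled per ecotype. The sketch below implements a two-sided test with only the standard library. The pooling scheme and the choice of test are assumptions for illustration, not necessarily what `14-diff-meth` does.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: ecotype (lean, siscowet); columns: methylated, unmethylated reads.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of every table sharing the observed margins
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum tables no more probable than the observed one; the small relative
    # tolerance keeps floating-point ties from being dropped
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))

# 1 of 10 lean reads methylated vs 11 of 14 siscowet reads (made-up counts)
print(round(fisher_exact_two_sided(1, 9, 11, 3), 6))  # 0.002759
```

With hundreds of thousands of CpGs tested, a multiple-testing correction such as Benjamini-Hochberg is commonly applied on top of the per-site p-values.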

Key Results:

  • 540,040 CpG sites tested
  • 4,440 significant differentially methylated cytosines (DMCs, p < 0.05)
  • 302 Differentially Methylated Regions (DMRs)
    • 20 hypermethylated in siscowet
    • 282 hypomethylated in siscowet
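DMRs are often built by clustering nearby DMCs. This sketch merges significant CpGs on the same chromosome when consecutive sites are within `max_gap` bases, keeps clusters with at least `min_cpgs` sites, and labels each region hyper- or hypomethylated in siscowet from the sign of the mean methylation difference. `max_gap=100`, `min_cpgs=3`, and the siscowet-minus-lean sign convention are hypothetical settings, not the parameters used in this repository.

```python
from dataclasses import dataclass

@dataclass
class DMC:
    chrom: str
    pos: int          # 0-based CpG position
    meth_diff: float  # siscowet % methylation minus lean % (assumed convention)

def call_dmrs(dmcs, max_gap=100, min_cpgs=3):
    """Merge nearby DMCs; return (chrom, start, end, n_cpgs, direction) tuples."""
    clusters = []
    for dmc in sorted(dmcs, key=lambda d: (d.chrom, d.pos)):
        last = clusters[-1][-1] if clusters else None
        if last is not None and last.chrom == dmc.chrom and dmc.pos - last.pos <= max_gap:
            clusters[-1].append(dmc)  # extend the current cluster
        else:
            clusters.append([dmc])    # start a new cluster
    regions = []
    for cl in clusters:
        if len(cl) < min_cpgs:
            continue
        mean_diff = sum(d.meth_diff for d in cl) / len(cl)
        direction = "hyper" if mean_diff > 0 else "hypo"
        # Half-open interval from the first to the last CpG in the cluster
        regions.append((cl[0].chrom, cl[0].pos, cl[-1].pos + 1, len(cl), direction))
    return regions

example = [
    DMC("chr1", 100, 25.0), DMC("chr1", 150, 30.0), DMC("chr1", 190, 20.0),
    DMC("chr1", 5000, -40.0),  # isolated site, dropped by min_cpgs
    DMC("chr2", 10, -30.0), DMC("chr2", 60, -35.0), DMC("chr2", 90, -20.0),
]
for region in call_dmrs(example):
    print(region)
# ('chr1', 100, 191, 3, 'hyper')
# ('chr2', 10, 91, 3, 'hypo')
```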

Analysis Files:

3. Presence-Absence Variation (PAV)

Genome-wide structural variation analysis identifying insertions and deletions between ecotypes:

  • Coverage-based detection of absent regions (deletions)
  • CIGAR-based detection of novel insertions
  • Ecotype-specific and shared structural variants
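Both detection strategies can be sketched in plain Python. `find_deletions` reports runs of zero read depth at least `min_len` bases long from a per-base depth array, and `find_insertions` walks a SAM CIGAR string and reports `I` operations of at least `min_len` bases at their reference coordinate. The function names and the 50 bp threshold are hypothetical; in practice depth might come from `samtools depth` and alignments from pysam, and `11-pav.Rmd` / `15-diff-pav.py` may apply different rules (for example, requiring absence across all samples of an ecotype).

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that advance the position on the reference
REF_CONSUMING = set("MDN=X")

def find_deletions(depth, min_len=50, offset=0):
    """Return half-open (start, end) intervals where depth is 0 for >= min_len bases."""
    runs, start = [], None
    for i, d in enumerate(depth):
        if d == 0 and start is None:
            start = i                      # a zero-coverage run begins
        elif d != 0 and start is not None:
            if i - start >= min_len:
                runs.append((offset + start, offset + i))
            start = None
    if start is not None and len(depth) - start >= min_len:
        runs.append((offset + start, offset + len(depth)))  # run reaches the end
    return runs

def find_insertions(ref_start, cigar, min_len=50):
    """Return (ref_pos, length) for insertions >= min_len in one alignment."""
    pos, hits = ref_start, []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "I" and length >= min_len:
            hits.append((pos, length))
        if op in REF_CONSUMING:
            pos += length
    return hits

depth = [5] * 10 + [0] * 60 + [3] * 10 + [0] * 20
print(find_deletions(depth))                              # [(10, 70)]
print(find_insertions(1000, "20S500M120I300M10D200M"))    # [(1500, 120)]
```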

Key Results:

  • Lean-specific: 996,228 variants (770,891 insertions + 225,337 deletions)
  • Siscowet-specific: 1,332,705 variants (1,086,799 insertions + 245,906 deletions)
  • Shared: 878,372 variants common to both ecotypes
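Once variants are called per ecotype, the three categories above follow from set operations. This sketch treats two calls as the same variant only when chromosome, start, end, and type match exactly. That exact-match key is a simplifying assumption; structural-variant merging often allows some positional tolerance.

```python
def partition_variants(lean, siscowet):
    """Split two collections of (chrom, start, end, type) keys into
    lean-specific, siscowet-specific, and shared sets."""
    lean, siscowet = set(lean), set(siscowet)
    return {
        "lean_specific": lean - siscowet,
        "siscowet_specific": siscowet - lean,
        "shared": lean & siscowet,
    }

# Toy calls, not data from this repository
lean_calls = [("chr1", 100, 400, "DEL"), ("chr1", 900, 901, "INS"), ("chr2", 50, 51, "INS")]
sisc_calls = [("chr1", 100, 400, "DEL"), ("chr3", 10, 300, "DEL")]

parts = partition_variants(lean_calls, sisc_calls)
print({k: len(v) for k, v in parts.items()})
# {'lean_specific': 2, 'siscowet_specific': 1, 'shared': 1}
```

With four samples per ecotype, calls would first need to be merged within each ecotype (for example, requiring support in one or more samples) before this comparison.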

Analysis Files:

4. Interactive Genome Browser

Web-based genome browser for exploring PAV and methylation data across the genome.

Features:

  • Visualize ecotype-specific insertions and deletions
  • View differential methylation tracks
  • Gene annotations with interactive navigation
  • Mobile-responsive design
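IGV.js can load BED files as annotation tracks. This minimal sketch writes variants as BED9 lines, using the optional `itemRgb` column to carry a per-category color. The `to_bed9` helper and the color scheme are hypothetical and are not taken from `prepare_data.py`.

```python
# Hypothetical color scheme (R,G,B) per variant category
COLORS = {
    "lean_specific": "31,119,180",
    "siscowet_specific": "214,39,40",
    "shared": "127,127,127",
}

def to_bed9(variants, category):
    """Format (chrom, start, end, type) tuples as tab-separated BED9 lines.

    BED coordinates are 0-based and half-open; inputs are assumed to
    already follow that convention.
    """
    color = COLORS[category]
    lines = []
    for chrom, start, end, vtype in sorted(variants):
        fields = [chrom, start, end, f"{category}:{vtype}", 0, ".", start, end, color]
        lines.append("\t".join(map(str, fields)))
    return "".join(line + "\n" for line in lines)

print(to_bed9([("chr1", 900, 901, "INS"), ("chr1", 100, 400, "DEL")], "lean_specific"), end="")
```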

Live Demo: https://sr320.github.io/project-lake-trout/genome-browser/

Documentation: genome-browser/README.md


Repository Structure

project-lake-trout/
├── code/                    # Analysis scripts and notebooks
│   ├── 01-ballgown-analysis.Rmd       # RNAseq differential expression
│   ├── 02-gene-explore.qmd            # Gene exploration
│   ├── 04-pacbio/                     # PacBio HiFi analysis workflow
│   ├── 05-pacbio-align.Rmd            # PacBio alignment
│   ├── 07-pacbio-QC.Rmd               # PacBio quality control
│   ├── 10-mCG-call.Rmd                # Methylation calling
│   ├── 11-pav.Rmd                     # PAV analysis
│   ├── 14-diff-meth.Rmd/py            # Differential methylation
│   └── 15-diff-pav.py                 # Differential PAV
├── data/                    # Raw data and metadata
│   ├── SraRunTable.csv                # RNAseq sample information
│   ├── ballgown-metadata.csv          # Ballgown metadata
│   └── *.bed                          # Gene annotations
├── analyses/                # Analysis outputs and results
│   ├── DEG-*.csv                      # Differentially expressed genes
│   ├── DET-*.csv                      # Differentially expressed transcripts
│   ├── 04-pacbio/                     # PacBio analysis outputs
│   ├── 14-diff-meth/                  # Methylation results
│   └── 15-diff-pav/                   # PAV results
├── genome-browser/          # Interactive genome browser
│   ├── index.html                     # Browser interface
│   ├── prepare_data.py                # Data preparation script
│   └── data/                          # Browser data files
└── figures/                 # Generated figures and plots

See README files in each subdirectory for detailed information about specific analyses.


Sample Information

RNAseq Samples

  • Lean Nonparasitized: NPLL32, NPLL34, NPLL44, NPLL46, NPLL56, NPLL61
  • Lean Parasitized: PLL20, PLL31, PLL43, PLL55, PLL59, PLL62
  • Siscowet Nonparasitized: NPSL15, NPSL24, NPSL29, NPSL36, NPSL50, NPSL58
  • Siscowet Parasitized: PSL13, PSL16, PSL35, PSL49, PSL53, PSL63

See data/SraRunTable.csv for complete RNAseq sample metadata.

PacBio HiFi Samples (for methylation and PAV analysis)

  • Lean: bc2041, bc2068, bc2069, bc2070
  • Siscowet: bc2071, bc2072, bc2073, bc2096
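The RNAseq sample IDs encode both design factors: an `NP` or `P` prefix for parasite status, then `LL` (lean) or `SL` (siscowet). This sketch derives metadata rows from the IDs and maps the PacBio barcodes listed above to ecotypes. The output column names are hypothetical and may not match `data/ballgown-metadata.csv`.

```python
import re

SAMPLE_RE = re.compile(r"^(NP|P)(LL|SL)(\d+)$")
ECOTYPE = {"LL": "lean", "SL": "siscowet"}
STATUS = {"NP": "nonparasitized", "P": "parasitized"}

# PacBio HiFi barcodes as listed in this README
PACBIO_ECOTYPE = {
    **{bc: "lean" for bc in ("bc2041", "bc2068", "bc2069", "bc2070")},
    **{bc: "siscowet" for bc in ("bc2071", "bc2072", "bc2073", "bc2096")},
}

def parse_sample(sample_id):
    """Split an RNAseq sample ID such as 'NPLL32' into its design factors."""
    m = SAMPLE_RE.match(sample_id)
    if m is None:
        raise ValueError(f"unrecognized sample ID: {sample_id}")
    status, eco, _number = m.groups()
    return {"sample": sample_id, "ecotype": ECOTYPE[eco], "parasite_status": STATUS[status]}

print(parse_sample("PLL20"))
# {'sample': 'PLL20', 'ecotype': 'lean', 'parasite_status': 'parasitized'}
```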

Pre-Analysis Data Processing

The following data processing steps were performed prior to analyses in this repository:


Technologies Used

  • R/RStudio: Statistical analysis and visualization
    • Ballgown: Differential expression analysis
    • tidyverse: Data manipulation
  • Python: Data processing and analysis pipelines
    • pysam, pandas, numpy: Data manipulation
    • modbampy: Modified base parsing
  • PacBio Tools: HiFi sequencing analysis
    • pbmm2: Read alignment
    • pb-CpG-tools: Methylation calling
  • IGV.js: Interactive genome visualization
  • Quarto/RMarkdown: Reproducible analysis notebooks

Citation

If you use data or methods from this repository, please cite:


Contact

Roberts Lab
School of Aquatic and Fishery Sciences
University of Washington

For questions or issues, please open a GitHub issue in this repository.
