penguin-analysis

Benchmarking for PenguiN paper

|--benchmarking: Includes all data and scripts used for generating the benchmark results
|
|--results: Includes all relevant assembly results as text files
|
|--notebooks: Includes jupyter notebooks for generating figures
|
|--figures: Includes subfigures as generated by the jupyter notebooks
|
|--multipanels: Includes final paper figures

Required Software

The respective software versions of the assembly tools used in the benchmark are: PenguiN (GitHub commit 7571d37), Megahit (v1.2.9), metaSPAdes/rnaSPAdes/metaviralSPAdes/rnaviralSPAdes (v3.15.2), SAVAGE (v0.4.2), IVA (v1.0.8), VICUNA (v1.3), Haploflow (v0.1)

For simulating reads: randomreads.sh from the BBMAP software suite (version 38.71)

For preprocessing: Trimmomatic (version 0.39), Cutadapt (version 3.3)

For evaluation: MetaQUAST (version 5.0.2), MMseqs (GitHub commit efacc69)

For the ssRNA detection pipeline: HMMER package (version 3.2.1), Prodigal (version 2.6.3)

BENCHMARKING

rhinovirus-3-mixture

This folder includes the command line to generate the reads from the input three genomes, to run the assembly tools and subsequent metaquast for the analysis rhinovirus-3-mixture/input/ contains the benchmark reads and the orginial genomes rhinovirus-3-mixture/assembler/ contains how the assembly tools were called

HIV-mixtures

This folder includes the script to generate the cutAndDouble input genomes (see paper) and simulate the reads for all three coverage sets used in the benchmarks. Reads are not included due to size but can be re-geenerated using make_benchmarkset.sh or requested from the authors. HIV-mixtures/assembler/ contains how the assembly tools were called HIV-mixtures/evaluation/ contains all scripts used for the precision and sensitivity analysis

metatranscriptomes

This folder includes the scripts and data for the analysis of ssRNA phages from real metatranscriptomic samples from activated sludge and aquatic environments. It contains the list of 82 metatranscriptomics sample identifer for the samples from activated sludge and aquatic environments, used in Callanan et a., 2020 [1] as well as our download script.

preprocessData.sh describes the preprocessing steps using Trimmomatic and Cutadapt
metatranscriptomes/assembler includes the scripts to run the assembly tools
metatranscriptomes/evaluation includes the scripts to detect and classify ssRNA phage contigs in the assemblies

getContigsOfInterest.sh extracts all contigs >=750bp, including at least HMM protein hit evaluateContigs.sh performs clustering of contigs of interest at 99% idendity, filter step by step to the three sets: contigs containing at least 2 protein hits (partial genomes), 3 protein hits (near-complete genomes), 3 protein hits without edge proteins (complete genomes) This data are used in the per-sample analysis. This allows for example the per-location analysis (Fig. 6A)

For the over all sample analysis use run_all_summaries.sh (Fig. 5) It contains the following steps (1) run_total_detection collects for each assembler all contigs of interest over all samples, cluster subsequently at 99% identity, and writes the three sets (parital genomes, near-complete genomes, complete genomes) (1b) run_total_detection_unique does the same but with 100% identity (as in the orginal study, Callanan et al., 2020) (2) cluster_analysis clusters rdrp proteins from the complete genomes at diffefent idenity levels (Fig. 6C) (3) createAssemblyStatistic writes for each tool time, memory, and statistic of contigs (per sample and over all) (4) joinAssemblyStatsistic combine statistics over all assemblers (5) comparePhagesPairwise aligns complete genomes from different assembelrs (pairwise) to assess the inetrsection of the assembly sets

metatranscriptomes/16S_rRNA_genes: contains the detection and evaluation scripts for the 16s RRNA gene analysis

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
benchmarking		benchmarking
figures		figures
notebooks		notebooks
results		results
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

penguin-analysis

Required Software

BENCHMARKING

rhinovirus-3-mixture

HIV-mixtures

metatranscriptomes

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

AnnSeidel/penguin-analysis

Folders and files

Latest commit

History

Repository files navigation

penguin-analysis

Required Software

BENCHMARKING

rhinovirus-3-mixture

HIV-mixtures

metatranscriptomes

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages