Metatranscriptome performance #56

@sterrettJD

Hey @bede,

Thanks for a great tool! I've included it in my pipeline for host-microbiome dual transcriptomics, and our lab has been using it on a variety of projects.

I've been finding that it performs well for metagenomic-like simulated datasets, but for metatranscriptome-like simulated datasets, not all host reads are being removed.

I wonder whether this is due to alternative splicing, and whether including a splice-aware aligner as one of the options could address it. What do you think? Alternatively, can you think of a way to use a different reference so that a splice-aware aligner isn't needed?

Details on my simulated data can be found here. To summarize the results:

  1. If I naively simulate genomic-like community data with human reads from the human pangenome project, then run a linear regression on percent_removed_reads ~ true_percent_host:
    • $\beta$ = 0.999 (so almost flawless performance)
  2. If I simulate transcriptome-like community data using Polyester, where human reads are coming from the rna.fna.gz file found here, then run a linear regression on percent_removed_reads ~ true_percent_host:
    • $\beta$ = 0.856 (about 14% of host reads are missed)
  3. If I create semisynthetic communities, where real RNA-seq reads from bacterial isolates + human colon chip samples are subsampled and combined in known quantities, then run a linear regression on percent_removed_reads ~ true_percent_host:
    • $\beta$ = 0.932 (7% of host reads are missed)
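
For reference, here is a minimal sketch of how the slope in each regression above can be computed and read as a removal rate. The function name `fit_line` and the input values are hypothetical and for illustration only; they are not the simulated data or results reported in this issue. Reading 1 - slope as the fraction of host reads missed assumes the intercept is close to zero.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y ~ x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical values: true host percentage per simulated sample, and the
# percentage of reads removed by host depletion (here exactly 90% of host).
true_percent_host = [0.0, 10.0, 25.0, 50.0, 75.0, 90.0]
percent_removed_reads = [0.9 * p for p in true_percent_host]

slope, intercept = fit_line(true_percent_host, percent_removed_reads)
percent_missed = (1.0 - slope) * 100.0

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"approx. host reads missed = {percent_missed:.1f}%")
```

A slope near 1 with an intercept near 0 means nearly all host reads are removed across the range of host fractions; a slope below 1 means removal falls short by roughly the same proportion at every host fraction.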

Thanks!
