Hey @bede,
Thanks for the great tool! I've included it in my pipeline for host-microbiome dual transcriptomics, and our lab has been using it on a variety of projects.
I've been finding that it performs well for metagenomic-like simulated datasets, but for metatranscriptome-like simulated datasets, not all host reads are being removed.
I wonder if this is due to alternative splicing, and if inclusion of a splice-aware aligner as one of the options could address this. What do you think? Could you think of ways to use a different reference to avoid needing to add a splice-aware aligner?
Details on my simulated data can be found here. To summarize the results:
- If I naively simulate genomic-like community data with human reads from the human pangenome project, then run a linear regression on `percent_removed_reads ~ true_percent_host`:
  - $\beta$ = 0.999 (so almost flawless performance)
- If I simulate transcriptome-like community data using Polyester, where human reads come from the rna.fna.gz file found here, then run a linear regression on `percent_removed_reads ~ true_percent_host`:
  - $\beta$ = 0.856 (roughly 14% of host reads are missed)
- If I create semisynthetic communities, where real RNA-seq reads from bacterial isolates + human colon chip samples are subsampled and combined in known quantities, then run a linear regression on `percent_removed_reads ~ true_percent_host`:
  - $\beta$ = 0.932 (about 7% of host reads are missed)
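
For clarity on how I'm reading these slopes, here is a minimal sketch of the calculation. The helper `ols_slope` is illustrative, and the numbers are hypothetical placeholders, not my simulation results. It assumes the intercept is close to zero, so that 1 - $\beta$ can be read as the approximate fraction of host reads that are missed.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y ~ x (model includes an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance-like and variance-like sums (unnormalised; the 1/n cancels)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical placeholder values, NOT the results reported in this issue:
# one entry per simulated sample.
true_percent_host = [0.0, 10.0, 25.0, 50.0, 75.0, 90.0]
percent_removed_reads = [0.0, 9.0, 22.5, 45.0, 67.5, 81.0]

beta = ols_slope(true_percent_host, percent_removed_reads)
missed = 1.0 - beta  # approximate missed fraction, assuming intercept ~ 0

print(f"beta = {beta:.3f}, approx. fraction of host reads missed = {missed:.3f}")
```

A slope near 1 means nearly all host reads are removed across the range of host proportions; the further below 1 it falls, the more host reads are left behind.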
Thanks!