StringTie merge completely removing some tissue-specific gene models #487

@DustinSokolowski

Hello!

I have RNA-seq and Iso-Seq data from a wide array of tissues, and I noticed some peculiar behaviour with stringtie --merge. Specifically, for some (but not all) genes whose expression is limited to 1-2 tissues, StringTie does not produce a final gene model in the merged output. Please find two examples below: Lin28b and Trpv5/Trpv6.

One potential consideration is that both of these loci are duplicated in another region of the same chromosome, but my understanding is that stringtie --merge does not care about MAPQ or TPM and relies only on the GFF files from each tissue? The "full_annotation.gff" file combines gene models from StringTie, BRAKER, TOGA, and Liftoff using Mikado, which is why there is an empty "stringtie.merged.gtf" track but an annotated final file. That said, the Iso-Seq data is the only source where we have proper UTRs for these genes, so it would be super valuable if StringTie kept the gene models.

Image Image

There are other tissue specific genes that were properly merged, so this is not a behaviour that happens every time:

Image

That said, I've tried varying -c and -f within each tissue's annotation, and stringtie --merge consistently fails to produce a final annotation for Lin28b and Trpv5.
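For anyone who wants to check the per-tissue inputs at these loci, here is a minimal sketch of how the transcripts overlapping a region could be listed together with their cov, FPKM and TPM attributes. It assumes the input GTFs follow the usual StringTie layout (a "transcript" feature row with quoted attributes); the function name and the example coordinates are hypothetical. As far as I can tell, the stringtie --merge usage text also lists -F and -T thresholds on input transcripts next to -c and -f, so these values may be worth a look, but that should be verified against the installed version.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcripts_in_region(gtf_lines, chrom, start, end):
    """List transcript rows overlapping chrom:start-end (1-based, inclusive).

    Hypothetical helper: assumes StringTie-style GTF with 'transcript'
    feature rows; cov/FPKM/TPM are returned as strings, or None if absent.
    """
    hits = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript" or cols[0] != chrom:
            continue
        t_start, t_end = int(cols[3]), int(cols[4])
        if t_end < start or t_start > end:
            continue  # no overlap with the query interval
        attrs = dict(ATTR_RE.findall(cols[8]))
        hits.append({
            "transcript_id": attrs.get("transcript_id"),
            "start": t_start,
            "end": t_end,
            "strand": cols[6],
            "cov": attrs.get("cov"),
            "FPKM": attrs.get("FPKM"),
            "TPM": attrs.get("TPM"),
        })
    return hits
```

To use it, open one tissue's GTF and pass the file handle plus the locus coordinates, e.g. `transcripts_in_region(open("tissue.gtf"), "chr1", 1000, 5000)` (file name and coordinates are placeholders), then compare the reported values across tissues.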

I would very much appreciate any insight you may have with this and I'm happy to share files if you'd like to take a closer look.

Best!
Dustin
