Problem with ER IDs in 1GTF geneMode when every transcript except 1 is removed due to IR #7

@KSB54

By default, TranD 1 GTF geneMode runs with keepIR set to False. This removes any transcript with an exon that overlaps an intronic region (that is, intron retention transcripts).

This can result in a case where every transcript of a multi-transcript gene except one is removed, effectively making the gene a single-transcript gene.

TranD ER IDs are meant to look like this:

GENE:ER1, GENE:ER2,...

However, when the stated situation occurs, they instead look like this:

TRANSCRIPT_exon_1, TRANSCRIPT_exon_2,....

This is because what is essentially now a single-transcript gene is going through the multi-transcript gene process.

This error occurs in event_analysis.py.

The method do_ea_gene (line 504) handles single-transcript and multi-transcript genes separately to ensure they get the correct ER identifiers. However, IR transcript removal via remove_ir_transcripts only occurs in the multi-transcript gene processing (lines 528-559). As a result, a gene left with a single transcript after IR removal continues through the multi-transcript process, which never re-checks for the single-transcript case.
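The missing check described above could look roughly like the sketch below. It is not TranD code: choose_er_path and its arguments are hypothetical names, and the set of IR transcripts is passed in directly, whereas TranD makes that decision inside remove_ir_transcripts. The only point is that the transcript count is re-checked after IR removal.

```python
def choose_er_path(tx_ids, ir_tx_ids, keep_ir=False):
    """Decide which ER-labelling path a gene should take.

    tx_ids    -- transcript IDs annotated for the gene
    ir_tx_ids -- transcripts flagged as intron retention (hypothetical input)
    keep_ir   -- mirrors the keepIR option; False drops IR transcripts
    """
    # Genes annotated with a single transcript take the single path directly.
    if len(tx_ids) == 1:
        return "single", list(tx_ids)

    # Multi-transcript genes: drop IR transcripts unless keep_ir is set.
    if keep_ir:
        kept = list(tx_ids)
    else:
        kept = [tx for tx in tx_ids if tx not in ir_tx_ids]

    # Re-check the count AFTER IR removal; this is the step the issue
    # reports as missing from the multi-transcript branch.
    if len(kept) == 1:
        return "single", kept
    return "multi", kept
```

With this guard, a gene with transcripts T1, T2, T3 where T2 and T3 are IR transcripts would be routed to the single-transcript path with only T1 remaining.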

Potential solution:

The method get_strand (line 262) is where exon regions are created using BedTool.cat on every transcript (tx) in the dictionary of transcripts (tx_data) for a gene. The variable all_tx is supposed to merge the exons of every transcript in a gene into exon regions using BedTool.cat. If there is only one transcript in the gene, no merge occurs. This creates a formatting issue where all_tx is not in the correct format for ER IDs to be properly created. To fix this:

Add all_tx = all_tx.cat(tx, postmerge=True) after line 269.

This simply merges the transcript's BedTool information with itself to get it into the correct format for creating ER IDs. However, this is just an untested band-aid fix.
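To illustrate the idea, here is a plain-Python emulation of a concatenate-then-merge step. It does not call pybedtools, and cat_and_merge and er_ids are hypothetical helpers. It assumes that the "correct format" is a merged, coordinate-only record (default bedtools merge output has only chrom, start, end) and that ER IDs are numbered in coordinate order; neither assumption has been checked against the TranD source.

```python
def cat_and_merge(*exon_lists):
    """Concatenate exon records and merge overlapping or book-ended intervals.

    Each input record is (chrom, start, end, name). The name field is
    dropped, so the result is a list of (chrom, start, end) tuples.
    """
    coords = sorted(
        (chrom, start, end)
        for exons in exon_lists
        for (chrom, start, end, _name) in exons
    )
    merged = []
    for chrom, start, end in coords:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous region: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def er_ids(gene_id, regions):
    """Label merged regions as GENE:ER1, GENE:ER2, ... in coordinate order."""
    return [f"{gene_id}:ER{i}" for i, _ in enumerate(regions, start=1)]


# A lone transcript merged with itself keeps its coordinates but loses
# the per-transcript exon names (e.g. T1_exon_1).
tx = [("chr1", 100, 200, "T1_exon_1"), ("chr1", 300, 400, "T1_exon_2")]
regions = cat_and_merge(tx, tx)
print(er_ids("GENE", regions))
```

Merging a transcript with itself does not change its coordinates, which is why the proposed one-line change should be harmless for genes that still have multiple transcripts.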
