Problem with ER IDs in 1GTF geneMode when every transcript except 1 is removed due to IR

The typical behavior of TranD 1 GTF geneMode has keepIR set to False. This removes any transcripts with exons overlapping intron areas (removing intron retention transcripts).

This can result in a case where every transcript of a multi-transcript gene is removed, except for 1, essentially making the gene a single-transcript gene.

TranD ER IDs are meant to look like this:

GENE:ER1, GENE:ER2,...

However, when the stated situation occurs, they instead look like this:

TRANSCRIPT_exon_1, TRANSCRIPT_exon_2,....

This is because what is essentially now a single-transcript gene is going through the multi-transcript gene process.

This error occurs in `event_analysis.py`.

The method `do_ea_gene` (line 504) in handles single-transcript and multi-transcript genes separately to assure they get the correct ER identifiers. However, the IR transcript removal using `remove_ir_transcripts` only occurs in the multi-transcript gene processing (lines 528-559), meaning the newly single-transcript genes continue through the multi-transcript gene process without any checks for single-transcript genes.

Potential solution:

The method `get_strand` (line 262) is where exon regions are created using `BedTool.cat` on every transcript (tx) in dictionary of transcripts (tx_data) for a gene. The variable `all_tx` is supposed to merge the exons of every transcript in a gene into an exon region using `BedTool.cat`. If there is only one transcript in the gene, no merge occurs. This creates a formatting issue where `all_tx` is is not in the correct format for ER IDs to be properly created. To fix this:

Add `all_tx = all_tx.cat(tx, postmerge=True)` after line 269. 

This simply merges the transcript's bedtool information with itself to get it into the correct format for creating ER IDs. However, this is just an untested bandaid fix.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Problem with ER IDs in 1GTF geneMode when every transcript except 1 is removed due to IR #7

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Problem with ER IDs in 1GTF geneMode when every transcript except 1 is removed due to IR #7

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions