wrong gene_id assignment in transcript_models.gtf #363

@defendant602

Dear @andrewprzh ,
Thanks for your continued efforts on this great project. I have another problem with the IsoQuant output files. This is my command:

isoquant.py --reference ref.fa --genedb ref.gtf --complete_genedb --bam 'multiple bam files' --labels 'multiple labels' --data_type nanopore --stranded forward --output output --prefix all --threads 32 --sqanti_output --model_construction_strategy sensitive_ont --report_novel_unspliced false

I found that many transcripts are classified as "novel_not_in_catalog" in novel_vs_known.SQANTI-like.tsv, but the same transcripts are assigned "novel_gene_***" gene_id attributes in transcript_models.gtf. Two examples, first the SQANTI-like rows and then the matching GTF records:

transcript242.NC_016195.1.nnic	NC_016195.1	-	370	2	novel_not_in_catalog	KEH30_p11	unassigned_transcript_945	1554	1	-55	-3 -55	-3	extra_intron_novel;terminal_site_match_left_precise	FALSE	True	NA	NA	NA	NA	NA	NA	NA	False	NA	NA	NA NA	NA	NA	NA	NA	5471	7021	NA	0.35	TTAAACACTCAGCCATTTTA	NA	NA	NA	NA	NA	NA	NA	NA	NA
transcript248.NC_016195.1.nnic	NC_016195.1	-	1059	2	novel_not_in_catalog	KEH30_p11	unassigned_transcript_945	1554	1	-50	-3 -50	-3	extra_intron_novel;terminal_site_match_left_precise	FALSE	True	NA	NA	NA	NA	NA	NA	NA	False	NA	NA	NA NA	NA	NA	NA	NA	5471	7021	NA	0.35	TTAAACACTCAGCCATTTTA	NA	NA	NA	NA	NA	NA	NA	NA	NA
NC_016195.1	IsoQuant	gene	5468	7100	.	-	.	gene_id "novel_gene_NC_016195.1_249"; transcripts "2"; 
NC_016195.1	IsoQuant	transcript	5468	7100	.	-	.	gene_id "novel_gene_NC_016195.1_249"; transcript_id "transcript242.NC_016195.1.nnic"; similar_reference_id "unassigned_transcript_945"; alternatives "extra_intron_novel:5783-7045,tes_match_precise:3"; exons "2";
NC_016195.1	IsoQuant	exon	7046	7100	.	-	.	gene_id "novel_gene_NC_016195.1_249"; transcript_id "transcript242.NC_016195.1.nnic"; exon_number "1"; exon_id "90"; 
NC_016195.1	IsoQuant	exon	5468	5782	.	-	.	gene_id "novel_gene_NC_016195.1_249"; transcript_id "transcript242.NC_016195.1.nnic"; exon_number "2"; exon_id "91"; 
NC_016195.1	IsoQuant	transcript	5468	7095	.	-	.	gene_id "novel_gene_NC_016195.1_249"; transcript_id "transcript248.NC_016195.1.nnic"; similar_reference_id "unassigned_transcript_945"; alternatives "extra_intron_novel:6477-7045,tes_match_precise:3"; exons "2";
NC_016195.1	IsoQuant	exon	7046	7095	.	-	.	gene_id "novel_gene_NC_016195.1_249"; transcript_id "transcript248.NC_016195.1.nnic"; exon_number "1"; exon_id "92"; 
NC_016195.1	IsoQuant	exon	5468	6476	.	-	.	gene_id "novel_gene_NC_016195.1_249"; transcript_id "transcript248.NC_016195.1.nnic"; exon_number "2"; exon_id "93";
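
To get a sense of how many transcripts are affected, the two files can be cross-checked with a short script. The following is only a minimal sketch, not part of IsoQuant: the function names are made up for illustration, and it assumes that the SQANTI-like table is tab-separated with the isoform ID in column 1 and the associated gene in column 7 (as in the rows above), and that every `transcript` record in the GTF carries `gene_id` and `transcript_id` attributes.

```python
import re
import sys

# Matches GTF attribute pairs such as: gene_id "novel_gene_NC_016195.1_249";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def gtf_transcript_genes(gtf_lines):
    """Map transcript_id -> gene_id for every 'transcript' record in a GTF."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "transcript_id" in attrs and "gene_id" in attrs:
            mapping[attrs["transcript_id"]] = attrs["gene_id"]
    return mapping


def mismatched_assignments(sqanti_lines, gtf_lines, novel_prefix="novel_gene_"):
    """List (isoform, associated_gene, gtf_gene_id) where the table names a
    known gene but the GTF places the transcript in a novel gene.

    Assumption: column 1 = isoform ID, column 7 = associated gene.
    """
    tx2gene = gtf_transcript_genes(gtf_lines)
    hits = []
    for line in sqanti_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue
        isoform, associated_gene = fields[0], fields[6]
        gene_id = tx2gene.get(isoform, "")  # header rows and unknown IDs give ""
        if (gene_id.startswith(novel_prefix)
                and associated_gene != "NA"
                and not associated_gene.startswith(novel_prefix)):
            hits.append((isoform, associated_gene, gene_id))
    return hits


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_gene_ids.py <SQANTI-like.tsv> <transcript_models.gtf>
    with open(sys.argv[1]) as tsv, open(sys.argv[2]) as gtf:
        for isoform, known_gene, gtf_gene in mismatched_assignments(tsv, gtf):
            print(isoform, known_gene, gtf_gene, sep="\t")
```

Applied to the two transcripts above, this should report both of them with KEH30_p11 as the associated gene and novel_gene_NC_016195.1_249 as the GTF gene_id, provided the column assumptions hold for the real files.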

I looked at these transcripts in the IGV browser, and they really are transcripts of gene KEH30_p11, not of novel genes.

I am using IsoQuant 3.10.0. Why would these wrong gene_id assignments happen? It really confuses me.

Labels: algorithm (Issue requires algorithmic improvement), weird results (Something looks odd in the resulting files)
