
how should I use DWGSIM to generate SNP in RNA data? #83

@SimoneRossi94

Hello,
I would like your opinion on generating SNPs in RNA data.
I have already used DWGSIM on genomic data, where finding the global position of each variant for the VCF was relatively straightforward: I look up the gene in the GTF file, then add the position given by DWGSIM to the gene start for genes on the positive strand, or subtract it from the gene end for genes on the negative strand.

Now I am using DWGSIM to generate SNPs on RNA sequences, so the FASTA sequences contain no introns.
My question is: how does DWGSIM calculate the position of the variants?
From what I understand, the position DWGSIM reports in the mutations file is local to the gene and based only on the FASTA sequence passed as input, so for an RNA sequence it does not account for introns.
If this assumption is correct, how should I find the real global position of the variant for the VCF in the case of an RNA sequence?

I thought about a calculation along these lines:

If the gene is on the positive strand:
Case 1: the variant is in the first exon -> global position = (start of the gene) + (position given by DWGSIM)
Case 2: the variant is NOT in the first exon, so I need to account for the introns that precede the variant, because DWGSIM does not count their length -> global position = (start of the exon where the variant is located) + (position given by DWGSIM) - (sum of the lengths of the introns up to the exon where the variant is located)

If the gene is on the negative strand:
Case 1: the variant is in the first exon -> global position = (end of the gene) - (position given by DWGSIM)
Case 2: the variant is NOT in the first exon -> global position = (end of the exon where the variant is located) - (position given by DWGSIM) - (sum of the lengths of the introns up to the exon where the variant is located)

Does this calculation make sense, or is there perhaps a better way to do this?
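
For reference, here is a minimal sketch of one alternative formulation, not the calculation above. It rests on assumptions I have not confirmed: that the position DWGSIM reports is 1-based, that the exons are taken from the GTF as 1-based inclusive (start, end) pairs, and that for negative-strand genes the FASTA sequence is in transcript orientation. The function name `transcript_to_genome` and the coordinates in the usage lines are hypothetical. Instead of summing intron lengths, it walks the exons in transcript order and subtracts the lengths of the exons that precede the variant:

```python
from typing import List, Tuple


def transcript_to_genome(local_pos: int,
                         exons: List[Tuple[int, int]],
                         strand: str) -> int:
    """Map a 1-based position on a spliced transcript to a 1-based
    genomic coordinate.

    exons: (start, end) pairs, 1-based inclusive as in a GTF file.
    strand: "+" or "-".
    Assumption: local_pos is 1-based and counted in transcript
    orientation (5' to 3').
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if local_pos < 1:
        raise ValueError("local_pos must be >= 1")

    # Transcript order: ascending start on '+', descending on '-'.
    ordered = sorted(exons, reverse=(strand == "-"))

    remaining = local_pos
    for start, end in ordered:
        exon_len = end - start + 1
        if remaining <= exon_len:
            # The variant lies in this exon.
            if strand == "+":
                return start + remaining - 1
            return end - remaining + 1
        # Skip this exon and keep counting in the next one.
        remaining -= exon_len

    raise ValueError("local_pos is beyond the end of the transcript")


# Hypothetical two-exon gene, for illustration only.
example_exons = [(100, 109), (200, 209)]
print(transcript_to_genome(11, example_exons, "+"))  # first base of exon 2
print(transcript_to_genome(11, example_exons, "-"))  # last base of the lower exon
```

If the reported positions turn out to be 0-based, the two `- 1` / `+ 1` offsets would need to change accordingly.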

Thank you so much for your support,
Kind Regards
Simone Rossi
