-
Notifications
You must be signed in to change notification settings - Fork 0
Open
Description
Hello,
I am hitting an error with what I assume to be a file format issue. This is the error message that I get:
2022-09-13 10:09:24,357 | tejaas.utils.args | INFO | Running TEJAAS v1.0.1
2022-09-13 10:09:24,360 | tejaas.utils.args | INFO | Method: rr
2022-09-13 10:09:24,360 | tejaas.utils.args | INFO | Null Model: perm
2022-09-13 10:09:24,360 | tejaas.utils.args | INFO | Sigma_beta: 0.1
2022-09-13 10:09:24,360 | tejaas.main | DEBUG | Using 2 cores
2022-09-13 10:09:24,389 | tejaas.iotools.data | DEBUG | Reading expression levels for trans-eQTL discovery
2022-09-13 10:09:24,769 | tejaas.iotools.data | DEBUG | Reading expression levels for target gene discovery
2022-09-13 10:09:25,710 | tejaas.iotools.data | DEBUG | Found 7155 genes of 218 samples
2022-09-13 10:09:25,710 | tejaas.iotools.data | DEBUG | Reading gencode file for gene information
2022-09-13 10:09:30,776 | tejaas.iotools.data | DEBUG | Selecting common samples of genotype and gene expression
2022-09-13 10:09:30,776 | tejaas.iotools.data | DEBUG | Before expression selection: 7155 genes from 218 samples
Traceback (most recent call last):
File "/groups/stern/home/hanc/miniconda3/envs/py39/bin/tejaas", line 8, in <module>
sys.exit(main())
File "/groups/stern/home/hanc/miniconda3/envs/py39/lib/python3.9/site-packages/tejaas/main.py", line 73, in main
data.load()
File "/groups/stern/home/hanc/miniconda3/envs/py39/lib/python3.9/site-packages/tejaas/iotools/data.py", line 248, in load
expression_selected = rpkm._normalize_expr(expression[:, exprmask][indices, :])
IndexError: arrays used as indices must be of integer (or boolean) type
This is my Tejaas command:
mpirun -n ${NCORE} ${RUN_PATH} --vcf ${GENOFILE} --include-SNPs ${INCSTRING} --chrom ${CHROM} \
--gx ${GXPRFILE} --gxcorr ${GXPRFILE} --gtf ${GENEINFO} --trim --outprefix ${OUTPREFIX_RR} \
--method rr --prior-sigma 0.1 --knn 30 --cismask --null perm --psnpthres 0.1 --pgenethres 0.1 --biotype
This is the format of my expression file:
Gene_IDs A001 A002 A003 A004 A005
g5649 4.80323384689818 5.02117811663358 3.92602967965938 3.92602967965938 4.29661519218949
g5653 4.28097534826939 5.02117811663358 3.92602967965938 3.92602967965938 3.92602967965938
g5654 3.92602967965938 3.92602967965938 5.08096788235617 5.17450993990352 4.29661519218949
My VCF file:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A001 A002 A003 A004 A005
1 243 ScPoQx8_4268;HRSCAF=5273_243 T C 131 . . GT:PL 0/1:64,0,34 0/0:. ./.:. 0/0:. ./.:.
1 247 ScPoQx8_4268;HRSCAF=5273_247 A G 125 . . GT:PL 0/1:64,0,35 0/0:. ./.:. 0/0:. ./.:.
1 275 ScPoQx8_4268;HRSCAF=5273_275 C T 159 . . GT:PL 0/0:. 0/0:. ./.:. 0/0:. ./.:.
My annotation file:
ScPoQx8_4355;HRSCAF=5393 AUGUSTUS gene 1 1540 0.98 - . ID "g1"; gene_type "protein_coding";
ScPoQx8_4355;HRSCAF=5393 AUGUSTUS mRNA 1 1540 0.98 - . ID "g1.t1"; Parent "g1" gene_type "protein_coding";
ScPoQx8_4355;HRSCAF=5393 AUGUSTUS CDS 1110 1540 0.98 - 0 Parent "g1.t1" gene_type "protein_coding";
ScPoQx8_4355;HRSCAF=5393 AUGUSTUS exon 1110 1540 . - . Parent "g1.t1" gene_type "protein_coding";
ScPoQx8_4355;HRSCAF=5393 AUGUSTUS gene 15395 25390 0.65 + . ID "g2"; gene_type "protein_coding";
ScPoQx8_4355;HRSCAF=5393 AUGUSTUS mRNA 15395 25390 0.65 + . ID "g2.t1"; Parent "g2" gene_type "protein_coding";
ScPoQx8_4355;HRSCAF=5393 AUGUSTUS CDS 15395 15430 0.65 + 0 Parent "g2.t1" gene_type "protein_coding";
ScPoQx8_4355;HRSCAF=5393 AUGUSTUS exon 15395 15430 . + . Parent "g2.t1" gene_type "protein_coding";
This GTF was generated from a GFF3 using rtracklayer in R, and then I manually added gene_type "protein_coding"; to the last field.
Troubleshooting that I tried:
- Judging by the solution from the closed issue "GTF read error", I assumed my issue is the GTF file format. I tried to verify that the expression and VCF file formats are acceptable by test running Tejaas with my expression and VCF files and the gencode.v19 GTF that came with the example, but hit the same error as above.
- I thought perhaps Tejaas does not allow for VCF with missing data, so I removed all the sites with any missing genotypes in the VCF but also hit the same error as above. (Are VCFs with missing genotypes allowed?)
- I simplified the chromosome names in the annotation file to match those from gencode, like this:
chr1 AUGUSTUS gene 28451 29982 0.29 - . ID "g13599"; gene_type "protein_coding";
chr1 AUGUSTUS mRNA 28451 29982 0.29 - . ID "g13599.t1"; Parent "g13599" gene_type "protein_coding";
chr1 AUGUSTUS CDS 28451 28712 0.43 - 1 Parent "g13599.t1" gene_type "protein_coding";
chr1 AUGUSTUS exon 28451 28712 . - . Parent "g13599.t1" gene_type "protein_coding";
chr1 AUGUSTUS CDS 29864 29982 0.29 - 0 Parent "g13599.t1" gene_type "protein_coding";
chr1 AUGUSTUS exon 29864 29982 . - . Parent "g13599.t1" gene_type "protein_coding";
but ended up with a different error:
2022-09-13 12:23:29,867 | tejaas.utils.args | INFO | Running TEJAAS v1.0.1
2022-09-13 12:23:29,872 | tejaas.utils.args | INFO | Method: rr
2022-09-13 12:23:29,872 | tejaas.utils.args | INFO | Null Model: perm
2022-09-13 12:23:29,872 | tejaas.utils.args | INFO | Sigma_beta: 0.1
2022-09-13 12:23:29,872 | tejaas.main | DEBUG | Using 2 cores
2022-09-13 12:23:29,909 | tejaas.iotools.data | DEBUG | Reading expression levels for trans-eQTL discovery
2022-09-13 12:23:30,386 | tejaas.iotools.data | DEBUG | Reading expression levels for target gene discovery
2022-09-13 12:23:31,327 | tejaas.iotools.data | DEBUG | Found 7155 genes of 218 samples
2022-09-13 12:23:31,327 | tejaas.iotools.data | DEBUG | Reading gencode file for gene information
Traceback (most recent call last):
File "/groups/stern/home/hanc/miniconda3/envs/py39/bin/tejaas", line 8, in <module>
sys.exit(main())
File "/groups/stern/home/hanc/miniconda3/envs/py39/lib/python3.9/site-packages/tejaas/main.py", line 73, in main
data.load()
File "/groups/stern/home/hanc/miniconda3/envs/py39/lib/python3.9/site-packages/tejaas/iotools/data.py", line 241, in load
gene_info = readgtf.gencode(self.args.gtf_file, trim=self.args.gxtrim, biotype=self.args.biotype)
File "/groups/stern/home/hanc/miniconda3/envs/py39/lib/python3.9/site-packages/tejaas/iotools/readgtf.py", line 48, in gencode
gene_name = infolist[4].strip().split(' ')[1].replace('"','')
IndexError: list index out of range
Now I am not sure which input file is likely causing the issue.
I am able to successfully run the example files that came with the download.
Any suggestions would much appreciated. Thank you.
Metadata
Metadata
Assignees
Labels
No labels