EzAAI calculate fails with databases derived from extract command when there are ambiguous (N) sequences #35

@gracegyho

Hi there,

I have some genomes (in FASTA format) that I converted into databases with the EzAAI extract command. The resulting databases were written to a directory called db, and I then ran the following command:

EzAAI calculate -i /db -j db/ -o out/all-v-all-EzAAI.txt

And the stdout was as follows:

[OCT 15 15:07:24] EzAAI   |:  EzAAI - v1.2.2 [Aug. 2022]
[OCT 15 15:07:29] EzAAI   |:  Calculating AAI... [Task 1/484]
[OCT 15 15:07:31] EzAAI   |:  Calculating AAI... [Task 2/484]
[OCT 15 15:07:34] EzAAI   |:  Calculating AAI... [Task 3/484]
[OCT 15 15:07:37] EzAAI   |:  Calculating AAI... [Task 4/484]
[OCT 15 15:07:40] EzAAI   |:  Calculating AAI... [Task 5/484]
[OCT 15 15:07:43] EzAAI   |:  Calculating AAI... [Task 6/484]
[OCT 15 15:07:46] EzAAI   |:  Calculating AAI... [Task 7/484]
[OCT 15 15:07:48] EzAAI   |:  Calculating AAI... [Task 8/484]
[OCT 15 15:07:51] EzAAI   |:  Calculating AAI... [Task 9/484]
[OCT 15 15:07:53] EzAAI   |:  Calculating AAI... [Task 10/484]
java.lang.NullPointerException
        at leb.process.ProcCalcPairwiseAAI.calcIdentityWithDetails(ProcCalcPairwiseAAI.java:166)
        at leb.process.ProcCalcPairwiseAAI.pairwiseMmseqs(ProcCalcPairwiseAAI.java:386)
        at leb.process.ProcCalcPairwiseAAI.calculateProteomePairWithDetails(ProcCalcPairwiseAAI.java:61)
        at leb.main.EzAAI.runCalculate(EzAAI.java:432)
        at leb.main.EzAAI.run(EzAAI.java:582)
        at leb.main.EzAAI.main(EzAAI.java:617)
[OCT 15 15:07:56] ERROR   |:  Program terminated with error.

In a previous analysis I hit the same error, which I (apparently) fixed by removing the N's, since one genome had a stretch of N's. Because only a couple of contigs were affected, I believe I went in manually and cut the contig into two parts. But this is a bit sketchy and not very reproducible science, so my questions are:

Am I right that the N's are what cause this? And is there a proper way to deal with ambiguous bases in genomes? Not necessarily within EzAAI; it could also be something like my makeshift solution, only scripted.
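The manual fix described above (cutting a contig where it has a stretch of N's) could be scripted as a preprocessing step before running EzAAI extract. Below is a minimal Python sketch under stated assumptions: the input is plain FASTA, and long runs of N's are the trigger, which this issue suggests but does not confirm. The function names (`read_fasta`, `longest_n_run`, `split_at_n_runs`, `write_fasta`, `split_fasta_file`) and the thresholds `min_run=10` and `min_len=1` are hypothetical choices for illustration, not part of EzAAI.

```python
import re
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_n_run(seq: str) -> int:
    """Length of the longest stretch of N/n in seq (0 if there is none)."""
    return max((len(m.group()) for m in re.finditer(r"[Nn]+", seq)), default=0)


def split_at_n_runs(header: str, seq: str, min_run: int = 10,
                    min_len: int = 1) -> List[Record]:
    """Split seq wherever there are >= min_run (must be >= 1) consecutive N's.

    Pieces shorter than min_len are dropped. A record with no such run is
    returned unchanged; otherwise pieces are renamed <id>_part1, <id>_part2, ...
    (hypothetical naming scheme; <id> is the first word of the header).
    """
    seq_id = header.split()[0] if header.strip() else "contig"
    gap = re.compile(r"[Nn]{%d,}" % min_run)
    pieces = [p for p in gap.split(seq) if len(p) >= min_len]
    if pieces == [seq]:
        return [(header, seq)]
    return [(f"{seq_id}_part{i}", p) for i, p in enumerate(pieces, start=1)]


def write_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Format records as FASTA text, wrapping sequences at `width` columns."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


def split_fasta_file(in_path: str, out_path: str, min_run: int = 10,
                     min_len: int = 1) -> int:
    """Rewrite in_path to out_path with contigs split at long N runs.

    Returns the number of records written.
    """
    with open(in_path) as fin:
        records = [piece for header, seq in read_fasta(fin)
                   for piece in split_at_n_runs(header, seq, min_run, min_len)]
    with open(out_path, "w") as fout:
        fout.write(write_fasta(records))
    return len(records)


if __name__ == "__main__":
    # Small in-memory demo: contig1 has a 12-N gap, contig2 a single N.
    demo = [">contig1 some description", "ACGTACGT" + "N" * 12 + "TTGACCA",
            ">contig2", "GGGNCCC"]
    for header, seq in read_fasta(demo):
        print(header, "longest N run:", longest_n_run(seq))
        for new_header, piece in split_at_n_runs(header, seq):
            print("  ", new_header, piece)
```

`longest_n_run` can be used first to find which genome(s) actually contain long N stretches; `split_fasta_file` would then be run on those files before EzAAI extract. Note that this sketch only splits at runs of at least `min_run` N's: isolated N's and other IUPAC ambiguity codes (R, Y, and so on) are left in place, so if those also turn out to trigger the error, the gap pattern would need to be widened. Splitting also changes contig names and counts, which is worth recording alongside the results.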
