Skip to content

Conversation

@MichaelBarnett
Copy link
Contributor

Fixes an issue during tree traversal, where more than 3 tokens are returned from splitting some taxonstr, specifically when the name contains additional "[" and "]" brackets. For example in

(When using the NCBI Taxonomy database)

The fix is as fast as, or faster than the original code splitting method (tested on 1,000,000 randomly generated sequences), with slightly faster outcomes on longer name strings.

steps to reproduce bug:

result = pytaxonkit.list([668369, 11022])
for taxon, tree in result:
    subtaxa = [t for t in tree.traverse]
    print(f'Top level result: {taxon.name} ({taxon.taxid}); {len(subtaxa)} related taxa')

results in exception:

Traceback (most recent call last):
  File "/root/repo/testing/pytaxonkit_fix.py", line 70, in <module>
    for taxon, tree in result:
                       ^^^^^^
  File "/root/miniconda3/envs/commec-dev/lib/python3.14/site-packages/pytaxonkit.py", line 141, in __iter__
    taxid, rank, name = taxonstr.replace("[", "]").split("]")
    ^^^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 3, got 5)

Fixes an issue during tree traversal, where more than 3 tokens are returned from splitting some taxonstr, specifically when the name contains additional "[" and "]" brackets. For example in
668369 [strain] Escherichia coli DH5[alpha]
11022 [no rank] Eastern equine encephalitis virus (STRAIN VA33[TEN BROECK])
461749 [strain] Salmonella enterica subsp. enterica serovar Typhimurium var. monophasic 4,[5],12:i:-
The fix is as fast as, or faster than the original code splitting method, with faster outcomes on longer name strings.
Comment on lines 148 to 149
taxid, rank, name = taxonstr.replace("[", "]").split("]")
taxid, remainder = s.split("[", 1)
rank, name = remainder.split("]", 1)
Copy link
Member

@standage standage Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reporting this! The s variable isn't defined here, I'm guessing you mean taxonstr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apologies, yes. That's what I get for not using consistent local var names when investigating a fix!

@standage
Copy link
Member

standage commented Dec 1, 2025

The CI failures here have to do with conda, not with the code. So I'm going to go ahead and merge. Thanks!

@standage standage merged commit 9b50f36 into bioforensics:master Dec 1, 2025
2 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants