Open
Construct the basic types needed to refactor this into type-safe, maintainable code.
@sgalkina I'm well aware that this is the PR from hell to review. If you want to do it, I suggest:
This PR is a complete rewrite of the package. It was motivated by the old code being flaky, brittle, and hard to understand. When digging into why, I found that the package relied on some shady foundations, which this PR reworks from the bottom up:
Use of `pandas` causes vague parsing
Use of dataframes removes all type information
Once parsing is done, all type information is erased. That makes it hard, when reading the code, to understand what data is being loaded and whether the data is transformed correctly. This is the underlying cause of stuff like #8.
Underspecified variants of the data being loaded
For example, what's the content of `data/clades.tsv`? What does it assume? This was not encoded in the source code, and therefore these assumptions were violated. For example, on `main` now, using the `-m` flag causes an internal error.
Design in this PR
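As a rough illustration of the problems listed above (a sketch only, not the code in this PR): suppose a clades file had two tab-separated columns, a clade name and a non-negative genome count. That column layout, and the names `CladeRow` and `parse_clades`, are hypothetical and chosen purely for the example. Parsing into a typed record puts the file's assumptions in the source code, so a violated assumption fails at load time with a line number rather than surfacing later as an internal error.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class CladeRow:
    """One row of a HYPOTHETICAL two-column clades table."""

    name: str
    n_genomes: int


def parse_clades(lines: Iterable[str]) -> Iterator[CladeRow]:
    """Parse tab-separated lines, failing loudly on any violated assumption."""
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line:
            continue  # Assumption: blank lines are permitted and skipped.
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {lineno}: expected 2 fields, got {len(fields)}"
            )
        name, count = fields
        if not name:
            raise ValueError(f"line {lineno}: empty clade name")
        try:
            n_genomes = int(count)
        except ValueError:
            raise ValueError(
                f"line {lineno}: count {count!r} is not an integer"
            ) from None
        if n_genomes < 0:
            raise ValueError(f"line {lineno}: negative count {n_genomes}")
        yield CladeRow(name, n_genomes)
```

A function that then takes `list[CladeRow]` states in its signature exactly what data it receives, which a function taking a bare dataframe cannot do.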
To review it, I suggest you read the new CLAUDE.md, or ask a chat bot to summarize the code.