generalized match algorithm for k loci #8
Summary
This PR introduces several substantial improvements to the matching pipeline, focusing on correctness, robustness, and extensibility, while keeping performance and memory usage under control.
contributions:
Detailed Changes
1. Correct GVH/HVG and Mismatch Computation
Overview:
Alleles at each locus are now treated as sets (duplicates removed) for both patient and donor. This ensures accurate mismatch calculations.
Per-Locus Metrics:
For each locus ℓ, the following metrics are computed:
Patient–Donor Total Scores:
Aggregate scores are computed by summing across all loci:
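The per-locus formulas are not reproduced in this description, so the following is only a minimal sketch of the set-based idea. It assumes the conventional directional definitions (GVH counts patient alleles absent from the donor, HVG counts donor alleles absent from the patient). The function names and the dictionary layout are hypothetical and are not taken from `donors_matching.py`.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical layout: locus name -> alleles typed at that locus.
Genotype = Dict[str, Iterable[str]]


def locus_mismatches(patient_alleles: Iterable[str],
                     donor_alleles: Iterable[str]) -> Tuple[int, int]:
    """Return (gvh, hvg) for a single locus, treating alleles as sets."""
    patient_set = set(patient_alleles)  # duplicates (homozygous loci) collapse
    donor_set = set(donor_alleles)
    gvh = len(patient_set - donor_set)  # assumed: patient alleles the donor lacks
    hvg = len(donor_set - patient_set)  # assumed: donor alleles the patient lacks
    return gvh, hvg


def total_mismatches(patient: Genotype, donor: Genotype) -> Tuple[int, int]:
    """Sum (gvh, hvg) over every locus present in both genotypes."""
    gvh_total = 0
    hvg_total = 0
    # The loci come from the data, so any number k of loci is handled.
    for locus in sorted(set(patient) & set(donor)):
        gvh, hvg = locus_mismatches(patient[locus], donor[locus])
        gvh_total += gvh
        hvg_total += hvg
    return gvh_total, hvg_total


if __name__ == "__main__":
    patient = {"A": ["A*01:01", "A*01:01"], "B": ["B*07:02", "B*08:01"]}
    donor = {"A": ["A*01:01", "A*02:01"], "B": ["B*07:02", "B*08:01"]}
    print(total_mismatches(patient, donor))
```

With a homozygous patient at locus A, deduplication keeps the repeated allele from being counted twice, which is the situation the set-based treatment is meant to handle.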
Modified File:
`grma/match/donors_matching.py`
2. Fix: missing donor in candidate list
In `cpdef tuple neighbors_2nd(self, UINT node)`, a duplicated `-1` placeholder caused one valid donor to be silently dropped from the candidate list. Removing the extra placeholder restores complete and correct candidate enumeration.
File:
`grma/match/lol_graph.pyx`
3. Generalization to k loci
All hard-coded assumptions about the number of loci (e.g. the magic constant `10`) were removed and replaced with configurable, data-driven logic.
4. Support additional allele formats with compact storage
Allele parsing and handling were extended beyond the strict `xy:wz` format to allow:
To prevent donor-tree explosion, alleles are stored using compact integer UIDs via a bidirectional mapping (`bidict`):
This preserves correctness while significantly reducing memory pressure.
Affected components:
New dependency:
bidict