@fjahr fjahr commented Nov 22, 2025

This PR doubles as a post-mortem for a bug that existed in kartograf's master branch from late September to late November. Since the bug is fixed, this is the only clean-up necessary.

The kartograf bug

The bug was deep in the parsing logic for IRR DBs and only affected a small number of records that had multiple ASNs for the same prefix, so that a tie-breaker needed to be applied.

The bug was introduced in asmap/kartograf@00baaea, which came in with asmap/kartograf#91. In this commit the variable route, which holds the parsed version of the route, was renamed to parsed_route. The same variable is reused further below as a cache key, but the name wasn't changed in that place. Normally this would have resulted in an undefined-variable error. However, route wasn't a newly created variable at the point where it was renamed: it already existed and was only reassigned there, so an (unparsed) route variable still existed in the same namespace. This pre-existing route variable was used as the cache key instead of parsed_route. As a result, the cache wasn't working as intended and the IRR parsing gave different results than before in very specific edge cases where a tie-breaker needed to be applied between multiple ASNs for the same prefix.
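To illustrate the class of bug described above, here is a minimal sketch. It is not the actual kartograf code: the function names, the "first ASN seen wins" tie-breaker, and the normalization step are all assumptions made up for this example. It only shows how a stale, pre-existing name can silently take the place of a renamed variable as a cache key.

```python
import ipaddress


def parse_prefix(raw):
    """Hypothetical stand-in for route parsing: normalize a raw prefix string."""
    return str(ipaddress.ip_network(raw.strip(), strict=False))


def pick_asns_buggy(records):
    """Rename applied at the assignment only; the cache still uses the old name."""
    seen = {}
    for route, asn in records:  # `route` already exists here (unparsed)
        parsed_route = parse_prefix(route)  # before the rename: route = parse_prefix(route)
        # BUG: `route` is still defined, so there is no NameError, but it now
        # refers to the unparsed string instead of the parsed one.
        if route not in seen:
            seen[route] = asn
    return seen


def pick_asns_fixed(records):
    """Same logic with the parsed value used as the cache key."""
    seen = {}
    for route, asn in records:
        parsed_route = parse_prefix(route)
        if parsed_route not in seen:
            seen[parsed_route] = asn  # assumed tie-breaker: first ASN seen wins
    return seen


# Two raw spellings of the same documentation prefix with different
# (documentation-range) ASNs.
records = [("2001:db8::/32", 64500), (" 2001:DB8::/32", 64501)]
buggy = pick_asns_buggy(records)
fixed = pick_asns_fixed(records)
```

In this sketch the buggy variant treats the two spellings as different prefixes and never applies the tie-breaker, while the fixed variant collapses them into one entry.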

I am mostly to blame for this issue because I initially created the code that re-used the route variable and didn't catch the bug in review of that commit later on.

Discovery and fix

In the conversation about the upcoming release @jurraca convinced me that there was some issue with reproducibility, though he suspected rpki-client to be the cause. My investigation uncovered the issue and I fixed it in asmap/kartograf#106 which is included in the new release 0.4.13.

Impact

None of the releases of kartograf are affected, since the bug only existed in master after 0.4.12 was released and it was fixed before the release of 0.4.13.

However, it is no surprise that many participants of the collaborative runs have been using master rather than the latest release. This seemed fine: we have always tried to keep master consistent with the releases, and we used releases mainly as checkpoints that we recommend to participants to maximize the chances of a match and ensure a smooth experience for everyone.

There were two collaborative runs since the bug was introduced into master: 1760025600 in October and 1762444800 in November. The run in October failed, so there is nothing to fix there. Very likely, the mismatch between the latest release and master contributed to the failure.

The November run, however, succeeded and the result is affected by the bug. The result of this run was a split between two hashes: 5 participants had ad7409c8d698cfdb7612ec3e1c72dc53f8bc130ac3b8e207ef731173345291fd and 2 had 8dec174852e3dad0854d41319f60dad03b017c61e822680bf9c3f8fd3ea8fc6d. It turns out that these are actually almost the same result and if the fixed kartograf code is run on the data that resulted in the former hash before, then the result is actually the latter hash. So we might have actually had a perfect result among the 7 participants and 5 participants ran the buggy master while 2 participants were actually on the latest release.

Luckily, the diff between the two maps is also rather small. There are just 12 differing entries:

$ ../../clones/bitcoin/contrib/asmap/asmap-tool.py diff out/r1762444800-with-bug/final_result.txt out/r1762444800/final_result.txt
2001:500:91::/48 AS15135 # was AS33517
2001:500:92::/47 AS15135 # was AS33517
2001:500:95::/48 AS15135 # was AS33517
2001:500:96::/47 AS15135 # was AS33517
2001:dc7:ffc2::/48 AS24151 # was AS24409
2001:dc7:ffc6::/48 AS24151 # was AS24409
2001:dc7:ffd2::/48 AS24151 # was AS24409
2001:df0:bd::/48 AS147171 # was AS45292
2401:1::/32 AS132788 # was AS9940
2602:fed2:770b::/48 AS34924 # was AS53356
2602:feda:c0::/48 AS1029 # was AS43126
2604:7c00:100::/40 AS29802 # was AS40244
# Summary
IPv4: 0 entries with 0 addresses changed
IPv6: 12 entries with 79552154633921058212365205504 (2^96.01) addresses changed
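As a sanity check on the summary line, the address count can be recomputed from the 12 prefixes listed in the diff. This is a small sketch using Python's standard ipaddress module; it assumes the summary simply adds up the sizes of the listed prefixes (they do not overlap), and the variable names are only for illustration.

```python
import ipaddress
import math

# The 12 prefixes from the diff output above.
changed_prefixes = [
    "2001:500:91::/48",
    "2001:500:92::/47",
    "2001:500:95::/48",
    "2001:500:96::/47",
    "2001:dc7:ffc2::/48",
    "2001:dc7:ffc6::/48",
    "2001:dc7:ffd2::/48",
    "2001:df0:bd::/48",
    "2401:1::/32",
    "2602:fed2:770b::/48",
    "2602:feda:c0::/48",
    "2604:7c00:100::/40",
]

# Sum the number of addresses covered by each prefix.
total = sum(ipaddress.ip_network(p).num_addresses for p in changed_prefixes)
exponent = f"{math.log2(total):.2f}"

print(f"{len(changed_prefixes)} entries with {total} (2^{exponent}) addresses")
```

Almost all of the total comes from the single /32 entry (2^96 addresses); the remaining eleven entries together add less than half a percent.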

The first half of these differences are clearly between ASNs that are controlled by the same entity. For the second half this cannot be said for sure at first glance. The impact on security seems rather low regardless, since there are currently no nodes hosted in these networks and they amount to a tiny share of the internet address space.

Fixing the result of 1762444800

If participants can confirm that the fix cleanly changes the result hash, we would still like to amend the results here. This would give us a reproducible result for 1762444800 with the latest release. If this isn't possible, we could either leave the data as is or remove it from the repository altogether. Given that we will do the next run soon, either option seems fine.

Preventing such issues in the future

While we have expanded our code coverage and CI in the last months, they did not catch this because it was an edge case that wasn't covered explicitly. We will add coverage for this behavior, and from now on we will also do a full reproduction run as part of CI rather than a limited one with selected example data. We should probably also introduce some further code guidelines that make such bugs easier to expose.

How to review

If you participated in #34 and didn't already get the correct result and you still have the raw input data saved from this run, it would be great if you could confirm the changed hash by running a reproduction run on that data with master or the latest release of kartograf:

./run map -r ./data/1762444800/ -t 1762444800

This should result in 8dec174852e3dad0854d41319f60dad03b017c61e822680bf9c3f8fd3ea8fc6d.

Additionally you could then confirm that the encoding of this new result was done correctly here:

$ sha256sum out/r1762444800/final_result.txt
8dec174852e3dad0854d41319f60dad03b017c61e822680bf9c3f8fd3ea8fc6d
$ python path/to/bitcoin/contrib/asmap/asmap-tool.py encode --fill out/r1762444800/final_result.txt 1762444800_asmap.dat
$ sha256sum 1762444800_asmap.dat
518898863a6b0b77e775d3e2c4fa61e700ad8a42f011eae14aadfa9aee8ab4f0
$ python path/to/bitcoin/contrib/asmap/asmap-tool.py encode out/r1762444800/final_result.txt 1762444800_asmap_unfilled.dat
$ sha256sum 1762444800_asmap_unfilled.dat
a19dab2c783115e8a19c030f76c0851922ecb62add3504ec642185fa39701c9e
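If sha256sum isn't available on your system, the same check can be sketched in Python with the standard hashlib module. The helper names below are made up for this example, and the file paths are the ones from the commands above, so adjust them to wherever your files actually live.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Expected hashes as listed in this PR description.
EXPECTED = {
    "out/r1762444800/final_result.txt":
        "8dec174852e3dad0854d41319f60dad03b017c61e822680bf9c3f8fd3ea8fc6d",
    "1762444800_asmap.dat":
        "518898863a6b0b77e775d3e2c4fa61e700ad8a42f011eae14aadfa9aee8ab4f0",
    "1762444800_asmap_unfilled.dat":
        "a19dab2c783115e8a19c030f76c0851922ecb62add3504ec642185fa39701c9e",
}


def find_mismatches(expected):
    """Return {path: actual_hash} for every file whose hash differs."""
    mismatches = {}
    for path, want in expected.items():
        got = sha256_of_file(path)
        if got != want:
            mismatches[path] = got
    return mismatches
```

Calling find_mismatches(EXPECTED) from the directory where the commands above were run should return an empty dict if everything matches.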

fjahr commented Nov 22, 2025

I guess if we don't get enough people to confirm that they get the same result and support the amendment, my suggested "fix" would be to open an issue about this and inform people of this irregularity for as long as the data of 1762444800 is present in master here.

jurraca commented Nov 23, 2025

ACK successfully reproduced hashes.

laanwj commented Nov 25, 2025

Sadly I don't have the data anymore. Better luck next time.
