Amendment of data in 1762444800 results #37
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR doubles as a post-mortem for a bug that existed in the master of kartograf from late September to late November. Since the bug is fixed this is the only clean-up necessary.
The kartograf bug
The was deep in the parsing logic for IRR DBs and only affected a small number of records which had multiple ASNs for the same prefix so a tie-breaker needed to be applied.
It was in asmap/kartograf@00baaea which came in with asmap/kartograf#91. In this commit the variable
route, which holds the parsed version of the route, is renamed toparsed_route. The same variable is reused further below as a cache key. The variable name isn't changed in that place though. Normally this should have resulted in an unknown variable error, however, route wasn't a newly created variable where it was renamed. It was pre-existing and only reassigned so a (unparsed)routevariable still existed in the same namespace. This pre-existing route variable was used as the cache key instead of the parsed_route. This meant the cache wasn't working as intended and the IRR parsing gave different results than before in very specific edge cases where a tie-breaker needed to be applied between multiple ASNs for the same prefix.I am mostly to blame for this issue because I initially created the code that re-used the route variable and didn't catch the bug in review of that commit later on.
Discovery and fix
In the conversation about the upcoming release @jurraca convinced me that there was some issue with reproducibility, though he suspected
rpki-clientto be the cause. My investigation uncovered the issue and I fixed it in asmap/kartograf#106 which is included in the new release 0.4.13.Impact
None of the releases of kartograf are affected since the bug only existed in master after 0.4.12 was release and it was fixed before the release of 0.4.13.
However, it is not a surprise that many participants of the collaborative runs have been using master rather than the latest release. This seemed fine, we have always tried to ensure master is consistent with the releases and we rather used releases as checkpoints that would be recommended to participants to maximize the chances of a match and ensure a smooth experience for everyone.
There were two collaborative runs since the bug was instroduced into master 1760025600 in October and 1762444800 in November. The run in October failed so there is nothing to fix there. Very likely, the mismatch between the latest release and master contributed to the failure.
The November run, however, succeeded and the result is affected by the bug. The result of this run was a split between two hashes: 5 participants had
ad7409c8d698cfdb7612ec3e1c72dc53f8bc130ac3b8e207ef731173345291fdand 2 had8dec174852e3dad0854d41319f60dad03b017c61e822680bf9c3f8fd3ea8fc6d. It turns out that these are actually almost the same result and if the fixed kartograf code is run on the data that resulted in the former hash before, then the result is actually the latter hash. So we might have actually had a perfect result among the 7 participants and 5 participants ran the buggy master while 2 participants were actually on the latest release.The diff between the two maps is also rather small luckily, there are just 12 differing entries:
The first half of these differences are clearly between ASNs that are controlled by the same entity. For the second half this can not be said for sure on a first glance. The impact on security seems rather low regardless since there are currently no nodes hosted in these networks and these amount to a tiny share of the internet address space.
Fixing the result of 1762444800
If possible through confirming the clean change of the result hash from the fix, we would still like to amend the results here. This would allow us to see a reproducible result of 1762444800 with the latest release. If this isn't possible we could either decide to leave the data as is or remove it from the repository all-together. Given we will do the next run soon either option seems fine.
Preventing such issues in the future
While we have expanded our code coverage and CI in the last months they did not catch this since this was an edge case that wasn't covered explicitly. We will add coverage for this behavior and we are also going to be doing a full reproduction run as part of CI from now on rather than a limited one with selected example data. We should probably also introduce some further code guidelines that make such bugs easier to expose.
How to review
If you participated in #34 and didn't already get the correct result and you still have the raw input data saved from this run, it would be great if you could confirm the changed hash by running a reproduction run on that data with master or the latest release of kartograf:
This should result in
8dec174852e3dad0854d41319f60dad03b017c61e822680bf9c3f8fd3ea8fc6d.Additionally you could then confirm that the encoding of this new result was done correctly here: