Conversation
adamnovak
left a comment
There was a problem hiding this comment.
I think this is good, but I could be happier if the justification for the cool subtraction trick was in here, because I can't tell why you're doing that and not bit masks.
| // While the version number is 3, we will store (max - version) | ||
| // to avoid getting confused with old indexes without version numbers | ||
| // that start with component count | ||
| const static size_t CURRENT_VERSION_NUMBER = 3; | ||
| const static size_t VERSION_NUMBER_SENTINEL = (1 << 26) - CURRENT_VERSION_NUMBER; |
There was a problem hiding this comment.
Setting some high bits will make for a big page in the paged vector, but that shouldn't be a big problem since it's just the first page.
Why 26? And why subtract from it instead of setting a high bit and masking down to the bits where the version actually is? I think this way we can keep our values at the high end in 1 fewer bits than with the masking approach, which then means 1 fewer bits in each entry in the page, which is nice. But if that's the reasoning it should probably be explained.
There was a problem hiding this comment.
It was 26 because when I tried to set it to the maximum size_t that was too big; it complained that it couldn't store it in 26 bits.
There was a problem hiding this comment.
... but the tests just complained again, so I'm gonna change it to 10 bits I guess. I could do bit masking; didn't think of that.
Since vgteam/vg#4745 is going to lead to old distance indexes being incompatible with new vg giraffe, this PR introduces a way to version check. The version number field is stored in the root record as the first item. To avoid possible confusion with the previous first item (the connected component count), the number is stored as (2^26 minus version). If anyone has a graph with that many connected components, something very weird has occurred. The distance index will error on load if you try to read something with the wrong version number.
@adamnovak please review.