[VIndex] Document novelty of index #85
One of the key blind spots in this doc was why this VIndex design was chosen over something with a general ReduceFn. This addition is likely missing some arguments, but we can add these over time.
Using pointers as the values in this data structure is an important part of the design:

- Evolution of the value for a key is predictable: it's an append-only data structure
I'd argue that predictable evolution isn't a consequence of using pointers; it's due to your "implicit" reduceFn being sortByIndex (really ~`v = append(v, indexOf(x))`).
You could imagine a map of, say, "who are all the CAs which have issued a cert for ?" also having an append-only structure in the leaf, where the values are just the set of Issuers seen in source-log order. That is, there are many reduceFn implementations that are constrained so as to also provide append-only ordering.
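The point about constrained reduce functions can be sketched in Go. This is only an illustration, not the VIndex or batchmap API: the `Entry` type, the `appendIndex` and `appendIssuer` functions, and `buildMaps` are all hypothetical names. Both reducers only ever append to the existing value, so each key's value evolves in an append-only way regardless of whether it holds pointers or issuer names.

```go
package main

import "fmt"

// Entry is a hypothetical source-log entry: its position in the log,
// the key it maps to, and the issuer that produced it.
type Entry struct {
	Index  uint64
	Key    string
	Issuer string
}

// appendIndex is the "implicit" reduce step from the comment above:
// v = append(v, indexOf(x)).
func appendIndex(v []uint64, e Entry) []uint64 {
	return append(v, e.Index)
}

// appendIssuer is an alternative reduce step: it records each issuer the
// first time it is seen for a key, and never reorders or removes values.
func appendIssuer(v []string, e Entry) []string {
	for _, seen := range v {
		if seen == e.Issuer {
			return v
		}
	}
	return append(v, e.Issuer)
}

// buildMaps folds entries, in source-log order, into both kinds of map.
func buildMaps(entries []Entry) (map[string][]uint64, map[string][]string) {
	pointers := make(map[string][]uint64)
	issuers := make(map[string][]string)
	for _, e := range entries {
		pointers[e.Key] = appendIndex(pointers[e.Key], e)
		issuers[e.Key] = appendIssuer(issuers[e.Key], e)
	}
	return pointers, issuers
}

func main() {
	entries := []Entry{
		{Index: 0, Key: "example.com", Issuer: "CA-A"},
		{Index: 1, Key: "example.org", Issuer: "CA-B"},
		{Index: 2, Key: "example.com", Issuer: "CA-B"},
		{Index: 3, Key: "example.com", Issuer: "CA-A"},
	}
	pointers, issuers := buildMaps(entries)
	fmt.Println(pointers["example.com"], issuers["example.com"])
}
```

In both maps, the value for a key after processing more of the log is always an extension of the value seen earlier.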
Yeah the same outcome can be achieved in other ways, but I was trying to be concise.
- Evolution of the value for a key is predictable: it's an append-only data structure
- Values stay small: pointers to the values mean that the index doesn't need to duplicate values

Compare the above against the more powerful, but less efficient, general map (e.g. the [batchmap](https://github.com/google/trillian/tree/master/experimental/batchmap)).
Efficiency is probably arguable in some senses:
- The vindex size is reduced, but clients are forced to make multiple roundtrips: first to the vindex and then to the source log or its mirror for each resource pointer (plus, if the source log is a tlog-tiles log, and the pointers are non-consecutive, then we have a 256x multiplier on bytes retrieved due to entry bundles).
- A map using an "append-only" constrained reduceFn as in the comment above could potentially result in a more efficient map if you're looking at it from a systemic PoV - potentially it'd eliminate the need for clients to go to the source log at all.
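The entry-bundle cost mentioned in the first bullet can be made concrete with a little arithmetic. The sketch below assumes entry bundles of 256 entries, as the comment implies; `bundlesNeeded` and the sample indices are hypothetical and chosen only to contrast consecutive and scattered pointers.

```go
package main

import "fmt"

// bundleWidth is the assumed number of entries in a full entry bundle.
const bundleWidth = 256

// bundlesNeeded returns how many distinct entry bundles a client would have
// to fetch to resolve every pointer in indices.
func bundlesNeeded(indices []uint64) int {
	seen := make(map[uint64]struct{})
	for _, i := range indices {
		seen[i/bundleWidth] = struct{}{}
	}
	return len(seen)
}

func main() {
	consecutive := []uint64{512, 513, 514, 515} // all fall in one bundle
	scattered := []uint64{3, 700, 1500, 9000}   // each falls in its own bundle
	fmt.Println(bundlesNeeded(consecutive), bundlesNeeded(scattered))
}
```

With scattered pointers, each pointer costs a whole bundle, so a client that wants four entries may download up to 4 × 256 of them.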
Efficiency is always arguable; case in point - I tried to keep this short in the hopes of being efficient, but now we're having this conversation 😆
I was trying to keep this short as this is very early on in the doc. Even if the index served the full leaf contents back and we were only interested in wire efficiency, this approach is still inefficient over time because every lookup returns stuff you've probably seen before. Of course that can be mitigated by having some `since` parameter, but... well, I didn't want to get into that before I've even introduced MapFn.
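For illustration only, a `since`-style mitigation could look like the Go sketch below. Nothing here is part of the proposed design: `lookupSince` is a hypothetical helper, and it assumes the value for a key is a list of log indices in ascending order, which is what an append-only index in source-log order would give.

```go
package main

import (
	"fmt"
	"sort"
)

// lookupSince returns only the pointers at or after since, so a repeat
// client can skip results it has already seen. It assumes indices is sorted
// in ascending order.
func lookupSince(indices []uint64, since uint64) []uint64 {
	pos := sort.Search(len(indices), func(i int) bool {
		return indices[i] >= since
	})
	return indices[pos:]
}

func main() {
	value := []uint64{3, 700, 1500, 9000}
	fmt.Println(lookupSince(value, 701))
}
```

A client that remembers the last index it processed can pass that index plus one as `since` on its next lookup.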
My goals of getting this in early and iterating have been dashed. I'll take a look at this when I have some quality time to work out how this fits in better. I'm looking at you, 2026Q1!