@mhutchinson
Contributor

One of the key blind spots in this doc was why this VIndex design was chosen over something with a general ReduceFn. This addition is likely missing some arguments, but we can add these over time.

Using pointers as the values in this data structure is an important part of the design:

- Evolution of the value for a key is predictable: it's an append-only data structure
Contributor

I'd argue that predictable evolution isn't a consequence of using pointers; it's due to your "implicit" reduceFn being sortByIndex (really, roughly `v = append(v, indexOf(x))`).

You could imagine a map of, say, "who are all the CAs which have issued a cert for ?" also having an append-only structure in the leaf, where the values are just the set of Issuers seen, in source-log order. In other words, many reduceFn implementations can be constrained so that they also provide append-only ordering.
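
To make the comparison concrete, here is a minimal Go sketch of the two reduce functions being discussed. Every name in it (`Entry`, `Build`, `pointerReduce`, `issuerReduce`) is hypothetical and invented for illustration; none of it is taken from the VIndex or batchmap code. It only assumes that entries are folded in source-log order.

```go
package main

// Entry is a hypothetical source-log leaf that has already been mapped to a key.
type Entry struct {
	Index  uint64 // position of the leaf in the source log
	Key    string // index key, e.g. a domain name
	Issuer string // example payload field
}

// pointerReduce is the "implicit" reduce described above: it appends the
// leaf's index, so the value is a list of pointers into the source log.
func pointerReduce(v []uint64, e Entry) []uint64 {
	return append(v, e.Index)
}

// issuerReduce is a different append-only reduce: it records each issuer
// the first time it is seen, preserving source-log order.
func issuerReduce(v []string, e Entry) []string {
	for _, s := range v {
		if s == e.Issuer {
			return v // already recorded; the value is unchanged
		}
	}
	return append(v, e.Issuer)
}

// Build folds entries, in the order given, into per-key values.
func Build[V any](entries []Entry, reduce func([]V, Entry) []V) map[string][]V {
	out := make(map[string][]V)
	for _, e := range entries {
		out[e.Key] = reduce(out[e.Key], e)
	}
	return out
}
```

Both functions only ever extend the existing value, which is the property under discussion. The difference is whether the value holds pointers or copies of leaf data.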

Contributor Author

Yeah the same outcome can be achieved in other ways, but I was trying to be concise.

- Evolution of the value for a key is predictable: it's an append-only data structure
- Values stay small: pointers to the values mean that the index doesn't need to duplicate values

Compare the above against the more powerful, but less efficient, general map (e.g. the [batchmap](https://github.com/google/trillian/tree/master/experimental/batchmap)).
Contributor

Efficiency is probably arguable in some senses:

  • The vindex size is reduced, but clients are forced to make multiple roundtrips: first to the vindex and then to the source log or its mirror for each resource pointer (plus, if the source log is a tlog-tiles log, and the pointers are non-consecutive, then we have a 256x multiplier on bytes retrieved due to entry bundles).
  • A map using an "append-only" constrained reduceFn as in the comment above could potentially result in a more efficient map if you're looking at it from a systemic PoV - potentially it'd eliminate the need for clients to go to the source log at all.
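
As a rough illustration of the first bullet, the sketch below counts how many entry bundles a client would have to fetch for a given set of pointers. It assumes 256 entries per bundle, as the 256x figure implies, and that every bundle touched is full. The names `bundleSize`, `bundlesNeeded`, and `entriesFetched` are hypothetical.

```go
package main

// bundleSize is the assumed number of entries per entry bundle.
const bundleSize = 256

// bundlesNeeded returns how many distinct entry bundles must be fetched
// to resolve the given source-log pointers.
func bundlesNeeded(pointers []uint64) int {
	seen := make(map[uint64]struct{})
	for _, p := range pointers {
		seen[p/bundleSize] = struct{}{} // bundle that contains entry p
	}
	return len(seen)
}

// entriesFetched is an upper bound on the entries downloaded, assuming
// every bundle touched is full.
func entriesFetched(pointers []uint64) int {
	return bundlesNeeded(pointers) * bundleSize
}
```

Under these assumptions, three consecutive pointers share one bundle, while three pointers that are each 256 apart need three bundles, so up to 768 entries are fetched to read 3.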

Contributor Author

Efficiency is always arguable; case in point - I tried to keep this short in the hopes of being efficient, but now we're having this conversation 😆

I was trying to keep this short as this is very early on in the doc. Even if the index served the full leaf contents back and we were only interested in wire efficiency, this approach is still inefficient over time because every lookup returns stuff you've probably seen before. Of course that can be mitigated by having some `since` parameter, but... well, I didn't want to get into that before I've even introduced MapFn.
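
For readers wondering what a `since` parameter might look like, here is one possible sketch. It assumes `since` means "the number of values for this key that the client already holds", which is only one of several reasonable interpretations. The function name `lookupSince` is hypothetical and not part of any existing API.

```go
package main

// lookupSince returns only the values appended after the first `since`
// values for a key. It relies on the value being append-only: the prefix a
// client already holds never changes, so only the suffix needs to be sent.
func lookupSince(values []uint64, since int) []uint64 {
	if since < 0 {
		since = 0 // treat a negative offset as "send everything"
	}
	if since >= len(values) {
		return nil // the client is already up to date
	}
	return values[since:]
}
```

This works for any append-only value, whether it holds pointers or full leaf contents.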

@mhutchinson
Contributor Author

My goals of getting this in early and iterating have been dashed. I'll take a look at this when I have some quality time to work out how this fits in better. I'm looking at you, 2026Q1!


2 participants