[RFC 0195] init: Binary Cache Index Protocol #195
Conversation
roberth left a comment:
A bit rough around the edges (I know it's draft!), but seems like a good starting point. Do you plan to validate this?
> ## 4. Layer 1: Journal (Hot Layer)
>
> The journal captures recent mutations with minimal latency.
roberth:
Not sure about minimal.
This seems to be determined by segment_duration_seconds, and it requires many individual requests to catch up with the log.
Perhaps with HTTP range requests the number of requests could be reduced, turning this into a small number of bulk downloads.
Long polling could be an implementation strategy to make this even more realtime, without the added complexity of a push protocol.
When doing range requests instead of relying on split files, you'd still want a time interval parameter, but instead of journal.segment_duration_seconds it would be journal.segment_query_interval. Set to 0 for long polling.
If it's a dumb bucket, set a high value to reduce unnecessary / inefficient traffic.
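The range-request catch-up idea above can be sketched client-side. Everything here is an assumption for illustration — the `/index/journal` path, the single-file layout, and the server behavior are hypothetical, not taken from the RFC:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchJournalTail requests only the bytes appended since `offset`,
// using an HTTP Range request against an append-only journal file.
// Catch-up becomes one bulk download instead of one request per segment.
func fetchJournalTail(baseURL string, offset int64) ([]byte, int64, error) {
	req, err := http.NewRequest("GET", baseURL+"/index/journal", nil)
	if err != nil {
		return nil, offset, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-", offset))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, offset, err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusRequestedRangeNotSatisfiable {
		// Nothing has been appended past our offset yet.
		return nil, offset, nil
	}
	if resp.StatusCode != http.StatusPartialContent {
		return nil, offset, fmt.Errorf("unexpected status %s", resp.Status)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, offset, err
	}
	return data, offset + int64(len(data)), nil
}

// demo exercises fetchJournalTail against an in-process test server
// whose journal currently holds "abcdef", resuming from byte offset 2.
func demo() (string, int64, error) {
	journal := []byte("abcdef")
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var off int64
		fmt.Sscanf(r.Header.Get("Range"), "bytes=%d-", &off)
		if off >= int64(len(journal)) {
			w.WriteHeader(http.StatusRequestedRangeNotSatisfiable)
			return
		}
		w.WriteHeader(http.StatusPartialContent)
		w.Write(journal[off:])
	}))
	defer srv.Close()
	data, next, err := fetchJournalTail(srv.URL, 2)
	return string(data), next, err
}
```

Long polling would be the same request shape with the client simply holding the connection open (or retrying) until new bytes appear.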
kalbasit:
You're right that 'minimal' oversells it. I'll soften the language. Regarding optimizations: range requests for catch-up and long polling for real-time are good implementation strategies, but I'm inclined to keep them out of the spec itself since they're optimizations that servers/clients can adopt independently without protocol changes. The protocol just needs to not preclude them. Would a note in the implementation considerations section acknowledging these optimization opportunities be sufficient?
roberth:
I meant to incorporate those HTTP features in order to simplify the protocol.
By appending to a large journal file and relying on this feature, you may both reduce the spec complexity and improve performance, in terms of latency and number of requests.
Unless we have an overriding reason to provide this inefficient multi-file scheme, I think we'd be better off treating an append only log as an append only log at the HTTP level.
kalbasit:
I've been thinking about how to implement a single-file journal over dumb storage (S3 + CDN), and I don't see a clean path. S3 objects are immutable, so "appending" requires re-uploading the entire file. The multi-segment approach lets writers upload small files without touching existing data.
For smart servers (with actual append support or a proxy layer), a single-file journal with range requests would indeed be simpler and more efficient. But I want the baseline protocol to work with just static file hosting.
I see two options:
Option A: Define both modes in the spec
- Add a field to indicate journal mode (`segments` vs `single`)
- Clients implement both code paths
- More flexible, but adds complexity to every client implementation

Option B: Segments as the only mode
- Keep segments as the baseline (works with dumb storage)
- Smart servers like ncps could still optimize internally but serve the segment format for compatibility
- Simpler client implementations
Given that cache.nixos.org (the largest cache) runs on dumb S3, I'm leaning toward Option B. But I'm open to Option A if you think the range-request efficiency is worth the added client complexity.
What's your preference?
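Option B's write path can be sketched to show why it fits immutable object storage. The `Store` interface and the key naming are illustrative assumptions, not part of the RFC:

```go
package main

import "fmt"

// Store models a dumb object store (e.g. S3): objects are immutable,
// so the only way to "append" to the journal is to create new objects.
type Store interface {
	Put(key string, data []byte) error
}

// MemStore is an in-memory stand-in for testing.
type MemStore struct{ Objects map[string][]byte }

func (m *MemStore) Put(key string, data []byte) error {
	if m.Objects == nil {
		m.Objects = map[string][]byte{}
	}
	m.Objects[key] = data
	return nil
}

// publishSegment writes one journal segment as a fresh object keyed by a
// monotonically increasing ID. Existing segments are never touched,
// which is exactly what immutable object storage supports; a single-file
// journal would instead require re-uploading the whole file per append.
func publishSegment(s Store, id uint64, entries []byte) (string, error) {
	key := fmt.Sprintf("index/journal/%016d", id)
	return key, s.Put(key, entries)
}
```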
kalbasit:
@roberth circling back on this, which option should I put in the RFC?
kalbasit:
@roberth what do you think about the options here? I think I'll keep it as-is for now (I will update the RFC with other comments from last round). Let me know if you prefer to alter this section.
> Total: 64 bytes
>
> **Implementation Note**: The header is designed to avoid struct padding issues. All multi-byte integers are little-endian. Implementations in C/Rust should use explicit byte-level serialization or `#pragma pack(1)` / `#[repr(packed)]` to ensure correct layout.
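The byte-level serialization recommended in the implementation note can be sketched briefly. The field offsets are taken from the header layout quoted in this thread; the helper names are illustrative:

```go
package main

import "encoding/binary"

// Field offsets from the discussed header layout (fields before
// offset 18 are omitted here and simply left zeroed in this sketch).
const (
	offSparseIndex   = 18 // sparse index offset, uint64
	offSparseEntries = 26 // sparse index entry count, uint64
	offChecksum      = 34 // XXH64 of encoded data section, uint64
	headerSize       = 64
)

// putHeaderU64 writes a uint64 field at an explicit byte offset in
// little-endian order. Writing into a flat byte slice sidesteps struct
// padding entirely — no #pragma pack / #[repr(packed)] needed.
func putHeaderU64(hdr []byte, off int, v uint64) {
	binary.LittleEndian.PutUint64(hdr[off:off+8], v)
}

func getHeaderU64(hdr []byte, off int) uint64 {
	return binary.LittleEndian.Uint64(hdr[off : off+8])
}
```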
roberth:
Did you not specify big-endian above?
roberth:
Unless there's an overriding reason, pick little endian. Big endian is just not how computers work anymore.
The big-endian choice above had an overriding reason, namely the correspondence between lexicographic sorting and numeric ordering. But that's part of the domain, whereas this here is just a trivial implementation-level detail.
For comparison, we wouldn't pick big/little endian just because, e.g., accounting applications use Arabic numerals, which are big-endian.
kalbasit:
To clarify what might be confusing: the RFC uses both, intentionally:
- Section 2.1 (hash interpretation): Big-endian so that lexicographic string ordering equals numeric ordering - required for prefix-based sharding to work correctly.
- Section 5.1 (header integers): Little-endian for uint64 fields like item count and offsets - just binary serialization matching modern CPUs.
These aren't contradictory; they serve different purposes. I'll add a note to Section 5.1 clarifying why header integers use little-endian while hash interpretation uses big-endian.
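The distinction is easy to demonstrate: big-endian is exactly the byte layout for which lexicographic comparison agrees with numeric comparison. A minimal check, using uint64 as a stand-in for the 160-bit hashes:

```go
package main

import (
	"bytes"
	"encoding/binary"
)

// lexicographicMatchesNumeric reports whether comparing the big-endian
// encodings of a and b as byte strings (what sorted listings and string
// prefixes give you) yields the same result as comparing the numbers.
// For big-endian this is always true; for little-endian it is not
// (e.g. 256 encodes as 00 01 ... which sorts before 1's 01 00 ...).
func lexicographicMatchesNumeric(a, b uint64) bool {
	var ab, bb [8]byte
	binary.BigEndian.PutUint64(ab[:], a)
	binary.BigEndian.PutUint64(bb[:], b)
	lex := bytes.Compare(ab[:], bb[:])
	num := 0
	if a < b {
		num = -1
	} else if a > b {
		num = 1
	}
	return lex == num
}
```

Header fields, by contrast, are read as whole integers and never compared as byte strings, so little-endian costs nothing there.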
roberth:
Interestingly, Nix32 encoding follows the lexicographic sort of the reversed byte string.
So prefix-based sharding for Nix32 paths is - behind the scenes - suffix-based sharding as it relates to native hash bytes and the base-16 encoding.
See the docs PR NixOS/nix#15004 referenced earlier.
I feel uneasy to perpetuate this syntactic quirk.
If I understand correctly, it causes the byte sequences in this spec to be reverse of the native hash bytes. That is very very ugly.
kalbasit:
You're right that this is an unfortunate consequence of Nix's non-standard base32 encoding. The big-endian interpretation in Section 2.1 is required for the protocol to work correctly—it ensures lexicographic string ordering equals numeric ordering, which is essential for prefix-based sharding and delta encoding.
The alternative would be to have the protocol convert Nix32 → native bytes → use native byte order, but this would:
- Add a conversion step on every operation
- Break the correspondence between string prefixes and shard assignment
- Add complexity without functional benefit
I can add a note acknowledging that the 160-bit integers in the index are byte-reversed relative to the native hash representation, but I don't see a way to avoid this without significantly complicating the protocol. Is there a specific problem you foresee this causing?
I considered reversing the string so that shard prefixes correspond to native hash byte prefixes, but this would break the intuitive correspondence between a hash's visible prefix (b6gv...) and its shard location (b6/). Operators debugging cache issues would need to mentally reverse hashes to find the right shard.
The current design's 'ugliness' is confined to the internal byte representation, which most implementers won't encounter directly (they'll use libraries like go-nix). The reverse approach would surface the ugliness to every user interaction.
I'm open to other suggestions, but I think preserving hash_prefix == shard_name is worth the internal byte-order quirk.
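The `hash_prefix == shard_name` correspondence being defended here can be sketched in a few lines. The two-character prefix length is an assumption for illustration; the RFC defines the actual sharding parameters:

```go
package main

// shardFor maps a store path hash (in Nix32 form, as it appears in
// store paths) to its shard name by taking the visible prefix, so an
// operator seeing hash "b6gv..." looks in shard "b6/" directly.
// Note the quirk discussed above: because Nix32 is emitted in reverse
// byte order, these leading characters encode the *last* bytes of the
// native hash value.
func shardFor(nix32Hash string) string {
	if len(nix32Hash) < 2 {
		return nix32Hash
	}
	return nix32Hash[:2]
}
```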
Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
Refine binary cache index protocol with manifest URL discovery, structured base URLs, zstd compression for shards, and clarified format details.
kalbasit left a comment:
Thanks again for the review. I have addressed all your comments. I wasn't sure about the etiquette regarding resolving threads—should I leave them open for you to resolve if you are satisfied, or should I resolve them? I did go ahead and resolve the obvious code changes since I adopted those directly.
> 18   8   Sparse index offset from start of file (uint64, little-endian)
> 26   8   Sparse index entry count (uint64, little-endian)
> 34   8   XXH64 checksum of encoded data section (uint64, little-endian)
> 42   22  Reserved for future use (must be zeros)
kalbasit:
Good question. The intent is lenient: clients SHOULD ignore the reserved bytes to allow minor, backward-compatible additions without breaking old clients. Breaking changes would bump the magic number (e.g., NIXIDX02) or the manifest version field. I'll clarify this in the spec - something like: 'Clients MUST ignore non-zero values in reserved bytes to allow backward-compatible extensions. Incompatible format changes will use a new magic number.'
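The lenient policy described here can be sketched as a header check. The magic number's position at the start of the header is an assumption for illustration; the reserved-region offsets follow the layout quoted in this thread:

```go
package main

import (
	"bytes"
	"errors"
)

var errUnknownFormat = errors.New("unknown index format")

// checkHeader validates only what gates compatibility: the magic number.
// The reserved region (bytes 42..63 in the quoted layout) is deliberately
// NOT validated, so backward-compatible additive fields written there by
// newer servers don't break older clients. An incompatible revision would
// instead ship a new magic (e.g. NIXIDX02) and fail this check cleanly.
func checkHeader(hdr []byte) error {
	if len(hdr) < 64 {
		return errors.New("header too short")
	}
	if !bytes.HasPrefix(hdr, []byte("NIXIDX01")) {
		return errUnknownFormat
	}
	return nil
}
```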
roberth: I think resolving them is fine. We can always re-open if we feel like this is missing the point.
…dback

Major changes based on RFC review comments:
- Inline manifest into nix-cache-info: Remove separate manifest.json file and embed all index configuration directly in nix-cache-info using Index-prefixed fields. This eliminates an HTTP request and avoids adding another file format.
- Document Nix32 byte order quirk: Add note in Section 2.1 explaining that Nix's base32 encoding processes bytes in reverse order compared to RFC 4648, and recommend using established libraries like go-nix.
- Change journal segment ID to opaque identifier: IndexJournalCurrentSegment is now specified as "opaque monotonically increasing" rather than explicitly a Unix timestamp.
- Remove "Client Implementation Effort" from Drawbacks: This isn't a drawback—it's just how new features work.
- Remove speculative Future Work items: Drop SIMD decoding, GPU acceleration, and flake discovery (already solved via nix-cache-info).
- Update all examples to use nix-cache-info format
- Update algorithm pseudocode to reference cache_info.Index* fields
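The inlined configuration can be read with a small parser for nix-cache-info's `Key: Value` line format. `IndexJournalCurrentSegment` comes from the change above; the other keys in the example are standard nix-cache-info fields, and the exact set of Index* fields is defined by the RFC:

```go
package main

import (
	"bufio"
	"strings"
)

// parseCacheInfo parses the "Key: Value" line format of nix-cache-info
// and returns only the Index-prefixed fields added by this RFC, leaving
// existing fields (StoreDir, WantMassQuery, Priority, ...) untouched.
func parseCacheInfo(body string) map[string]string {
	fields := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		key, value, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue // skip malformed lines
		}
		key = strings.TrimSpace(key)
		if strings.HasPrefix(key, "Index") {
			fields[key] = strings.TrimSpace(value)
		}
	}
	return fields
}
```

Reusing the existing file means clients discover the index with the same single request they already make for cache metadata.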
Most comments are now addressed. Can you give it another review? Let me know if this is ready for next steps.
Rendered
Previous discussions and references:
cc @Mic92 @edef1c @brianmcgee @roberth @zimbatm @Kernald