Update Split and Splice docs #353

Merged
tjgq merged 1 commit into bazelbuild:main from tyler-french:update-split-splice
Feb 4, 2026


@tyler-french tyler-french commented Nov 25, 2025

This PR better aligns the language of the REv2 API to describe how a client should expect to interact with the Split and Splice APIs.

Generally speaking, when designing a Remote Cache service, the server is not always primarily responsible for splitting and splicing blobs. In fact, the Split and Splice APIs are most useful from the client's side: they let a client store and retrieve the manifest that describes how content-defined chunking composes a blob.

For example, if a client calls Splice with a mapping from blob digest A to chunks A1 and A2, this instructs the server to store that mapping. Later, if a client that is not chunking-aware calls Read on A, the server can use the stored mapping to compose A from A1 and A2 in the CAS and serve it to the client.
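The Splice-then-Read flow above can be sketched with an in-memory stand-in for the server. This is only an illustration, not the REv2 API or any real implementation: `ToyCache`, its method names, and the use of a bare SHA-256 hex string in place of a full `Digest` are all assumptions made for the example.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hex SHA-256, standing in for an REv2 Digest (size omitted for brevity)."""
    return hashlib.sha256(data).hexdigest()


class ToyCache:
    """Hypothetical in-memory CAS plus a store for chunk manifests."""

    def __init__(self) -> None:
        self.cas = {}        # digest -> bytes
        self.manifests = {}  # blob digest -> ordered list of chunk digests

    def put(self, data: bytes) -> str:
        d = digest(data)
        self.cas[d] = data
        return d

    def splice(self, blob_digest: str, chunk_digests: list) -> None:
        # Only the mapping is recorded; the chunk bytes are already in the CAS.
        self.manifests[blob_digest] = list(chunk_digests)

    def read(self, blob_digest: str) -> bytes:
        # Serve the blob directly if present, otherwise compose it from chunks.
        if blob_digest in self.cas:
            return self.cas[blob_digest]
        return b"".join(self.cas[c] for c in self.manifests[blob_digest])


cache = ToyCache()
a1 = cache.put(b"hello, ")
a2 = cache.put(b"world")
a = digest(b"hello, world")  # blob A itself is never uploaded whole
cache.splice(a, [a1, a2])
assert cache.read(a) == b"hello, world"
```

A client that knows nothing about chunking only ever calls `read(a)`; the composition from A1 and A2 stays on the server side.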

Similarly, if a client calls Split on blob B (which could be some Action Result), the server responds with its stored manifest: B1 and B2. A chunking-aware client can then skip downloading B1 if it is already available locally from another file's chunks, and download only B2, without ever downloading B in its entirety.
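On the client side, the saving comes from comparing the Split response against chunks already held locally. A minimal sketch follows, with placeholder digest strings (`"B1"`, `"B2"`) and hypothetical helper names; the actual download call is replaced by a plain dictionary.

```python
def chunks_to_download(manifest, local_digests):
    """Return chunk digests from a Split response that are not held locally,
    in manifest order and without repeats."""
    missing = []
    seen = set()
    for d in manifest:
        if d not in local_digests and d not in seen:
            missing.append(d)
            seen.add(d)
    return missing


def assemble(manifest, local_chunks, fetched_chunks):
    """Concatenate chunks in manifest order, preferring the local copy."""
    return b"".join(
        local_chunks[d] if d in local_chunks else fetched_chunks[d]
        for d in manifest
    )


manifest_b = ["B1", "B2"]          # what Split(B) would return
local = {"B1": b"shared-"}         # already present from another file's chunks
need = chunks_to_download(manifest_b, set(local))
fetched = {"B2": b"unique"}        # stands in for downloading only B2
blob_b = assemble(manifest_b, local, fetched)
assert need == ["B2"]
assert blob_b == b"shared-unique"
```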

@tyler-french tyler-french changed the title update split and splice docs WIP/Don't review: update split and splice docs Nov 25, 2025
@tyler-french tyler-french force-pushed the update-split-splice branch 2 times, most recently from ea8f353 to d4064ca November 25, 2025 20:23
@tyler-french tyler-french force-pushed the update-split-splice branch 2 times, most recently from 74544c7 to 6d71426 December 1, 2025 21:42
@tyler-french tyler-french changed the title WIP/Don't review: update split and splice docs Update Split and Splice docs Dec 1, 2025
@tyler-french tyler-french marked this pull request as ready for review December 4, 2025 23:03
tyler-french added a commit to buildbuddy-io/buildbuddy that referenced this pull request Dec 23, 2025
An implementation for CAS/SplitBlob and SpliceBlob is described here:
bazelbuild/remote-apis#353

To chunk at a layer above the cache abstraction and store individual
chunks separately in the CAS, we need somewhere to store the chunk
manifests. To keep things simple, this implementation stores the
manifests in the Action Cache, **under the original blob's digest**.
This is simpler than using a **derived digest** (i.e. a hash of the
digest + metadata) and provides the same security. The AC entries are
stored under a versioned prefix in the instance name, which means we can
change the version to invalidate all cached manifests across all
instances.
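To make the versioned-prefix idea concrete, here is a small sketch of how such a key could be built. The prefix format, the constant `MANIFEST_VERSION`, and both function names are invented for illustration; the commit message does not specify the actual layout.

```python
MANIFEST_VERSION = 1  # hypothetical; bumping it orphans every stored manifest


def manifest_instance_name(instance_name, version=MANIFEST_VERSION):
    """Build the instance name under which manifests live (illustrative format)."""
    return f"_chunk_manifest_v{version}/{instance_name}"


def manifest_key(instance_name, blob_hash, version=MANIFEST_VERSION):
    # Keyed by the original blob's digest, not by a derived digest.
    return (manifest_instance_name(instance_name, version), blob_hash)


store = {manifest_key("main", "abc123"): ["c1", "c2"]}

# Same version: the manifest is found.
assert manifest_key("main", "abc123") in store
# After a version bump, lookups miss, which invalidates old manifests.
assert manifest_key("main", "abc123", version=2) not in store
```

Because the version is part of the lookup key rather than the stored value, invalidation needs no deletion pass; old entries simply stop being reachable.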

`Split` retrieves the chunked manifest for a blob. If the manifest is
not found, or any of its chunks are not found, it returns a `NotFound`
error.

`Splice` upserts a chunked manifest. All chunks must already be
available in the CAS. If any are missing, it returns an
`InvalidArgument` error. Splice also returns an `InvalidArgument` error
if the concatenated chunks do not hash to the original blob digest.
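The error behaviour described for `Split` and `Splice` can be sketched as follows. This is a toy model over plain dictionaries, not the BuildBuddy code: the exception classes stand in for the gRPC status codes, and `sha256_hex`, `splice`, and `split` are hypothetical names. It also buffers the whole blob to verify the hash, which a real server would more likely do by streaming.

```python
import hashlib


class NotFound(Exception):
    """Stands in for a NOT_FOUND status."""


class InvalidArgument(Exception):
    """Stands in for an INVALID_ARGUMENT status."""


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def splice(cas, manifests, blob_hash, chunk_hashes):
    """Upsert a manifest after checking that the chunks exist and match the blob."""
    missing = [h for h in chunk_hashes if h not in cas]
    if missing:
        raise InvalidArgument(f"chunks missing from CAS: {missing}")
    joined = b"".join(cas[h] for h in chunk_hashes)
    if sha256_hex(joined) != blob_hash:
        raise InvalidArgument("chunks do not concatenate to the blob digest")
    manifests[blob_hash] = list(chunk_hashes)


def split(cas, manifests, blob_hash):
    """Return the stored manifest, or fail if it or any chunk is gone."""
    if blob_hash not in manifests:
        raise NotFound("no manifest stored for blob")
    chunk_hashes = manifests[blob_hash]
    if any(h not in cas for h in chunk_hashes):
        raise NotFound("a chunk in the manifest is no longer in the CAS")
    return list(chunk_hashes)


cas, manifests = {}, {}
c1, c2 = sha256_hex(b"foo"), sha256_hex(b"bar")
cas[c1], cas[c2] = b"foo", b"bar"
blob = sha256_hex(b"foobar")
splice(cas, manifests, blob, [c1, c2])
assert split(cas, manifests, blob) == [c1, c2]
```

Checking chunk presence again in `split` matters because chunks can be evicted from the CAS independently of the manifest that references them.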

The implementation uses the experiment config so that we can enable this
by group or user gradually to evaluate performance. Since AC entries are
stored by group, this is safe.

This PR is part of a series for an MVP of CDC. The next step is to
implement the `Read`/`Write` ByteStream APIs to read and write using CDC
if a blob matches conditions.

An example of how this can be used for `ByteStream/Read`:
#10997

Follow-ups included in
buildbuddy-io/buildbuddy-internal#6426
Copilot AI review requested due to automatic review settings December 23, 2025 23:48
Copilot AI left a comment

Pull request overview

This PR updates the documentation for the SplitBlob and SpliceBlob APIs in the Remote Execution v2 protocol to clarify their purpose and usage patterns. The changes emphasize that these APIs are primarily for storing and retrieving chunk composition metadata rather than performing the actual splitting and splicing operations on the server.

Key Changes:

  • Clarified that SplitBlob retrieves stored information about how a blob is chunked rather than performing the split operation
  • Updated SpliceBlob documentation to emphasize that clients tell the server how chunks compose a blob
  • Expanded error conditions for SplitBlob to include cases where split information or chunks are missing

@tyler-french tyler-french force-pushed the update-split-splice branch 2 times, most recently from 215a11c to b13101c December 24, 2025 04:08
@tyler-french

@tjgq Do you have time to take another look here also? Thanks!

@tyler-french tyler-french requested a review from sluongng February 3, 2026 22:47
@tjgq tjgq merged commit 080cf12 into bazelbuild:main Feb 4, 2026
1 check passed
5 participants