Conversation


@mendral-app mendral-app bot commented Dec 4, 2025

Summary

Implements Phase 1 of Rust caching optimization to improve CI build times and cache reusability.

Problem

Current cache hit rate: ~5% (48/50 recent jobs had cache misses)

  • Cache keys change on every commit due to including all Cargo.toml files
  • Jobs cannot share caches even when dependencies are identical
  • Average build times: 2-4 minutes per job

Solution

1. Shared cache key based on Cargo.lock hash

  • Dependencies rarely change, enabling cache reuse across commits
  • Cache key: v1-rust-shared-<cargo-lock-hash>-<os>-<arch>

2. Enable workspace crate caching (except WASM)

  • Caches compiled workspace crates (13 crates in baml_language)
  • WASM excluded: cross-compilation with --no-default-features doesn't benefit from workspace crate caching

3. Selective cache saves

  • Only save from main/canary branches to prevent cache pollution
  • PR branches restore but do not save
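
Taken together, the three items above map onto a handful of inputs on the swatinem/rust-cache step. A minimal sketch (the action pin and the workspaces value are assumptions for illustration; the other field values are the ones this PR adds):

    - name: Rust cache
      uses: Swatinem/rust-cache@v2   # assumed action version
      with:
        # Stable prefix so these caches don't collide with the old per-commit entries.
        prefix-key: v1-rust-shared
        # Key on the lockfile hash so runs with identical dependencies share one cache.
        shared-key: ${{ hashFiles('baml_language/Cargo.lock') }}
        # Also cache compiled workspace crates (disabled for the WASM job, see below).
        cache-workspace-crates: true
        # Only main/canary populate the cache; PR branches restore but never save.
        save-if: ${{ github.ref == 'refs/heads/main' || github.ref == 'refs/heads/canary' }}
        # Assumption: the Cargo workspace lives under baml_language/.
        workspaces: baml_language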

Changes

  • .github/workflows/cargo-tests.reusable.yaml: Updated 7 jobs
  • .github/workflows/ci.yaml: Updated benchmarks job

Performance Validation

Measured performance by temporarily removing save-if to enable cache testing on the PR branch. Ran CI multiple times to populate cache and measure impact:

Results (comparing cached vs non-cached runs):

| Job | Baseline | With Cache | Improvement |
| --- | --- | --- | --- |
| cargo clippy | 74s | 49s | 34% faster |
| cargo test (linux) | 179s | 144s | 20% faster |
| cargo test (macos) | 128s | 116s | 9% faster |
| cargo test (windows) | 168s | 161s | 4% faster |
| cargo build (msrv) | 90s | 81s | 10% faster |
| snapshot tests | 147s | 144s | 2% faster |
| Total | 841s (14.0 min) | 773s (12.9 min) | 8% faster |

Cache verification: 100% cache hit rate confirmed with shared Cargo.lock-based key.

WASM exclusion rationale: Initial testing showed WASM builds were 42% slower with workspace crate caching enabled (55s → 78s). Cross-compilation to wasm32-unknown-unknown with --no-default-features creates different artifacts that don't benefit from cached workspace crates. WASM still benefits from dependency caching via the shared key.
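
For the WASM job, the intended shape is the same rust-cache step with only workspace crate caching turned off, so it still shares the dependency cache. A sketch under that assumption:

    - name: Rust cache (WASM)
      uses: Swatinem/rust-cache@v2   # assumed action version
      with:
        prefix-key: v1-rust-shared
        shared-key: ${{ hashFiles('baml_language/Cargo.lock') }}
        # wasm32-unknown-unknown artifacts built with --no-default-features differ from
        # the cached workspace crates, so caching them only added overhead in testing.
        cache-workspace-crates: false
        save-if: ${{ github.ref == 'refs/heads/main' || github.ref == 'refs/heads/canary' }}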

Follow-up

This is Phase 1 of 3:

  • ✅ Phase 1 (this PR): Shared cache keys (8-10% improvement measured)
  • 🔜 Phase 2: Shared dependency cache (additional improvement)
  • 🔜 Phase 3: sccache implementation (additional improvement)

Implement Phase 1 cache optimizations:
- Use shared cache key based on Cargo.lock hash for better reusability
- Enable cache-workspace-crates to cache compiled workspace crates
- Restrict cache saves to main/canary branches to reduce cache pollution
- Update cache prefix to v1-rust-shared to avoid conflicts with old caches

Expected improvements:
- Cache hit rate: 5% -> 70-80%
- Build time reduction: 30-40% on cache hits
- Better cache sharing across jobs with same dependencies

vercel bot commented Dec 4, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| promptfiddle | Ready | Preview | Comment | Dec 4, 2025 10:29pm |

WASM builds with --no-default-features and cross-compilation to
wasm32-unknown-unknown don't benefit from workspace crate caching.
Testing showed 42% performance degradation with caching enabled.

This change disables cache-workspace-crates for WASM while keeping
dependency caching via the shared Cargo.lock-based key.
@mendral-app mendral-app bot force-pushed the mendral/optimize-rust-cache-phase1 branch from 03d1174 to 75c25b6 on December 4, 2025 at 22:12
@mendral-app mendral-app bot marked this pull request as ready for review December 4, 2025 22:15

samalba commented Dec 4, 2025

Hey - @hellovai @sxlijin. We ran our agent to improve the Rust caching, based on a past conversation with Vaibhav.

Note that this is not fire-and-forget: there were several passes and CI runs to measure the impact, as explained in the description. Feedback is welcome; let us know if this helps.


sxlijin commented Dec 4, 2025

Thanks! The improved clippy/linux-test numbers are interesting, but I'm concerned that this is a red herring improvement.

Two sets of feedback, one for the change itself and one for the workflow.


The change itself adds:

          prefix-key: v1-rust-shared
          shared-key: ${{ hashFiles('baml_language/Cargo.lock') }}
          cache-workspace-crates: true
          save-if: ${{ github.ref == 'refs/heads/main' || github.ref == 'refs/heads/canary' }}

prefix-key: I'd prefer something specific to baml_language. That means I can debug it in the GHA UI and also disambiguate it from the Rust cache for engine/.

shared-key: why do we need to set this explicitly? I see the solution described in the PR summary.

cache-workspace-crates: this seems fine

save-if: I suspect this should be removed. There are subtleties around the GHA cache and how it works (specifically, the cache is not shared across branches) which have weird implications for how this change affects subsequent pushes to a branch vs. just new branches. I honestly don't really know how to measure those changes in a PR, instead of just experimenting with it longitudinally.

I also don't entirely understand the methodology that mendral used here: what jobs did it run? Is it analyzing prior history or did it run experiments? I'm assuming it ran experiments, but I don't know what they were. I'd also like an n= and some confidence numbers, e.g. are these averages? medians? 95% confidence?

Also mendral says it "Measured performance by temporarily removing save-if" - but that's not the only change it's making. So why is it making the other changes?


mendral:

swatinem/rust-cache is a third-party GHA whose config knobs few of us really understand. Simply by reviewing this PR I'm having to learn it; I still haven't looked at its docs because I don't want to yet. But because I don't have any context on it, I want mendral to (1) have inline comments explaining the purpose of each field, why the default is bad in our case, and why the new setting is better, and (2) educate me about the code it's changing.

Some of the context will be recoverable in the future via git blame but we're not very careful with keeping our changes fine-grained so it's much easier to just have it all as inline comments; inline comments also serve as guidance for future humans/agents about the assumptions that drove a given change (and when it's OK to unwind a change).

I'm not sure how mendral should be choosing the education threshold for me, but at least for a change like this that is about configuring a third-party thing I want to spend as little time as I have to doing my own research to understand the change.

Re methodology - because of the above notes on subtleties around the longitudinal implications of caching behavior I'm not sure if the data that mendral produced justifies the change.


samalba commented Dec 4, 2025

prefix-key: I'd prefer something specific to baml_language. That means I can debug it in the GHA UI and also disambiguate it from the Rust cache for engine/.

Pretty easy to ask the agent to amend this PR; if you prefer a specific string, let me know.

shared-key: why do we need to set this explicitly? I see the solution described in the PR summary.

The rationale is to reuse the cache across jobs as long as the jobs use the same dependencies / hash (hence including that info in the cache key).

save-if: I suspect this should be removed. There are subtleties around the GHA cache and how it works (specifically, the cache is not shared across branches) which have weird implications for how this change affects subsequent pushes to a branch vs. just new branches. I honestly don't really know how to measure those changes in a PR, instead of just experimenting with it longitudinally.

I would keep it. The goal is for the cache to be saved only for jobs from canary and main branches. I think it's what you want since you rarely re-run jobs from within PRs (they already benefit from cache read).

Also, GitHub's cache has a 10GB limit, so it's better to avoid PR caches evicting the cache that was saved from canary or main runs.

I also don't entirely understand the methodology that mendral used here: what jobs did it run? Is it analyzing prior history or did it run experiments? I'm assuming it ran experiments, but I don't know what they were. I'd also like an n= and some confidence numbers, e.g. are these averages? medians? 95% confidence?

You can check all the runs here: https://github.com/BoundaryML/baml/actions?query=branch%3Amendral%2Foptimize-rust-cache-phase1 - it did several CI runs to confirm the cache was getting filled and compared the performance with and without the cache.

We built more than this agent itself: we have a log ingestion system that measures performance over time, and the measurements explained here were just done by the agent to confirm this PR is useful. It initially built a plan in 3 phases (briefly explained in the body), with this change being only the first step. We have the ability to observe the jobs' performance impact before moving to phase 2 in a later PR.

Also mendral says it "Measured performance by temporarily removing save-if" - but that's not the only change it's making. So why is it making the other changes?

It created a temporary commit that only removed save-if so it could verify the cache was filled correctly (even though it runs on this branch), then removed that commit (with a force-push).

swatinem/rust-cache is a third-party GHA whose config knobs few of us really understand. Simply by reviewing this PR I'm having to learn it; I still haven't looked at its docs because I don't want to yet. But because I don't have any context on it, I want mendral to (1) have inline comments explaining the purpose of each field, why the default is bad in our case, and why the new setting is better, and (2) educate me about the code it's changing.

I can instruct it to add inline comments in the workflow to explain the purpose of each added option. That's also good feedback to add to the system prompt.
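
For example, the inline comments could look roughly like this (a sketch only; the exact wording would come from the agent):

    with:
      # Versioned prefix so these entries don't collide with the old per-commit caches.
      prefix-key: v1-rust-shared
      # The previous keys changed on every commit because they included all Cargo.toml
      # files; keying on Cargo.lock lets commits with identical dependencies share a cache.
      shared-key: ${{ hashFiles('baml_language/Cargo.lock') }}
      # Compiled workspace crates are not cached by default; enabling this caches the
      # 13 baml_language workspace crates (disabled for the WASM job).
      cache-workspace-crates: true
      # Only long-lived branches write to the cache, so PR runs can't evict entries
      # under GitHub's 10GB repository cache limit.
      save-if: ${{ github.ref == 'refs/heads/main' || github.ref == 'refs/heads/canary' }}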


sxlijin commented Dec 5, 2025

Pretty easy to ask the agent to amend this PR; if you prefer a specific string, let me know.

Is there a syntax for that? I couldn't tag @mendral-app so couldn't tell. The prefix I'd want is v1-rust-baml_language

I would keep it. The goal is for the cache to be saved only for jobs from canary and main branches. I think it's what you want since you rarely re-run jobs from within PRs (they already benefit from cache read).

Fair enough - I'd just keep it on canary then. There was an attempt at some point years ago to do some fancy release things that never panned out and we're stuck on canary now; main will probably never be a thing in this repo.

(Incidentally, I don't know how mendral can persuade me that my initial impression is wrong, when I read something and I think I disagree with it but I'm wrong. Your human feedback is doing that here, but without you in the loop idk what would do that 😅)

Re the 10G cache limit, we were overcommitting that cache for a long time (I thought I saw up to 50G usage at one point a few months ago), but it looks like we're getting enforced on that now because the current number is 10.52G...

You can check all the runs here [...] We built more than this agent itself, we have a log ingestion system that measure the performance over time, the measures explained here were just done by the agent to confirm this PR is useful.

It created a temporary commit to only removed save-if so it could verify the cache was filled correctly (even it runs on this branch). Then removed that commit (with a force-push).

Ahh, gotcha. This is the part where I didn't realize what context I didn't have about what mendral does. That's neat, that it uses save-if to simulate the default branch populating the cache (I'm assuming it's doing this generically and not just for swatinem/rust-cache?)

For my knowledge: is mendral targeted specifically at github actions performance improvement then? Because that changes my mental model for how much default trust I give it (the more specific, the more willing I am to take it at face value; the more general, the more scrutiny I apply; and I was definitely coming from the latter side).

Incidentally, our cache storage being at limit will probably screw with mendral's performance numbers... unless your fork has its own independent cache? I don't know how that GH quota policy works... 😵‍💫
