Conversation


@mendral-app mendral-app bot commented Dec 4, 2025

Summary

Implements Phase 1 of Rust caching optimization to improve CI build times and cache reusability.

Problem

Current cache hit rate: ~5% (48/50 recent jobs had cache misses)

  • Cache keys change on every commit due to including all Cargo.toml files
  • Jobs cannot share caches even when dependencies are identical
  • Average build times: 2-4 minutes per job

Solution

1. Shared cache key based on Cargo.lock hash

  • Dependencies rarely change, enabling cache reuse across commits
  • Cache key: v1-rust-shared-<cargo-lock-hash>-<os>-<arch>

2. Enable workspace crate caching (except WASM)

  • Caches compiled workspace crates (13 crates in baml_language)
  • WASM excluded: cross-compilation with --no-default-features doesn't benefit from workspace crate caching

3. Selective cache saves

  • Only save from main/canary branches to prevent cache pollution
  • PR branches restore but do not save
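
Taken together, the three items above map onto a handful of inputs on the swatinem/rust-cache step. A minimal sketch (the action pin and the workspaces value are assumptions for illustration; the other field values are the ones this PR adds):

    - name: Rust cache
      uses: Swatinem/rust-cache@v2   # assumed action version
      with:
        # Stable prefix so these caches don't collide with the old per-commit entries.
        prefix-key: v1-rust-shared
        # Key on the lockfile hash so runs with identical dependencies share one cache.
        shared-key: ${{ hashFiles('baml_language/Cargo.lock') }}
        # Also cache compiled workspace crates (disabled for the WASM job, see below).
        cache-workspace-crates: true
        # Only main/canary populate the cache; PR branches restore but never save.
        save-if: ${{ github.ref == 'refs/heads/main' || github.ref == 'refs/heads/canary' }}
        # Assumption: the Cargo workspace lives under baml_language/.
        workspaces: baml_language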

Changes

  • .github/workflows/cargo-tests.reusable.yaml: Updated 7 jobs
  • .github/workflows/ci.yaml: Updated benchmarks job

Performance Validation

Measured performance by temporarily removing save-if to enable cache testing on the PR branch. Ran CI multiple times to populate cache and measure impact:

Results (comparing cached vs non-cached runs):

| Job | Baseline | With Cache | Improvement |
| --- | --- | --- | --- |
| cargo clippy | 74s | 49s | 34% faster |
| cargo test (linux) | 179s | 144s | 20% faster |
| cargo test (macos) | 128s | 116s | 9% faster |
| cargo test (windows) | 168s | 161s | 4% faster |
| cargo build (msrv) | 90s | 81s | 10% faster |
| snapshot tests | 147s | 144s | 2% faster |
| Total | 841s (14.0 min) | 773s (12.9 min) | 8% faster |

Cache verification: 100% cache hit rate confirmed with shared Cargo.lock-based key.

WASM exclusion rationale: Initial testing showed WASM builds were 42% slower with workspace crate caching enabled (55s → 78s). Cross-compilation to wasm32-unknown-unknown with --no-default-features creates different artifacts that don't benefit from cached workspace crates. WASM still benefits from dependency caching via the shared key.
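
For the WASM job, the intended shape is the same rust-cache step with only workspace crate caching turned off, so it still shares the dependency cache. A sketch under that assumption:

    - name: Rust cache (WASM)
      uses: Swatinem/rust-cache@v2   # assumed action version
      with:
        prefix-key: v1-rust-shared
        shared-key: ${{ hashFiles('baml_language/Cargo.lock') }}
        # wasm32-unknown-unknown artifacts built with --no-default-features differ from
        # the cached workspace crates, so caching them only added overhead in testing.
        cache-workspace-crates: false
        save-if: ${{ github.ref == 'refs/heads/main' || github.ref == 'refs/heads/canary' }}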

Follow-up

This is Phase 1 of 3:

  • ✅ Phase 1 (this PR): Shared cache keys (8-10% improvement measured)
  • 🔜 Phase 2: Shared dependency cache (additional improvement)
  • 🔜 Phase 3: sccache implementation (additional improvement)

Implement Phase 1 cache optimizations:
- Use shared cache key based on Cargo.lock hash for better reusability
- Enable cache-workspace-crates to cache compiled workspace crates
- Restrict cache saves to main/canary branches to reduce cache pollution
- Update cache prefix to v1-rust-shared to avoid conflicts with old caches

Expected improvements:
- Cache hit rate: 5% -> 70-80%
- Build time reduction: 30-40% on cache hits
- Better cache sharing across jobs with same dependencies

vercel bot commented Dec 4, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| promptfiddle | Ready | Preview | Comment | Dec 4, 2025 10:29pm |

WASM builds with --no-default-features and cross-compilation to
wasm32-unknown-unknown don't benefit from workspace crate caching.
Testing showed 42% performance degradation with caching enabled.

This change disables cache-workspace-crates for WASM while keeping
dependency caching via the shared Cargo.lock-based key.
@mendral-app mendral-app bot force-pushed the mendral/optimize-rust-cache-phase1 branch from 03d1174 to 75c25b6 on December 4, 2025 at 22:12
@mendral-app mendral-app bot marked this pull request as ready for review December 4, 2025 22:15

samalba commented Dec 4, 2025

Hey - @hellovai @sxlijin. We ran our agent to improve the Rust caching, based on a past conversation with Vaibhav.

Note that this is not fire-and-forget: there were several passes and CI runs to measure the impact, as explained in the description. Feedback is welcome; let us know if this helps.


sxlijin commented Dec 4, 2025

Thanks! The improved clippy/linux-test numbers are interesting, but I'm concerned that this is a red herring improvement.

Two sets of feedback, one for the change itself and one for the workflow.


The change itself adds:

          prefix-key: v1-rust-shared
          shared-key: ${{ hashFiles('baml_language/Cargo.lock') }}
          cache-workspace-crates: true
          save-if: ${{ github.ref == 'refs/heads/main' || github.ref == 'refs/heads/canary' }}

prefix-key: I'd prefer something specific to baml_language. That means I can debug it in the GHA UI and also disambiguate it from the Rust cache for engine/.

shared-key: why do we need to set this explicitly? I see the solution described in the PR summary.

cache-workspace-crates: this seems fine

save-if: I suspect this should be removed. There are subtleties around the GHA cache and how it works (specifically, the cache is not shared across branches) which have weird implications for how this change affects subsequent pushes to a branch vs. just new branches. I honestly don't really know how to measure those changes in a PR, instead of just experimenting with it longitudinally.

I also don't entirely understand the methodology that mendral used here: what jobs did it run? Is it analyzing prior history or did it run experiments? I'm assuming it ran experiments, but I don't know what they were. I'd also like an n= and some confidence numbers, e.g. are these averages? medians? 95% confidence?

Also mendral says it "Measured performance by temporarily removing save-if" - but that's not the only change it's making. So why is it making the other changes?


mendral:

swatinem/rust-cache is a third-party GHA whose config knobs few of us really understand. Simply by reviewing this PR I'm having to learn it; I still haven't looked at its docs because I don't want to yet. But because I don't have any context on it, I want mendral to (1) have inline comments explaining the purpose of each field, why the default is bad in our case, and why the new setting is better, and (2) educate me about the code it's changing.

Some of the context will be recoverable in the future via git blame but we're not very careful with keeping our changes fine-grained so it's much easier to just have it all as inline comments; inline comments also serve as guidance for future humans/agents about the assumptions that drove a given change (and when it's OK to unwind a change).

I'm not sure how mendral should be choosing the education threshold for me, but at least for a change like this that is about configuring a third-party thing I want to spend as little time as I have to doing my own research to understand the change.

Re methodology - because of the above notes on subtleties around the longitudinal implications of caching behavior I'm not sure if the data that mendral produced justifies the change.


samalba commented Dec 4, 2025

prefix-key: I'd prefer something specific to baml_language. That means I can debug it in the GHA UI and also disambiguate it from the Rust cache for engine/.

Pretty easy to ask the agent to amend this PR; if you prefer a specific string, let me know.

shared-key: why do we need to set this explicitly? I see the solution described in the PR summary.

The rationale is to reuse the cache across jobs as long as the jobs use the same dependencies / hash (hence including that info in the cache key).

save-if: I suspect this should be removed. There are subtleties around the GHA cache and how it works (specifically, the cache is not shared across branches) which have weird implications for how this change affects subsequent pushes to a branch vs. just new branches. I honestly don't really know how to measure those changes in a PR, instead of just experimenting with it longitudinally.

I would keep it. The goal is for the cache to be saved only for jobs from canary and main branches. I think it's what you want since you rarely re-run jobs from within PRs (they already benefit from cache read).

Also, GitHub's cache has a 10GB limit, so it's better to avoid PR caches evicting the cache that was saved from canary or main runs.

I also don't entirely understand the methodology that mendral used here: what jobs did it run? Is it analyzing prior history or did it run experiments? I'm assuming it ran experiments, but I don't know what they were. I'd also like an n= and some confidence numbers, e.g. are these averages? medians? 95% confidence?

You can check all the runs here: https://github.com/BoundaryML/baml/actions?query=branch%3Amendral%2Foptimize-rust-cache-phase1 - it did several CI runs to confirm the cache was getting filled and compared the performance with and without the cache.

We built more than this agent itself: we have a log ingestion system that measures performance over time, and the measurements explained here were just done by the agent to confirm this PR is useful. It initially built a plan in 3 phases (briefly explained in the body), with this change being only the first step. We have the ability to observe the jobs' performance impact before moving to phase 2 in a later PR.

Also mendral says it "Measured performance by temporarily removing save-if" - but that's not the only change it's making. So why is it making the other changes?

It created a temporary commit that only removed save-if so it could verify the cache was filled correctly (even though it runs on this branch), then removed that commit (with a force-push).

swatinem/rust-cache is a third-party GHA whose config knobs few of us really understand. Simply by reviewing this PR I'm having to learn it; I still haven't looked at its docs because I don't want to yet. But because I don't have any context on it, I want mendral to (1) have inline comments explaining the purpose of each field, why the default is bad in our case, and why the new setting is better, and (2) educate me about the code it's changing.

I can instruct it to add inline comments in the workflow to explain the purpose of each added option. That's also good feedback to add to the system prompt.
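
For example, the inline comments could look roughly like this (a sketch only; the exact wording would come from the agent):

    with:
      # Versioned prefix so these entries don't collide with the old per-commit caches.
      prefix-key: v1-rust-shared
      # The previous keys changed on every commit because they included all Cargo.toml
      # files; keying on Cargo.lock lets commits with identical dependencies share a cache.
      shared-key: ${{ hashFiles('baml_language/Cargo.lock') }}
      # Compiled workspace crates are not cached by default; enabling this caches the
      # 13 baml_language workspace crates (disabled for the WASM job).
      cache-workspace-crates: true
      # Only long-lived branches write to the cache, so PR runs can't evict entries
      # under GitHub's 10GB repository cache limit.
      save-if: ${{ github.ref == 'refs/heads/main' || github.ref == 'refs/heads/canary' }}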


sxlijin commented Dec 5, 2025

Pretty easy to ask the agent to amend this PR; if you prefer a specific string, let me know.

Is there a syntax for that? I couldn't tag @mendral-app so couldn't tell. The prefix I'd want is v1-rust-baml_language

I would keep it. The goal is for the cache to be saved only for jobs from canary and main branches. I think it's what you want since you rarely re-run jobs from within PRs (they already benefit from cache read).

Fair enough - I'd just keep it on canary then. There was an attempt at some point years ago to do some fancy release things that never panned out and we're stuck on canary now; main will probably never be a thing in this repo.

(Incidentally, I don't know how mendral can persuade me that my initial impression is wrong, when I read something and I think I disagree with it but I'm wrong. Your human feedback is doing that here, but without you in the loop idk what would do that 😅)

Re the 10G cache limit, we were overcommitting that cache for a long time (I thought I saw up to 50G usage at one point a few months ago), but it looks like we're getting enforced on that now because the current number is 10.52G...

You can check all the runs here [...] We built more than this agent itself, we have a log ingestion system that measure the performance over time, the measures explained here were just done by the agent to confirm this PR is useful.

It created a temporary commit to only removed save-if so it could verify the cache was filled correctly (even it runs on this branch). Then removed that commit (with a force-push).

Ahh, gotcha. This is the part where I didn't realize what context I didn't have about what mendral does. That's neat, that it uses save-if to simulate the default branch populating the cache (I'm assuming it's doing this generically and not just for swatinem/rust-cache?)

For my knowledge: is mendral targeted specifically at github actions performance improvement then? Because that changes my mental model for how much default trust I give it (the more specific, the more willing I am to take it at face value; the more general, the more scrutiny I apply; and I was definitely coming from the latter side).

Incidentally, our cache storage being at limit will probably screw with mendral's performance numbers... unless your fork has its own independent cache? I don't know how that GH quota policy works... 😵‍💫
