Recipes: switch to checkpoint directory URIs #4498

relh · 2025-12-22T21:36:14Z

What

update experiment recipes to use checkpoint directory URIs
adjust CVC mission variant curriculum checkpoint paths
refresh ABES/quantile/cloner recipes to the bundle format
align recipe defaults with the new checkpoint loading flow

Why

keep examples consistent with policy_spec bundles
avoid legacy .mpt or raw weight paths in recipes
make local and S3 checkpoints interchangeable
reduce friction when running evals from recipes
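
To illustrate what "checkpoint directory URIs" and "make local and S3 checkpoints interchangeable" could mean for a recipe, here is a minimal sketch. It is not the resolver used in this repository: `CheckpointDir`, `parse_checkpoint_dir`, and the `FILE_SUFFIXES` tuple are hypothetical names introduced only for this example, and the `metta://` scheme handled elsewhere in the stack is deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: suffixes that mark a single-file checkpoint rather than a directory.
# Only ".mpt" is named in this PR; ".pt" stands in for "raw weight paths".
FILE_SUFFIXES = (".mpt", ".pt")


@dataclass(frozen=True)
class CheckpointDir:
    """Hypothetical normalized form of a checkpoint directory reference."""

    backend: str   # "local" or "s3"
    location: str  # filesystem path, or "bucket/prefix" for S3


def parse_checkpoint_dir(uri: str) -> CheckpointDir:
    """Accept a plain path, file:// URI, or s3:// URI that names a directory."""
    parsed = urlparse(uri)
    if parsed.scheme in ("", "file"):
        # Plain paths and file:// URIs both refer to the local filesystem.
        backend, location = "local", parsed.path
    elif parsed.scheme == "s3":
        backend, location = "s3", parsed.netloc + parsed.path
    else:
        # Other schemes (for example metta://) are outside this sketch.
        raise ValueError(f"unsupported scheme in {uri!r}")

    location = location.rstrip("/")
    if PurePosixPath(location).suffix.lower() in FILE_SUFFIXES:
        raise ValueError(f"{uri!r} points at a single file, not a checkpoint directory")
    return CheckpointDir(backend, location)


if __name__ == "__main__":
    for example in ("./train_dir/exp1/v12", "s3://my-bucket/runs/exp1/v12/"):
        print(parse_checkpoint_dir(example))
```

With a helper like this, a recipe default can hold either a local directory or an S3 prefix, and a leftover `.mpt` path fails early instead of at load time.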

relh · 2025-12-22T21:36:28Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

Recipes: switch to checkpoint directory URIs #4498 👈
Remove .mpt paths, expand metta:// policy resolution #4497
Use CheckpointPolicy broadly #4508: 1 other dependent PR (#4496)
CheckpointPolicy bundles everywhere (replace .mpt save/load) #4502
main

This stack of pull requests is managed by Graphite.

datadog-official · 2025-12-22T21:40:23Z

⚠️ Tests

⚠️ Warnings

🧪 57 Tests failed

    test_alternate_run_format[training_facility.harvest] from test_all_games_eval.py

    test_mission_run[evals.diagnostic_agile] from test_all_games_eval.py

    test_mission_run[evals.diagnostic_agile_hard] from test_all_games_eval.py

ℹ️ Info

❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 3ad7a52

no more .mpt
Merge remote-tracking branch 'origin/main' into richard-unifympt
slim policy spec handler more concise cleanup simplify
Merge remote-tracking branch 'origin/main' into richard-unifympt
re-add fix policy spex
Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py
Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Merge remote-tracking branch 'origin/main' into richard-unifympt
bundles
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
bundle
Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
simplify? ugh compat cleanup
Merge remote-tracking branch 'origin/main' into richard-unifympt
cleanup tests
Merge remote-tracking branch 'origin/main' into richard-unifympt
more tests
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
simplify?
Merge branch 'main' into richard-unifympt
cleanup
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
no more .mpt remove all .mpt and lint cleanup local data path fixes mpt re-add re-add artifact lint
Merge remote-tracking branch 'origin/main' into richard-unifympt
more cleanup
Merge remote-tracking branch 'origin/main' into richard-unifympt
diff cleanup ftt lint fix error
Merge remote-tracking branch 'origin/main' into richard-unifympt
more tests lint
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
checkpoint policy does save/load lint checkpoint moving catcus lint
Merge branch 'main' into richard-unifympt
fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)
added space to --app:lib--tlsEmulation:off which makes it --app:lib --tlsEmulation:off now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)
- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update tests
- not run (not requested)
---------
Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)
- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo
- not run (not requested)
---------
Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)
cleanup
Merge remote-tracking branch 'origin/main' into richard-unifympt
fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint (recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"] and compared the eval suite config it builds (difficulty standard + heart_chorus) for an overlapping mission: hello_world.oxygen_bottleneck.

Findings:
- Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change-vibe settings match (152 vibes; change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the same mission name:
  - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
  - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
  - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
  - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn't compass; it's the mission definitions used by training vs eval. Training uses base missions (cogames.cogs_vs_clips.missions), while eval uses integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals) that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does not include machina_1.open_world at all (it only evaluates the 7 HELLO_WORLD integrated evals). So training can be creating hearts on easier missions while your eval runs on machina_1 are a different environment entirely.

Relevant files:
- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring: recipes/experiment/cogs_v_clips.py
- Eval mission definitions: packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval uses (and/or include machina_1.open_world in the eval suite).

I can patch this if you want; tell me whether you prefer:
1. Eval suite uses the same mission templates as training (from missions.py), or
2. Training uses the integrated eval mission definitions, or
3. Add machina_1.open_world to the eval suite.

ripping out
Merge remote-tracking branch 'origin/main' into richard-unifympt
simplify fix and lint choke
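
The train/eval comparison reported for #4487 in the commit message above can be reproduced in miniature with a recursive config diff. The sketch below is only an illustration: `flatten` and `diff_configs` are hypothetical helpers, and the two dictionaries are hand-built from the values quoted in that message, with nesting inferred from the dotted key names rather than taken from the real mission configs.

```python
from __future__ import annotations

from typing import Any, Iterator

MISSING = "<missing>"  # placeholder for a key present on only one side


def flatten(cfg: dict[str, Any], prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (dotted.path, value) pairs for every leaf of a nested dict."""
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def diff_configs(train: dict[str, Any], evaluation: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {path: (train_value, eval_value)} for every leaf that differs."""
    a, b = dict(flatten(train)), dict(flatten(evaluation))
    return {
        path: (a.get(path, MISSING), b.get(path, MISSING))
        for path in sorted(a.keys() | b.keys())
        if a.get(path, MISSING) != b.get(path, MISSING)
    }


def mission(carbon: int, oxygen: int, germanium: int, silicon: int) -> dict[str, Any]:
    """Build a toy config using only the keys quoted in the commit message."""
    return {
        "global_obs": {"compass": True},
        "change_vibe": {"number_of_vibes": 152},
        "game": {"objects": {
            "carbon_extractor": {"max_uses": carbon},
            "oxygen_extractor": {"max_uses": oxygen},
            "germanium_extractor": {"max_uses": germanium},
            "silicon_extractor": {"max_uses": silicon},
        }},
    }


train_cfg = mission(carbon=25, oxygen=5, germanium=5, silicon=120)
eval_cfg = mission(carbon=100, oxygen=20, germanium=20, silicon=480)

for path, (train_value, eval_value) in diff_configs(train_cfg, eval_cfg).items():
    print(f"{path}: train {train_value} vs eval {eval_value}")
```

On these toy inputs only the four extractor `max_uses` keys differ, and each quoted eval value is four times its training counterpart, which matches the finding that the mismatch lies in the mission definitions rather than in compass or vibe settings.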

relh · 2025-12-23T21:27:22Z

Folded into #4497 as part of stack consolidation.

This was referenced Dec 22, 2025

Tighten checkpoint URI resolution + docs #4496

Closed

Remove .mpt paths, expand metta:// policy resolution #4497

Closed

relh changed the title ~~init~~ Update experiment recipes to policy_spec checkpoints Dec 22, 2025

relh marked this pull request as ready for review December 22, 2025 21:39

github-actions bot assigned relh Dec 22, 2025

relh mentioned this pull request Dec 22, 2025

CheckpointPolicy bundles everywhere (replace .mpt save/load) #4502

Open

relh force-pushed the mpt-artifact-relocation-v2 branch from 5710760 to 4df89a7 Compare December 22, 2025 22:42

relh force-pushed the recipes-update-v2 branch 2 times, most recently from 14280dc to 1e96191 Compare December 22, 2025 22:52

relh force-pushed the mpt-artifact-relocation-v2 branch from 4df89a7 to 246fffc Compare December 22, 2025 22:52

relh force-pushed the recipes-update-v2 branch from 1e96191 to 44e9073 Compare December 22, 2025 22:56

relh force-pushed the mpt-artifact-relocation-v2 branch 2 times, most recently from fbac88c to 7bb2b76 Compare December 22, 2025 23:00

relh force-pushed the recipes-update-v2 branch 2 times, most recently from 7552087 to 13ca835 Compare December 22, 2025 23:15

relh force-pushed the mpt-artifact-relocation-v2 branch from 7bb2b76 to 3f03c28 Compare December 22, 2025 23:15

relh force-pushed the recipes-update-v2 branch from 13ca835 to f183bf5 Compare December 22, 2025 23:20

relh force-pushed the mpt-artifact-relocation-v2 branch from 3f03c28 to 419d120 Compare December 22, 2025 23:20

relh force-pushed the recipes-update-v2 branch from f183bf5 to ac9ec48 Compare December 22, 2025 23:28

relh force-pushed the mpt-artifact-relocation-v2 branch from 419d120 to 7f66c78 Compare December 22, 2025 23:28

relh force-pushed the recipes-update-v2 branch from ac9ec48 to 3625286 Compare December 22, 2025 23:33

relh force-pushed the mpt-artifact-relocation-v2 branch from 7f66c78 to b09b28d Compare December 22, 2025 23:33

relh force-pushed the recipes-update-v2 branch from 3625286 to 94305ca Compare December 23, 2025 00:03

relh force-pushed the mpt-artifact-relocation-v2 branch 2 times, most recently from adb552d to f78453b Compare December 23, 2025 00:42

relh force-pushed the recipes-update-v2 branch 2 times, most recently from 660ed12 to f00a69c Compare December 23, 2025 00:47

relh force-pushed the mpt-artifact-relocation-v2 branch from f78453b to 1ca20f6 Compare December 23, 2025 00:47

relh force-pushed the mpt-artifact-relocation-v2 branch from 4a73d38 to b686217 Compare December 23, 2025 18:54

relh force-pushed the recipes-update-v2 branch from f4cc09f to 7228dc6 Compare December 23, 2025 18:54

relh force-pushed the mpt-artifact-relocation-v2 branch from b686217 to 9b77c18 Compare December 23, 2025 18:57

relh force-pushed the recipes-update-v2 branch 2 times, most recently from 024a193 to 4a77289 Compare December 23, 2025 19:00

relh force-pushed the mpt-artifact-relocation-v2 branch 2 times, most recently from 86f651f to 8605574 Compare December 23, 2025 19:05

relh force-pushed the recipes-update-v2 branch 2 times, most recently from df399f5 to 8c89221 Compare December 23, 2025 19:06

relh force-pushed the mpt-artifact-relocation-v2 branch from 8605574 to 3d752e1 Compare December 23, 2025 19:06

relh force-pushed the recipes-update-v2 branch from 8c89221 to 0a58265 Compare December 23, 2025 19:07

relh force-pushed the mpt-artifact-relocation-v2 branch 2 times, most recently from 2b5f42d to e24540c Compare December 23, 2025 19:10

relh force-pushed the recipes-update-v2 branch 3 times, most recently from aa68564 to 2cc70c2 Compare December 23, 2025 19:15

relh force-pushed the mpt-artifact-relocation-v2 branch from 035ec07 to 3fdee18 Compare December 23, 2025 19:15

relh changed the title ~~Update recipes for checkpoint directory URIs~~ Update recipes to checkpoint directory URIs Dec 23, 2025

relh force-pushed the mpt-artifact-relocation-v2 branch from 3fdee18 to 9b138ca Compare December 23, 2025 20:20

relh force-pushed the recipes-update-v2 branch from 2cc70c2 to 4ebc693 Compare December 23, 2025 20:20

relh force-pushed the mpt-artifact-relocation-v2 branch from 9b138ca to bf1e9e0 Compare December 23, 2025 20:28

relh force-pushed the recipes-update-v2 branch from 4ebc693 to 78bd3c2 Compare December 23, 2025 20:28

relh force-pushed the mpt-artifact-relocation-v2 branch from bf1e9e0 to f08f91d Compare December 23, 2025 20:53

relh force-pushed the recipes-update-v2 branch from 78bd3c2 to 7cf1abf Compare December 23, 2025 20:53

relh force-pushed the mpt-artifact-relocation-v2 branch from f08f91d to 1c52af2 Compare December 23, 2025 20:56

relh force-pushed the recipes-update-v2 branch from 7cf1abf to 3ad7a52 Compare December 23, 2025 20:56

relh changed the title ~~Update recipes to checkpoint directory URIs~~ Recipes: switch to checkpoint directory URIs Dec 23, 2025

relh closed this Dec 23, 2025
