Skip to content

Conversation

@relh
Copy link
Contributor

@relh relh commented Dec 22, 2025

Summary

  • rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via target_path
  • update recipes and teacher scheduling to use ScheduleRule
  • report PPO stats using ppo_actor/ppo_critic hyperparam keys and update tests

Testing

  • not run (not requested)

Copy link
Contributor Author

relh commented Dec 22, 2025

@relh relh changed the title scheduler rule for managing teacher Rename HyperUpdateRule to ScheduleRule Dec 22, 2025
@relh relh marked this pull request as ready for review December 22, 2025 16:13
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@relh relh force-pushed the rb-stack-2-schedule-rule branch from 7464590 to ce97bc2 Compare December 22, 2025 16:32
@relh relh force-pushed the rb-stack-2-schedule-rule branch from ce97bc2 to 888fe15 Compare December 22, 2025 16:41
@relh relh assigned subho406 and unassigned relh Dec 22, 2025
@relh relh force-pushed the rb-stack-2-schedule-rule branch from 888fe15 to b4e3638 Compare December 22, 2025 16:48
@relh relh force-pushed the rb-stack-2-schedule-rule branch from b4e3638 to 69e695f Compare December 22, 2025 16:57
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

## Summary
- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

## Testing
- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

## Summary
- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

## Testing
- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

## Summary
- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

## Testing
- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

## Summary
- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

## Testing
- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

## Summary
- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

## Testing
- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

## Summary
- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

## Testing
- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

## Summary
- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

## Testing
- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

## Summary
- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

## Testing
- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke

simplify submission zip creation

use policy_spec for submission zips

tighten checkpoint io helpers

shorten checkpoint arg help

inline checkpoint policy helpers

restore policy spec docstring

validate checkpoint data_path before download

require checkpoint directory URIs

expand policy spec s3 docstring
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

## Summary
- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

## Testing
- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

## Summary
- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

## Testing
- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke

simplify submission zip creation

use policy_spec for submission zips

tighten checkpoint io helpers

shorten checkpoint arg help

inline checkpoint policy helpers

restore policy spec docstring

validate checkpoint data_path before download

require checkpoint directory URIs

expand policy spec s3 docstring
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke

simplify submission zip creation

use policy_spec for submission zips

tighten checkpoint io helpers

shorten checkpoint arg help

inline checkpoint policy helpers

restore policy spec docstring

validate checkpoint data_path before download

require checkpoint directory URIs

expand policy spec s3 docstring
relh added a commit that referenced this pull request Dec 23, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke

simplify submission zip creation

use policy_spec for submission zips

tighten checkpoint io helpers

shorten checkpoint arg help

inline checkpoint policy helpers

restore policy spec docstring

validate checkpoint data_path before download

require checkpoint directory URIs

expand policy spec s3 docstring
relh added a commit that referenced this pull request Dec 24, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke

simplify submission zip creation

use policy_spec for submission zips

tighten checkpoint io helpers

shorten checkpoint arg help

inline checkpoint policy helpers

restore policy spec docstring

validate checkpoint data_path before download

require checkpoint directory URIs

expand policy spec s3 docstring
relh added a commit that referenced this pull request Dec 24, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

## Summary
- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

## Testing
- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

## Summary
- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

## Testing
- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
relh added a commit that referenced this pull request Dec 24, 2025
no more .mpt

Merge remote-tracking branch 'origin/main' into richard-unifympt

slim

policy spec handler

more concise

cleanup

simplify

Merge remote-tracking branch 'origin/main' into richard-unifympt

re-add

fix policy spex

Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundles

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

bundle

Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

ugh compat

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

cleanup

tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify?

Merge branch 'main' into richard-unifympt

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

no more .mpt

remove all .mpt and lint

cleanup

local data path fixes

mpt re-add

re-add artifact

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

more cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

diff cleanup

ftt

lint

fix error

Merge remote-tracking branch 'origin/main' into richard-unifympt

more tests

lint

Merge remote-tracking branch 'origin/main' into richard-unifympt

Merge remote-tracking branch 'origin/main' into richard-unifympt

checkpoint policy does save/load

lint

checkpoint moving

catcus

lint

Merge branch 'main' into richard-unifympt

fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)

added space to --app:lib--tlsEmulation:off which makes it --app:lib
--tlsEmulation:off
now it runs

Rename HyperUpdateRule to ScheduleRule (#4483)

- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via
target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update
tests

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)

- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no
PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo

- not run (not requested)

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup

Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in

training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint
(recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"]
and compared the eval suite
config it builds (difficulty standard + heart_chorus) for an overlapping
mission: hello_world.oxygen_bottleneck.

  Findings:

  - Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change‑vibe settings match (152 vibes;
change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the
same mission name:
      - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
      - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
      - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
      - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn’t compass — it’s the mission definitions used by
training vs eval. Training uses base missions
(cogames.cogs_vs_clips.missions), while eval uses
integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals)
that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does
not include machina_1.open_world at all (it only evaluates the 7
HELLO_WORLD integrated
evals). So training can be creating hearts on easier missions while your
eval runs on machina_1 are a different environment entirely.

  Relevant files:

- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring:
recipes/experiment/cogs_v_clips.py
- Eval mission definitions:
packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval
uses (and/or include machina_1.open_world in the eval suite). I can
patch this if you want —
  tell me whether you prefer:

1. Eval suite uses the same mission templates as training (from
missions.py), or
  2. Training uses the integrated eval mission definitions, or
  3. Add machina_1.open_world to the eval suite.

ripping out

Merge remote-tracking branch 'origin/main' into richard-unifympt

simplify

fix and lint

choke
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants