Commit caed3d0

init
no more .mpt
Merge remote-tracking branch 'origin/main' into richard-unifympt
slim policy spec handler
more concise
cleanup
simplify
Merge remote-tracking branch 'origin/main' into richard-unifympt
re-add
fix policy spec
Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py
Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Merge remote-tracking branch 'origin/main' into richard-unifympt
bundles
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
bundle
Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
simplify?
ugh
compat cleanup
Merge remote-tracking branch 'origin/main' into richard-unifympt
cleanup tests
Merge remote-tracking branch 'origin/main' into richard-unifympt
more tests
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
simplify?
Merge branch 'main' into richard-unifympt
cleanup
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
no more .mpt
remove all .mpt and lint
cleanup local data path fixes
mpt re-add
re-add artifact
lint
Merge remote-tracking branch 'origin/main' into richard-unifympt
more cleanup
Merge remote-tracking branch 'origin/main' into richard-unifympt
diff cleanup
ftt
lint
fix error
Merge remote-tracking branch 'origin/main' into richard-unifympt
more tests
lint
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
checkpoint policy does save/load
lint
checkpoint
moving cactus
lint
Merge branch 'main' into richard-unifympt
fold-in
[pyright 4] Get pyright to pass on app_backend (#4478)
Merge remote-tracking branch 'origin/main' into richard-unifympt
Fix command, add space (#4456): added space to --app:lib--tlsEmulation:off, which makes it --app:lib --tlsEmulation:off; now it runs
Rename HyperUpdateRule to ScheduleRule (#4483)
- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update tests
- not run (not requested)
---------
Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Merge remote-tracking branch 'origin/main' into richard-unifympt
Fix supervisor teacher behavior and legacy BC mode (#4484)
- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo
- not run (not requested)
---------
Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>
Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)
cleanup
Merge remote-tracking branch 'origin/main' into richard-unifympt
fold in training environments and eval environments mismatched (#4487)

I ran a direct config comparison using the training entrypoint (recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"] and compared the eval suite config it builds (difficulty standard + heart_chorus) for an overlapping mission: hello_world.oxygen_bottleneck.

Findings:
- Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change-vibe settings match (152 vibes; change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the same mission name:
  - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
  - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
  - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
  - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn't compass; it's the mission definitions used by training vs eval. Training uses base missions (cogames.cogs_vs_clips.missions), while eval uses integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals) that have different extractor settings.

Also: the eval suite used by recipes/experiment/cogs_v_clips.train does not include machina_1.open_world at all (it only evaluates the 7 HELLO_WORLD integrated evals). So training can be creating hearts on easier missions while your eval runs on machina_1 are a different environment entirely.

Relevant files:
- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring: recipes/experiment/cogs_v_clips.py
- Eval mission definitions: packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval uses (and/or include machina_1.open_world in the eval suite). I can patch this if you want; tell me whether you prefer:
1. Eval suite uses the same mission templates as training (from missions.py), or
2. Training uses the integrated eval mission definitions, or
3. Add machina_1.open_world to the eval suite.

ripping out
Merge remote-tracking branch 'origin/main' into richard-unifympt
simplify
fix and lint
choke
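A minimal sketch of the kind of config comparison described above, assuming the training-side and eval-side mission configs for the same mission name have already been built as nested dicts (how those dicts are produced from recipes/experiment/cogs_v_clips.py is not shown here, and the helper name is illustrative). It walks both trees and reports leaf values that differ, which is enough to surface mismatches like game.objects.carbon_extractor.max_uses:

def diff_configs(train_cfg: dict, eval_cfg: dict, prefix: str = "") -> list[str]:
    """Recursively compare two nested config dicts and report differing leaf values."""
    mismatches: list[str] = []
    for key in sorted(set(train_cfg) | set(eval_cfg)):
        path = f"{prefix}.{key}" if prefix else str(key)
        a, b = train_cfg.get(key), eval_cfg.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            mismatches.extend(diff_configs(a, b, path))
        elif a != b:
            mismatches.append(f"{path}: train {a!r} vs eval {b!r}")
    return mismatches

# Hypothetical usage, with dicts standing in for the built mission configs:
#   diff_configs(train_mission_cfg, eval_mission_cfg)
#   -> ['game.objects.carbon_extractor.max_uses: train 25 vs eval 100', ...]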
1 parent 7d79b2e commit caed3d0

File tree: 9 files changed, +33 -26 lines changed


recipes/experiment/abes/kickstart/checked.py

Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ def train(
         ppo_critic=PPOCriticConfig(enabled=True),
         sl_checkpointed_kickstarter=SLCheckpointedKickstarterConfig(
             enabled=True,
-            teacher_uri="s3://softmax-public/policies/av.teach.24checks.11.10.10/av.teach.24checks.11.10.10:v8016.mpt",
+            teacher_uri="s3://softmax-public/policies/av.teach.24checks.11.10.10/av.teach.24checks.11.10.10:v8016",
             checkpointed_interval=24,
             epochs_per_checkpoint=1,
             terminating_epoch=334,

recipes/experiment/abes/kickstart/cortex_100m.py

Lines changed: 5 additions & 5 deletions
@@ -150,7 +150,7 @@ def train(
     losses_config = LossesConfig()
     default_teacher_steps = 600_000_000
     teacher = teacher or TeacherConfig(
-        policy_uri="s3://softmax-public/policies/subho.abes.vit_baseline/subho.abes.vit_baseline:v2340.mpt",
+        policy_uri="s3://softmax-public/policies/subho.abes.vit_baseline/subho.abes.vit_baseline:v2340",
         mode="sliced_kickstarter",
         steps=default_teacher_steps,
         teacher_led_proportion=0.2,
@@ -192,11 +192,11 @@ def evaluate(policy_uris: Optional[Sequence[str]] = None) -> EvaluateTool:

 def evaluate_latest_in_dir(dir_path: Path) -> EvaluateTool:
     """Evaluate the latest policy on arena simulations."""
-    checkpoints = dir_path.glob("*.mpt")
-    policy_uri = [checkpoint.as_posix() for checkpoint in sorted(checkpoints, key=lambda x: x.stat().st_mtime)]
-    if not policy_uri:
+    checkpoints = [p for p in dir_path.iterdir() if p.is_dir() and (p / "policy_spec.json").exists()]
+    checkpoints = sorted(checkpoints, key=lambda x: x.stat().st_mtime)
+    if not checkpoints:
         raise ValueError(f"No policies found in {dir_path}")
-    policy_uri = policy_uri[-1]
+    policy_uri = checkpoints[-1].as_posix()
     sim = mettagrid(num_agents=6)
     return EvaluateTool(
         simulations=[SimulationConfig(suite="arena", name="very_basic", env=sim)], policy_uris=[policy_uri]
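The evaluate_latest_in_dir change above (repeated in logit.py, sliced.py, and quantile.py below) swaps "glob for *.mpt files" for "find checkpoint directories that contain a policy_spec.json". A self-contained sketch of just that discovery step, using only the standard library; the function name here is illustrative, not the repo's API:

from pathlib import Path


def latest_checkpoint_uri(dir_path: Path) -> str:
    """Return the newest checkpoint directory in dir_path as a path string.

    A checkpoint is now a directory containing a policy_spec.json file rather
    than a single .mpt file.
    """
    checkpoints = [p for p in dir_path.iterdir() if p.is_dir() and (p / "policy_spec.json").exists()]
    if not checkpoints:
        raise ValueError(f"No policies found in {dir_path}")
    # Sort by directory mtime so the last entry is the most recently written checkpoint.
    checkpoints.sort(key=lambda p: p.stat().st_mtime)
    return checkpoints[-1].as_posix()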

recipes/experiment/abes/kickstart/logit.py

Lines changed: 5 additions & 5 deletions
@@ -110,7 +110,7 @@ def train(
     losses_config = LossesConfig()
     trainer_cfg = TrainerConfig(losses=losses_config)
     teacher = teacher or TeacherConfig(
-        policy_uri="s3://softmax-public/policies/av.sliced.mb.11.22.110.ctrl/av.sliced.mb.11.22.110.ctrl:v9900.mpt",
+        policy_uri="s3://softmax-public/policies/av.sliced.mb.11.22.110.ctrl/av.sliced.mb.11.22.110.ctrl:v9900",
         mode="logit_kickstarter",
         steps=1_000_000_000,
         teacher_led_proportion=1.0,
@@ -169,11 +169,11 @@ def evaluate(policy_uris: Optional[Sequence[str]] = None) -> EvaluateTool:

 def evaluate_latest_in_dir(dir_path: Path) -> EvaluateTool:
     """Evaluate the latest policy on arena simulations."""
-    checkpoints = dir_path.glob("*.mpt")
-    policy_uri = [checkpoint.as_posix() for checkpoint in sorted(checkpoints, key=lambda x: x.stat().st_mtime)]
-    if not policy_uri:
+    checkpoints = [p for p in dir_path.iterdir() if p.is_dir() and (p / "policy_spec.json").exists()]
+    checkpoints = sorted(checkpoints, key=lambda x: x.stat().st_mtime)
+    if not checkpoints:
         raise ValueError(f"No policies found in {dir_path}")
-    policy_uri = policy_uri[-1]
+    policy_uri = checkpoints[-1].as_posix()
     sim = mettagrid(num_agents=6)
     return EvaluateTool(
         simulations=[SimulationConfig(suite="arena", name="very_basic", env=sim)], policy_uris=[policy_uri]

recipes/experiment/abes/kickstart/sliced.py

Lines changed: 5 additions & 5 deletions
@@ -114,7 +114,7 @@ def train(
     losses_config = LossesConfig()
     trainer_cfg = TrainerConfig(losses=losses_config)
     teacher = teacher or TeacherConfig(
-        policy_uri="s3://softmax-public/policies/av.student.11.26.28/av.student.11.26.28:v4000.mpt",
+        policy_uri="s3://softmax-public/policies/av.student.11.26.28/av.student.11.26.28:v4000",
         mode="sliced_kickstarter",
         steps=1_000_000_000,
         teacher_led_proportion=0.2,
@@ -148,11 +148,11 @@ def evaluate(policy_uris: Optional[Sequence[str]] = None) -> EvaluateTool:

 def evaluate_latest_in_dir(dir_path: Path) -> EvaluateTool:
     """Evaluate the latest policy on arena simulations."""
-    checkpoints = dir_path.glob("*.mpt")
-    policy_uri = [checkpoint.as_posix() for checkpoint in sorted(checkpoints, key=lambda x: x.stat().st_mtime)]
-    if not policy_uri:
+    checkpoints = [p for p in dir_path.iterdir() if p.is_dir() and (p / "policy_spec.json").exists()]
+    checkpoints = sorted(checkpoints, key=lambda x: x.stat().st_mtime)
+    if not checkpoints:
         raise ValueError(f"No policies found in {dir_path}")
-    policy_uri = policy_uri[-1]
+    policy_uri = checkpoints[-1].as_posix()
     sim = mettagrid(num_agents=6)
     return EvaluateTool(
         simulations=[SimulationConfig(suite="arena", name="very_basic", env=sim)], policy_uris=[policy_uri]

recipes/experiment/abes/quantile.py

Lines changed: 4 additions & 4 deletions
@@ -132,11 +132,11 @@ def evaluate(policy_uris: Optional[Sequence[str]] = None) -> EvaluateTool:

 def evaluate_latest_in_dir(dir_path: Path) -> EvaluateTool:
     """Evaluate the latest policy on arena simulations."""
-    checkpoints = dir_path.glob("*.mpt")
-    policy_uri = [checkpoint.as_posix() for checkpoint in sorted(checkpoints, key=lambda x: x.stat().st_mtime)]
-    if not policy_uri:
+    checkpoints = [p for p in dir_path.iterdir() if p.is_dir() and (p / "policy_spec.json").exists()]
+    checkpoints = sorted(checkpoints, key=lambda x: x.stat().st_mtime)
+    if not checkpoints:
         raise ValueError(f"No policies found in {dir_path}")
-    policy_uri = policy_uri[-1]
+    policy_uri = checkpoints[-1].as_posix()
     sim = mettagrid(num_agents=6)
     return EvaluateTool(
         simulations=[SimulationConfig(suite="arena", name="very_basic", env=sim)], policy_uris=[policy_uri]

recipes/experiment/cogs_v_clips.py

Lines changed: 1 addition & 1 deletion
@@ -308,7 +308,7 @@ def make_curriculum(


 # uv run cogames submit \
-# -p class=mpt,kw.checkpoint_uri=s3://softmax-public/policies/...:v1.mpt \
+# -p class=checkpoint,data=s3://softmax-public/policies/...:v1 \
 # -n your-policy-name-for-leaderboard \
 # --skip-validation
 #

recipes/experiment/cvc/cloner.py

Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ def make_curriculum(
 # How to submit a policy trained here to the CoGames leaderboard:
 #
 # uv run cogames submit \
-# -p class=mpt,kw.checkpoint_uri=s3://softmax-public/policies/...:v1.mpt \
+# -p class=checkpoint,data=s3://softmax-public/policies/...:v1 \
 # -n your-policy-name-for-leaderboard \
 # --skip-validation
 #

recipes/experiment/cvc/mission_variant_curriculum.py

Lines changed: 10 additions & 3 deletions
@@ -554,12 +554,19 @@ def _get_policy_action_space(policy_uri: str) -> Optional[int]:
         return None

     try:
-        from metta.rl.mpt_artifact import load_mpt
+        from pathlib import Path

-        artifact = load_mpt(policy_uri)
+        from safetensors.torch import load as load_safetensors
+
+        from mettagrid.util.uri_resolvers.schemes import policy_spec_from_uri
+
+        spec = policy_spec_from_uri(policy_uri)
+        if not spec.data_path:
+            return None
+        state_dict = load_safetensors(Path(spec.data_path).read_bytes())

         # Look for actor head weight to determine action space
-        for key, tensor in artifact.state_dict.items():
+        for key, tensor in state_dict.items():
             if "actor_head" in key and "weight" in key and len(tensor.shape) == 2:
                 return tensor.shape[0]
         return None
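Assembled from the hunk above, _get_policy_action_space now resolves a policy spec from the URI and reads the safetensors payload directly instead of going through the removed load_mpt artifact loader. A sketch of the full function under that reading; the guard before the try block and the except handler are assumptions, and only the body shown in the diff comes from the commit:

from typing import Optional


def _get_policy_action_space(policy_uri: str) -> Optional[int]:
    # Assumed guard; the diff only shows that the function can return None early.
    if not policy_uri:
        return None

    try:
        from pathlib import Path

        from safetensors.torch import load as load_safetensors

        from mettagrid.util.uri_resolvers.schemes import policy_spec_from_uri

        # Resolve the URI to a policy spec; bail out if it carries no data payload.
        spec = policy_spec_from_uri(policy_uri)
        if not spec.data_path:
            return None
        state_dict = load_safetensors(Path(spec.data_path).read_bytes())

        # Look for an actor head weight matrix; its first dimension is the action count.
        for key, tensor in state_dict.items():
            if "actor_head" in key and "weight" in key and len(tensor.shape) == 2:
                return tensor.shape[0]
        return None
    except Exception:  # assumed handler; failures fall back to "unknown action space"
        return None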

recipes/experiment/cvc/sliced_cloner.py

Lines changed: 1 addition & 1 deletion
@@ -136,7 +136,7 @@ def make_curriculum(
 # How to submit a policy trained here to the CoGames leaderboard:
 #
 # uv run cogames submit \
-# -p class=mpt,kw.checkpoint_uri=s3://softmax-public/policies/...:v1.mpt \
+# -p class=checkpoint,data=s3://softmax-public/policies/...:v1 \
 # -n your-policy-name-for-leaderboard \
 # --skip-validation
 #
