
Commit 0a8c698

init
no more .mpt
Merge remote-tracking branch 'origin/main' into richard-unifympt
slim policy spec handler
more concise
cleanup
simplify
Merge remote-tracking branch 'origin/main' into richard-unifympt
re-add
fix policy spex
Update packages/mettagrid/python/src/mettagrid/util/uri_resolvers/schemes.py
Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Merge remote-tracking branch 'origin/main' into richard-unifympt
bundles
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
bundle
Merge remote-tracking branch 'origin/richard-unifympt' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
simplify?
ugh
compat cleanup
Merge remote-tracking branch 'origin/main' into richard-unifympt
cleanup tests
Merge remote-tracking branch 'origin/main' into richard-unifympt
more tests
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
simplify?
Merge branch 'main' into richard-unifympt
cleanup
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
no more .mpt
remove all .mpt and lint
cleanup local data path fixes
mpt re-add
re-add artifact
lint
Merge remote-tracking branch 'origin/main' into richard-unifympt
more cleanup
Merge remote-tracking branch 'origin/main' into richard-unifympt
diff cleanup
ftt
lint
fix error
Merge remote-tracking branch 'origin/main' into richard-unifympt
more tests
lint
Merge remote-tracking branch 'origin/main' into richard-unifympt
Merge remote-tracking branch 'origin/main' into richard-unifympt
checkpoint policy does save/load
lint
checkpoint
moving catcus
lint
Merge branch 'main' into richard-unifympt
fold-in

[pyright 4] Get pyright to pass on app_backend (#4478)

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix command, add space (#4456)
Added a space to --app:lib--tlsEmulation:off, which makes it --app:lib --tlsEmulation:off; now it runs.

Rename HyperUpdateRule to ScheduleRule (#4483)
## Summary
- rename HyperUpdateRule to ScheduleRule and apply to TrainerConfig via target_path
- update recipes and teacher scheduling to use ScheduleRule
- report PPO stats using ppo_actor/ppo_critic hyperparam keys and update tests
## Testing
- not run (not requested)
---------
Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into richard-unifympt

Fix supervisor teacher behavior and legacy BC mode (#4484)
## Summary
- gate PPO actor during supervisor teacher phase
- fix supervisor/no-teacher behavior and add legacy BC (no gating, no PPO resume)
- require supervisor policy URI for sliced_cloner_no_ppo
## Testing
- not run (not requested)
---------
Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Adam S <134907338+gustofied@users.noreply.github.com>

Minor fixes to the slstm triton kernel, causing failures for certain kernel sizes (#4492)

cleanup
Merge remote-tracking branch 'origin/main' into richard-unifympt

fold in training environments and eval environments mismatched (#4487)
I ran a direct config comparison using the training entrypoint (recipes/experiment/cogs_v_clips.train) with variants=["heart_chorus"] and compared the eval suite config it builds (difficulty standard + heart_chorus) for an overlapping mission: hello_world.oxygen_bottleneck. Findings:
- Compass is ON in both training and eval (global_obs.compass=True).
- Vibe count and change-vibe settings match (152 vibes; change_vibe.number_of_vibes=152).
- But the mission parameters differ between training and eval for the same mission name:
  - game.objects.carbon_extractor.max_uses: train 25 vs eval 100
  - game.objects.oxygen_extractor.max_uses: train 5 vs eval 20
  - game.objects.germanium_extractor.max_uses: train 5 vs eval 20
  - game.objects.silicon_extractor.max_uses: train 120 vs eval 480

So the mismatch isn't compass; it's the mission definitions used by training vs eval. Training uses base missions (cogames.cogs_vs_clips.missions), while eval uses integrated eval missions (cogames.cogs_vs_clips.evals.integrated_evals) that have different extractor settings. Also: the eval suite used by recipes/experiment/cogs_v_clips.train does not include machina_1.open_world at all (it only evaluates the 7 HELLO_WORLD integrated evals). So training can be creating hearts on easier missions while your eval runs on machina_1 are a different environment entirely. (A comparison sketch of these extractor differences follows after this commit message.)

Relevant files:
- Compass default: packages/cogames/src/cogames/cogs_vs_clips/mission.py
- Training entrypoint + eval suite wiring: recipes/experiment/cogs_v_clips.py
- Eval mission definitions: packages/cogames/src/cogames/cogs_vs_clips/evals/integrated_evals.py

If you want true parity, we should align which mission templates eval uses (and/or include machina_1.open_world in the eval suite). I can patch this if you want; tell me whether you prefer:
1. Eval suite uses the same mission templates as training (from missions.py), or
2. Training uses the integrated eval mission definitions, or
3. Add machina_1.open_world to the eval suite.

ripping out
Merge remote-tracking branch 'origin/main' into richard-unifympt
simplify
fix and lint
choke
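To make the #4487 mission-parameter mismatch above concrete, here is a small, hypothetical comparison sketch. The nested dict layout and the diff_configs helper are illustrative assumptions, not the actual cogames config objects; only the max_uses numbers come from the report above.

```python
# Hypothetical sketch: report leaf values that differ between a training config and an
# eval config. The dict structure below is illustrative, not the real cogames config.
from typing import Any


def diff_configs(train: dict, evaluation: dict, prefix: str = "") -> list[tuple[str, Any, Any]]:
    """Return (path, train_value, eval_value) for every leaf that differs."""
    diffs: list[tuple[str, Any, Any]] = []
    for key in sorted(set(train) | set(evaluation)):
        path = f"{prefix}.{key}" if prefix else key
        a, b = train.get(key), evaluation.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            diffs.extend(diff_configs(a, b, path))
        elif a != b:
            diffs.append((path, a, b))
    return diffs


# Extractor max_uses values reported in #4487 (train vs eval):
train_cfg = {"game": {"objects": {
    "carbon_extractor": {"max_uses": 25},
    "oxygen_extractor": {"max_uses": 5},
    "germanium_extractor": {"max_uses": 5},
    "silicon_extractor": {"max_uses": 120},
}}}
eval_cfg = {"game": {"objects": {
    "carbon_extractor": {"max_uses": 100},
    "oxygen_extractor": {"max_uses": 20},
    "germanium_extractor": {"max_uses": 20},
    "silicon_extractor": {"max_uses": 480},
}}}

for path, train_value, eval_value in diff_configs(train_cfg, eval_cfg):
    print(f"{path}: train={train_value} eval={eval_value}")
```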
1 parent 81723ce commit 0a8c698

File tree

3 files changed: +215 / -1 lines changed

agent/src/metta/agent/policies/vit.py

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ class ViTDefaultConfig(PolicyArchitecture):
 
     def make_policy(self, policy_env_info: PolicyEnvInterface) -> Policy:
         # If the architecture spec already bundled a component list (common for saved
-        # .mpt checkpoints), reuse it instead of regenerating with current defaults.
+        # checkpoint bundles), reuse it instead of regenerating with current defaults.
         # This keeps restored policies aligned with the shapes they were trained with.
         if self.components:
             return super().make_policy(policy_env_info)
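The comment change above points at the reuse-vs-regenerate behaviour of make_policy: a spec restored from a checkpoint bundle keeps its saved component list, while a fresh spec rebuilds current defaults. Below is a minimal, hypothetical illustration of that pattern; the class and component names (ArchSpec, ViTLikeSpec, obs_encoder, ...) are stand-ins, not the actual PolicyArchitecture/ViTDefaultConfig code.

```python
# Hypothetical sketch of the reuse-vs-regenerate pattern described in the comment above.
from dataclasses import dataclass, field


@dataclass
class ArchSpec:
    components: list[str] = field(default_factory=list)

    def make_policy(self) -> list[str]:
        # Base behaviour: build from whatever component list the spec currently carries.
        return list(self.components)


@dataclass
class ViTLikeSpec(ArchSpec):
    def make_policy(self) -> list[str]:
        if self.components:
            # A restored checkpoint bundle already fixed the component list; reuse it
            # so layer shapes stay aligned with the weights about to be loaded.
            return super().make_policy()
        # Fresh spec: regenerate the current default component list.
        self.components = ["obs_encoder", "vit_backbone", "actor_head", "critic_head"]
        return super().make_policy()
```

With that shape, ViTLikeSpec(components=[...]) restored from a bundle returns exactly the saved list, while ViTLikeSpec() falls back to the current defaults.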

metta/rl/mpt_artifact.py

Lines changed: 139 additions & 0 deletions
from __future__ import annotations

import tempfile
import zipfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Mapping, MutableMapping, Protocol

import torch
from safetensors.torch import load as load_safetensors
from safetensors.torch import save as save_safetensors

from mettagrid.policy.checkpoint_policy import architecture_from_spec, prepare_state_dict_for_save
from mettagrid.policy.policy_env_interface import PolicyEnvInterface
from mettagrid.util.file import local_copy, write_file
from mettagrid.util.uri_resolvers.schemes import parse_uri


class PolicyArchitectureProtocol(Protocol):
    def make_policy(self, policy_env_info: PolicyEnvInterface) -> Any: ...

    def to_spec(self) -> str:
        """Serialize this architecture to a string specification."""
        ...

    @classmethod
    def from_spec(cls, spec: str) -> "PolicyArchitectureProtocol":
        """Deserialize an architecture from a string specification."""
        ...


@dataclass
class MptArtifact:
    architecture: Any
    state_dict: MutableMapping[str, torch.Tensor]

    def instantiate(
        self,
        policy_env_info: PolicyEnvInterface,
        device: str = "cpu",
        *,
        strict: bool = True,
    ) -> Any:
        torch_device = torch.device(device)

        policy = self.architecture.make_policy(policy_env_info)
        policy = policy.to(torch_device)

        missing, unexpected = policy.load_state_dict(dict(self.state_dict), strict=strict)
        if strict and (missing or unexpected):
            raise RuntimeError(f"Strict loading failed. Missing: {missing}, Unexpected: {unexpected}")

        if hasattr(policy, "initialize_to_environment"):
            policy.initialize_to_environment(policy_env_info, torch_device)

        return policy


def load_mpt(uri: str) -> MptArtifact:
    """Load an .mpt checkpoint from a local path or s3:// URI."""
    with local_copy(uri) as local_path:
        return _load_local_mpt_file(local_path)


def _load_local_mpt_file(path: Path) -> MptArtifact:
    if not path.exists():
        raise FileNotFoundError(f"MPT file not found: {path}")

    with zipfile.ZipFile(path, mode="r") as archive:
        names = set(archive.namelist())

        if "weights.safetensors" not in names:
            raise ValueError(f"Invalid .mpt file: {path} (missing weights)")

        if "modelarchitecture.txt" in names:
            architecture_blob = archive.read("modelarchitecture.txt").decode("utf-8")
        else:
            raise ValueError(f"Invalid .mpt file: {path} (missing architecture)")
        architecture = architecture_from_spec(architecture_blob)

        weights_blob = archive.read("weights.safetensors")
        state_dict = load_safetensors(weights_blob)
        if not isinstance(state_dict, MutableMapping):
            raise TypeError("Loaded safetensors state_dict is not a mutable mapping")

    return MptArtifact(architecture=architecture, state_dict=state_dict)


def save_mpt(
    uri: str | Path,
    *,
    architecture: Any,
    state_dict: Mapping[str, torch.Tensor],
) -> str:
    """Save an .mpt checkpoint to a URI or local path. Returns the saved URI."""
    parsed = parse_uri(str(uri), allow_none=False)

    if parsed.scheme == "s3":
        with tempfile.NamedTemporaryFile(suffix=".mpt", delete=False) as tmp:
            tmp_path = Path(tmp.name)
        try:
            _save_mpt_file_locally(tmp_path, architecture=architecture, state_dict=state_dict)
            write_file(parsed.canonical, str(tmp_path))
        finally:
            tmp_path.unlink(missing_ok=True)
        return parsed.canonical
    else:
        output_path = parsed.local_path or Path(str(uri)).expanduser().resolve()
        _save_mpt_file_locally(output_path, architecture=architecture, state_dict=state_dict)
        return f"file://{output_path.resolve()}"


def _save_mpt_file_locally(
    path: Path,
    *,
    architecture: Any,
    state_dict: Mapping[str, torch.Tensor],
) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    prepared_state = prepare_state_dict_for_save(state_dict)

    with tempfile.NamedTemporaryFile(
        dir=path.parent,
        prefix=f".{path.name}.",
        suffix=".tmp",
        delete=False,
    ) as temp_file:
        temp_path = Path(temp_file.name)

    try:
        with zipfile.ZipFile(temp_path, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
            weights_blob = save_safetensors(dict(prepared_state))
            archive.writestr("weights.safetensors", weights_blob)
            archive.writestr("modelarchitecture.txt", architecture.to_spec())

        temp_path.replace(path)
    except Exception:
        temp_path.unlink(missing_ok=True)
        raise
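As a usage sketch for the module above: save_mpt and load_mpt round-trip a zip bundle containing weights.safetensors and modelarchitecture.txt. The arch and env_info objects below are assumed to come from the surrounding training code (an architecture with to_spec()/make_policy() that architecture_from_spec can rebuild, plus a PolicyEnvInterface); they and the paths are placeholders, not part of this diff.

```python
# Sketch only: `arch` and `env_info` are assumed objects from the training stack,
# and the paths below are illustrative.
import zipfile

from metta.rl.mpt_artifact import load_mpt, save_mpt

policy = arch.make_policy(env_info)

# save_mpt writes weights.safetensors + modelarchitecture.txt into a zip and returns
# the canonical URI (file:// for local paths, s3:// when given an S3 destination).
uri = save_mpt("/tmp/run0/policy.mpt", architecture=arch, state_dict=policy.state_dict())

# load_mpt accepts a local path or s3:// URI and returns an MptArtifact; instantiate()
# rebuilds the policy from the stored spec and loads the weights (strict by default).
artifact = load_mpt("/tmp/run0/policy.mpt")
restored = artifact.instantiate(env_info, device="cpu", strict=True)

# Because the bundle is a plain zip, it can be inspected without the training code.
with zipfile.ZipFile("/tmp/run0/policy.mpt") as bundle:
    print(bundle.namelist())  # ['weights.safetensors', 'modelarchitecture.txt']
```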

metta/rl/mpt_policy.py

Lines changed: 75 additions & 0 deletions
from __future__ import annotations

from pathlib import Path
from typing import Any

from metta.rl.mpt_artifact import load_mpt, save_mpt
from mettagrid.policy.policy import AgentPolicy, MultiAgentPolicy
from mettagrid.policy.policy_env_interface import PolicyEnvInterface
from mettagrid.util.uri_resolvers.schemes import parse_uri


class MptPolicy(MultiAgentPolicy):
    """Load a policy from an .mpt checkpoint file.

    The .mpt format stores weights and architecture configuration. This allows
    loading trained policies without a build dependency on the training code.
    """

    short_names = ["mpt"]

    def __init__(
        self,
        policy_env_info: PolicyEnvInterface,
        *,
        checkpoint_uri: str | None = None,
        device: str = "cpu",
        strict: bool = True,
    ):
        super().__init__(policy_env_info, device=device)

        self._policy = None
        self._architecture = None
        self._strict = strict
        self._device = device

        if checkpoint_uri:
            self._load_from_checkpoint(checkpoint_uri, device=device)

    def _load_from_checkpoint(self, checkpoint_uri: str, *, device: str) -> None:
        artifact = load_mpt(checkpoint_uri)
        self._architecture = artifact.architecture
        self._policy = artifact.instantiate(self._policy_env_info, device=device, strict=self._strict)
        self._policy.eval()

    def load_policy_data(self, policy_data_path: str) -> None:
        self._load_from_checkpoint(policy_data_path, device=self._device)

    def agent_policy(self, agent_id: int) -> AgentPolicy:
        if self._policy is None:
            raise RuntimeError("MptPolicy has not been initialized with checkpoint data")
        return self._policy.agent_policy(agent_id)

    def eval(self) -> "MptPolicy":
        """Ensure wrapped policy enters eval mode for rollout/play compatibility."""
        if self._policy is not None:
            self._policy.eval()
        return self

    def save_policy(
        self,
        destination: str | Path,
        *,
        policy_architecture: Any | None = None,
    ) -> str:
        """Save the wrapped policy to a URI or local path."""
        architecture = policy_architecture or self._architecture
        if architecture is None:
            raise ValueError("policy_architecture is required to save policy")
        if self._policy is None:
            raise ValueError("Policy has not been loaded; cannot save")

        save_mpt(str(destination), architecture=architecture, state_dict=self._policy.state_dict())

        parsed = parse_uri(str(destination), allow_none=False)
        return parsed.canonical
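And a corresponding usage sketch for MptPolicy: env_info is an assumed PolicyEnvInterface from the surrounding code, and the local and s3:// URIs below are placeholders that rely on the usual mettagrid URI resolvers being available.

```python
# Sketch only: `env_info` is an assumed PolicyEnvInterface; the URIs are placeholders.
from metta.rl.mpt_policy import MptPolicy

# Eager load at construction time (local path or s3:// URI)...
policy = MptPolicy(env_info, checkpoint_uri="s3://my-bucket/run0/policy.mpt", device="cpu")

# ...or construct empty and attach checkpoint data later.
lazy = MptPolicy(env_info, device="cpu")
lazy.load_policy_data("/tmp/run0/policy.mpt")

# Per-agent access for rollouts; raises RuntimeError if no checkpoint was loaded.
agent0 = policy.agent_policy(0)

# Re-export: reuses the architecture recovered from the loaded bundle unless an
# explicit policy_architecture is passed; returns the canonical destination URI.
saved_uri = policy.save_policy("/tmp/run0/policy-copy.mpt")
```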
