Add environment state snapshotting for RL research#53

Open
sarvanithin wants to merge 2 commits into withmartian:main from sarvanithin:main

@sarvanithin
Contributor

Summary

Implements snapshot/restore functionality to save and replay episodes from specific checkpoints. Addresses issue #39.

Changes

  • Added EnvironmentSnapshot dataclass for serializing environment state
  • Implemented export_state() and load_from_state() methods on CodeBaseEnv
  • Added serialization support for both SWEBench and Harbor environments
  • Container filesystem saved as tarball with JSON metadata
  • Comprehensive test coverage (14 tests)
  • Example demonstrating usage in examples/03_state_snapshotting.py

Key Features

  • Snapshot creation at episode boundaries (after reset or final step)
  • Saves: task state, container filesystem, agent message history, metadata
  • Restores: full environment state from saved snapshots
  • Auto-generates UUID for snapshot IDs
  • Works with both Daytona and Docker containers
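As a rough illustration of the JSON metadata side of this, here is a minimal sketch of a snapshot dataclass that auto-generates a UUID and round-trips through a `snapshot.json` file. `SnapshotSketch` and its fields (`task_id`, `messages`) are hypothetical stand-ins, not the actual `EnvironmentSnapshot` definition from this PR; only the `<root>/<snapshot_id>/snapshot.json` layout is taken from the Usage section.

```python
import dataclasses
import json
import pathlib
import uuid


@dataclasses.dataclass
class SnapshotSketch:
    """Hypothetical stand-in for EnvironmentSnapshot; field names are illustrative."""

    task_id: str
    messages: list[dict]
    # Auto-generate a UUID when the caller does not supply an ID.
    snapshot_id: str = dataclasses.field(default_factory=lambda: str(uuid.uuid4()))

    def save(self, root: pathlib.Path) -> pathlib.Path:
        """Write the metadata to <root>/<snapshot_id>/snapshot.json."""
        snap_dir = root / self.snapshot_id
        snap_dir.mkdir(parents=True, exist_ok=True)
        path = snap_dir / "snapshot.json"
        path.write_text(json.dumps(dataclasses.asdict(self), indent=2))
        return path

    @classmethod
    def load_from_file(cls, path) -> "SnapshotSketch":
        """Rebuild the dataclass from a previously saved JSON file."""
        return cls(**json.loads(pathlib.Path(path).read_text()))
```

The container filesystem tarball would sit next to this JSON file in the same directory; it is left out here to keep the sketch small.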

Limitations

  • Snapshots only work at episode boundaries (can't snapshot mid-episode)
  • Cannot serialize running async tasks/futures
  • Filesystem snapshots are large (100 MB to 2 GB tarballs)

Test Plan

  • All 14 snapshot tests passing
  • All 81 project tests passing
  • Linting passes (ruff check)
  • Formatting correct (ruff format)

Usage

import pathlib

# Export state
async with env:
    ts = await env.reset()
    # ... take some steps ...
    snap = await env.export_state(pathlib.Path("./snapshots"))

# Restore state
loaded_snap = EnvironmentSnapshot.load_from_file("./snapshots/abc-123/snapshot.json")
restored_env = await SweBenchEnv.load_from_state(loaded_snap)

Closes #39

Implements snapshot/restore functionality to save and replay episodes
from specific checkpoints. Useful for debugging, trajectory analysis,
and mechanistic interpretability.

- Add EnvironmentSnapshot dataclass for serializing env state
- Implement export_state() and load_from_state() methods on CodeBaseEnv
- Support both SWEBench and Harbor environments
- Save container filesystem as tarball with JSON metadata
- Snapshots only work at episode boundaries (can't snapshot mid-episode)
- Add comprehensive test coverage

Closes withmartian#39
@rsmith49
Contributor

Thanks for taking a crack at this @sarvanithin! This is definitely going to be a complex feature that will take a bit more work, and because of that we were delaying until after the planned repo launch on Thursday to focus on it. If you want to continue working on it in the meantime, there are a couple issues with the current approach here:

  • The use case for snapshotting environments basically only cares about mid-episode snapshots - so being able to fully reconstruct the environment + the latest messages to the agent at that point. Capturing the CodeAgent's run state here will be very tricky, and that's one of the reasons we're waiting to dive into this.
  • Given how many files are loaded, it would be much more ideal to use some kind of "Docker diff" functionality (if that exists, I have no idea) to basically only save & download files that differ from the base image. The current approach could still work, but I'd like to explore some lower filesize options first to see if anything exists.
  • We actually will be switching to HarborEnv (technically there will just be one CodeEnv) as the primary environment class - so just implementing methods in SweBenchEnv doesn't work unfortunately.
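On the "Docker diff" question: the Docker CLI does have a `docker diff <container>` command, which lists paths added (`A`), changed (`C`), or deleted (`D`) relative to the image. A minimal sketch of turning that output into a list of paths to save might look like the following. The helper names `parse_docker_diff` and `docker_diff` are hypothetical, and this only covers Docker; whether Daytona offers an equivalent is not addressed here.

```python
import subprocess


def parse_docker_diff(output: str) -> dict[str, list[str]]:
    """Group `docker diff` output lines by change kind (A, C, or D)."""
    changes: dict[str, list[str]] = {"A": [], "C": [], "D": []}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<kind> <path>"; split on the first space only,
        # so paths containing spaces stay intact.
        kind, _, path = line.partition(" ")
        if kind in changes:
            changes[kind].append(path)
    return changes


def docker_diff(container_id: str) -> dict[str, list[str]]:
    """Run `docker diff` for a container (requires a local Docker CLI)."""
    result = subprocess.run(
        ["docker", "diff", container_id],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_docker_diff(result.stdout)
```

Note that `C` entries include directories whose contents changed, so a snapshot would likely archive only the `A` paths and the `C` paths that are regular files, and record the `D` paths for deletion on restore.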

Moved load_from_state to SweBenchEnv and HarborEnv since they need
different constructor arguments (tasks list). Base class now provides
_restore_from_snapshot helper that subclasses call after init.

- Add load_from_state implementation to SweBenchEnv
- Add load_from_state implementation to HarborEnv
- Refactor base class to use _restore_from_snapshot helper
- Add test for load_from_state functionality
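The pattern this commit describes (a subclass-level `load_from_state` that builds the environment with its own constructor arguments, then calls a shared base-class helper) could be sketched roughly as below. All class names, the dict-shaped snapshot, and the `tasks`, `messages`, and `step_count` fields are assumptions for illustration; the real `CodeBaseEnv` and `EnvironmentSnapshot` will differ.

```python
import asyncio


class BaseEnvSketch:
    """Hypothetical base class holding the state shared by all environments."""

    def __init__(self, tasks: list[str]):
        self.tasks = tasks
        self.messages: list[dict] = []
        self.step_count = 0

    def _restore_from_snapshot(self, snap: dict) -> None:
        # Shared restore logic: subclasses call this after constructing themselves.
        self.messages = list(snap["messages"])
        self.step_count = snap["step_count"]


class SweBenchEnvSketch(BaseEnvSketch):
    @classmethod
    async def load_from_state(cls, snap: dict) -> "SweBenchEnvSketch":
        # The subclass knows which constructor arguments it needs (here, tasks).
        env = cls(tasks=snap["tasks"])
        env._restore_from_snapshot(snap)
        return env
```

Keeping construction in the subclass and restoration in the base class means a second environment type only has to supply its own constructor arguments.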
Comment on lines +494 to +496
fs_path = snap.snapshot_dir / "container_fs.tar.gz"
if fs_path.exists():
    await container.upload_dir(fs_path, "/")

Critical

[Logic] This will fail at runtime. `download_dir("/", fs_path)` writes a tarball to `fs_path`, but both our Docker and Daytona container implementations expect `upload_dir(local_path, remote_path)` to be called with `local_path` pointing to a directory so they can walk it and stream a new tar archive (see the existing usage in `HarborEnv._compute_reward`, where we upload an actual directory). When you hand them a `.tar.gz` file here they hit `os.walk`/`tar.add` on a file and raise `NotADirectoryError`, so restoration aborts before the filesystem is restored. Please unpack the archive to a temporary directory (or stream it directly via the container API) and pass that directory to `upload_dir` instead of the tarball path.

File: src/ares/environments/base.py
Line: 496
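
One way the suggested fix could look, as a minimal sketch: unpack the tarball into a temporary directory and hand that directory to `upload_dir`. The function name `upload_snapshot_archive` is hypothetical, and the only assumption made about the container object is the `upload_dir(local_path, remote_path)` coroutine described in the comment above.

```python
import pathlib
import tarfile
import tempfile


async def upload_snapshot_archive(container, archive_path: pathlib.Path, remote_path: str = "/") -> None:
    """Unpack a .tar.gz snapshot locally, then upload the resulting directory."""
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(archive_path, "r:gz") as tar:
            # Only do this for archives you created yourself: extracting an
            # untrusted tarball can write outside the target directory.
            tar.extractall(tmp)
        # upload_dir is assumed to expect a directory, per the review comment.
        await container.upload_dir(pathlib.Path(tmp), remote_path)
```

Extracting a full root filesystem as a non-root user may lose ownership and special files, so streaming the archive directly through the container API, if that is available, may be the more faithful option.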

Development

Successfully merging this pull request may close these issues.

Add a way to snapshot state

2 participants