Add environment state snapshotting for RL research by sarvanithin · Pull Request #53 · withmartian/ares

sarvanithin · 2026-01-26T16:56:13Z

Summary

Implements snapshot/restore functionality to save and replay episodes from specific checkpoints. Addresses issue #39.

Changes

Added EnvironmentSnapshot dataclass for serializing environment state
Implemented export_state() and load_from_state() methods on CodeBaseEnv
Added serialization support for both SWEBench and Harbor environments
Container filesystem saved as tarball with JSON metadata
Comprehensive test coverage (14 tests)
Example demonstrating usage in examples/03_state_snapshotting.py

Key Features

Snapshot creation at episode boundaries (after reset or final step)
Saves: task state, container filesystem, agent message history, metadata
Restores: full environment state from saved snapshots
Auto-generates UUID for snapshot IDs
Works with both Daytona and Docker containers
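
To make the snapshot layout above concrete, here is a minimal sketch of a JSON-serializable snapshot record with an auto-generated UUID. The class name `SnapshotSketch`, its fields, and `save_to_file` are assumptions made for illustration; they are not the PR's actual `EnvironmentSnapshot` definition, and the container filesystem tarball is left out.

```python
import dataclasses
import json
import pathlib
import uuid
from typing import Any


@dataclasses.dataclass
class SnapshotSketch:
    """Hypothetical stand-in for EnvironmentSnapshot; all field names are assumed."""

    task_state: dict[str, Any]
    messages: list[dict[str, Any]]
    metadata: dict[str, Any] = dataclasses.field(default_factory=dict)
    # A fresh UUID is generated whenever no ID is supplied.
    snapshot_id: str = dataclasses.field(default_factory=lambda: str(uuid.uuid4()))

    def save_to_file(self, root: pathlib.Path) -> pathlib.Path:
        """Write <root>/<snapshot_id>/snapshot.json and return its path."""
        snapshot_dir = root / self.snapshot_id
        snapshot_dir.mkdir(parents=True, exist_ok=True)
        path = snapshot_dir / "snapshot.json"
        path.write_text(json.dumps(dataclasses.asdict(self), indent=2))
        return path

    @classmethod
    def load_from_file(cls, path: "str | pathlib.Path") -> "SnapshotSketch":
        """Rebuild the record from a snapshot.json file."""
        return cls(**json.loads(pathlib.Path(path).read_text()))
```

Keeping the metadata in plain JSON next to the tarball means a snapshot can be inspected without starting a container.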

Limitations

Snapshots only work at episode boundaries (can't snapshot mid-episode)
Cannot serialize running async tasks/futures
Large filesystem snapshots (100MB-2GB tarballs)

Test Plan

All 14 snapshot tests passing
All 81 project tests passing
Linting passes (ruff check)
Formatting correct (ruff format)

Usage

import pathlib

# Export state
async with env:
    ts = await env.reset()
    # ... run the episode to a boundary (snapshots are only supported
    # after reset or after the final step) ...
    snap = await env.export_state(pathlib.Path("./snapshots"))

# Restore state
loaded_snap = EnvironmentSnapshot.load_from_file("./snapshots/abc-123/snapshot.json")
restored_env = await SweBenchEnv.load_from_state(loaded_snap)

Closes #39

Implements snapshot/restore functionality to save and replay episodes from specific checkpoints. Useful for debugging, trajectory analysis, and mechanistic interpretability.

- Add EnvironmentSnapshot dataclass for serializing env state
- Implement export_state() and load_from_state() methods on CodeBaseEnv
- Support both SWEBench and Harbor environments
- Save container filesystem as tarball with JSON metadata
- Snapshots only work at episode boundaries (can't snapshot mid-episode)
- Add comprehensive test coverage

Closes withmartian#39

src/ares/environments/base.py

rsmith49 · 2026-01-26T18:59:56Z

Thanks for taking a crack at this @sarvanithin! This is definitely going to be a complex feature that will take a bit more work, and because of that we were delaying until after the planned repo launch on Thursday to focus on it. If you want to continue working on it in the meantime, there are a couple issues with the current approach here:

The use case for snapshotting environments basically only cares about mid-episode snapshots - so being able to fully reconstruct the environment + the latest messages to the agent at that point. Capturing the CodeAgent's run state here will be very tricky, and that's one of the reasons we're waiting to dive into this.
Given how many files are loaded, it would be much more ideal to use some kind of "Docker diff" functionality (if that exists, I have no idea) to basically only save & download files that differ from the base image. The current approach could still work, but I'd like to explore some lower filesize options first to see if anything exists.
We actually will be switching to HarborEnv (technically there will just be one CodeEnv) as the primary environment class - so just implementing methods in SweBenchEnv doesn't work unfortunately.
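
On the "Docker diff" question: the Docker CLI does provide `docker diff CONTAINER`, which lists paths changed relative to the image, one per line, prefixed with `A` (added), `C` (changed), or `D` (deleted). A rough sketch of how that output could drive a smaller snapshot follows. The helper name `parse_docker_diff` is hypothetical, and whether Daytona exposes anything equivalent is not established in this thread.

```python
# Change codes printed by `docker diff`: A = added, C = changed, D = deleted.
KINDS = {"A": "added", "C": "changed", "D": "deleted"}


def parse_docker_diff(output: str) -> dict[str, list[str]]:
    """Group the paths reported by `docker diff` by change kind."""
    changes: dict[str, list[str]] = {name: [] for name in KINDS.values()}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Split on the first space only, so paths containing spaces survive.
        code, _, path = line.partition(" ")
        if code not in KINDS or not path:
            raise ValueError(f"unexpected docker diff line: {line!r}")
        changes[KINDS[code]].append(path)
    return changes
```

The input string could come from `subprocess.run(["docker", "diff", container_id], capture_output=True, text=True, check=True).stdout`. Only the added and changed paths would need to be downloaded, while deleted paths would have to be recorded in the metadata so a restore can remove them again.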

Moved load_from_state to SweBenchEnv and HarborEnv since they need different constructor arguments (tasks list). Base class now provides _restore_from_snapshot helper that subclasses call after init.

- Add load_from_state implementation to SweBenchEnv
- Add load_from_state implementation to HarborEnv
- Refactor base class to use _restore_from_snapshot helper
- Add test for load_from_state functionality

propel-code-bot · 2026-01-26T19:05:12Z

src/ares/environments/base.py

+        fs_path = snap.snapshot_dir / "container_fs.tar.gz"
+        if fs_path.exists():
+            await container.upload_dir(fs_path, "/")


[Logic] This will fail at runtime. download_dir("/", fs_path) writes a tarball to fs_path, but both our Docker and Daytona container implementations expect upload_dir(local_path, remote_path) to be called with local_path pointing to a directory so they can walk it and stream a new tar archive (see the existing usage in HarborEnv._compute_reward, where we upload an actual directory). When you hand them a .tar.gz file here they hit os.walk/tar.add on a file and raise NotADirectoryError, so restoration aborts before the filesystem is restored. Please unpack the archive to a temporary directory (or stream it directly via the container API) and pass that directory to upload_dir instead of the tarball path.
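
One way to apply the suggested fix is sketched below: unpack the tarball into a temporary directory and hand that directory to `upload_dir`. The `SupportsUploadDir` protocol and the `restore_filesystem` helper are assumptions inferred from the review comment, not the repository's actual container interface.

```python
import pathlib
import tarfile
import tempfile
from typing import Protocol


class SupportsUploadDir(Protocol):
    """Assumed shape of the container API, inferred from the review comment."""

    async def upload_dir(self, local_path: pathlib.Path, remote_path: str) -> None: ...


async def restore_filesystem(
    container: SupportsUploadDir,
    archive: pathlib.Path,
    remote_path: str = "/",
) -> bool:
    """Extract `archive` to a temp dir and upload that directory.

    Returns False when the archive does not exist, True after uploading.
    """
    if not archive.exists():
        return False
    with tempfile.TemporaryDirectory() as tmp:
        extracted = pathlib.Path(tmp)
        with tarfile.open(archive, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # Safer extraction where the interpreter supports filters.
                # A full root-filesystem archive with absolute symlinks may
                # be rejected by this filter and need a looser one.
                tar.extractall(extracted, filter="data")
            else:
                tar.extractall(extracted)
        # upload_dir now receives a real directory rather than a .tar.gz file.
        await container.upload_dir(extracted, remote_path)
    return True
```

Extraction here is blocking I/O inside a coroutine, so for multi-gigabyte archives it may be worth moving it to a worker thread, or streaming the archive through the container API as the comment also suggests.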

File: src/ares/environments/base.py Line: 496


sarvanithin mentioned this pull request Jan 26, 2026

Add helper methods to eliminate duplicate null checks #47

Open

sarvanithin wants to merge 2 commits into withmartian:main from sarvanithin:main

2 participants
