From 1f01d1a411860cfd8cec37e16a1e22d8fdba6352 Mon Sep 17 00:00:00 2001
From: Mackenzie Zastrow <zastrowm@users.noreply.github.com>
Date: Wed, 28 Jan 2026 16:16:00 -0500
Subject: [PATCH 1/4] design: Propose low-level snapshot api

---
 designs/snapshot-api.md | 135 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 135 insertions(+)
 create mode 100644 designs/snapshot-api.md

diff --git a/designs/snapshot-api.md b/designs/snapshot-api.md
new file mode 100644
index 00000000..8424a086
--- /dev/null
+++ b/designs/snapshot-api.md
@@ -0,0 +1,135 @@
+# Design Doc: Low-Level Snapshot API
+
+**Status**: Proposed
+
+**Date**: 2026-01-28
+
+**Issue**: https://github.com/strands-agents/sdk-python/issues/1138
+
+## Context
+
+Developers need a way to preserve and restore the exact state of an agent at a specific point in time. The existing SessionManagement doesn't address this:
+
+- SessionManager works in the background, incrementally recording messages rather than full state. This means it's not possible to restore to arbitrary points in time.
+- After a message is saved, there is no way to modify it and have it recorded in session-management, preventing more advance context-management strategies while being able to pause & restore agents.
+- There is no way to proactively trigger session-management (e.g., after modifying `agent.messages` or `agent.state` directly)
+
+## Decision
+
+Add a low-level, explicit snapshot API as an alternative to automatic session-management. This enables preserving the exact state of an agent at a specific point and restoring it later — useful for evaluation frameworks, custom session management, and checkpoint/restore workflows.
+
+### API Changes
+
+```python
+class Snapshot:
+    type: str              # the type of data stored (e.g., "agent")
+    state: dict[str, Any]  # opaque; do not modify — format subject to change
+    metadata: dict         # user-provided data to be stored with the snapshot
+
+class Agent:
+    def save_snapshot(self, metadata: dict | None = None) -> Snapshot:
+        """Capture the current agent state as a snapshot."""
+        ...
+
+    def load_snapshot(self, snapshot: Snapshot) -> None:
+        """Restore agent state from a snapshot."""
+        ...
+```
+
+### Behavior
+
+Snapshots capture **agent state** (data), not **runtime behavior** (code):
+
+- **Agent State** — Data persisted as part of session-management: conversation messages, context, and other JSON-serializable data. This is what snapshots save and restore.
+- **Runtime Behavior** — Configuration that defines how the agent operates: model, tools, ConversationManager, etc. These are *not* included in snapshots and must be set separately when creating or restoring an agent.
+
+The intent is that anything stored or restored by session-management would be stored in a snapshot — so this proposal is *not* documenting or changing what is persisted, but rather providing an explicit way to do what session-management does automatically.
+
+### Contract
+
+- **`metadata`** — Caller-owned. Strands does not read, modify, or manage this field. Use it to store checkpoint labels, timestamps, or any application-specific data.
+- **`type` and `state`** — Strands-owned. These fields are managed internally and should be treated as opaque. The format of `state` is subject to change; do not modify or depend on its structure.
+- **Serialization** — Strands guarantees that `type` and `state` will only contain JSON-serializable values.
+
+### Future Concerns
+
+- Snapshotting for MultiAgent constructs: This proposal would 
+- Providing a storage API for snapshot CRUD operations (save to disk, database, etc.)
+- Providing APIs to customize serialization formats
+
+## Developer Experience
+
+### Evaluations via Rewind and Replay
+
+```python
+agent = Agent(tools=[tool1, tool2])
+snapshot = agent.save_snapshot()
+
+result1 = agent("What is the weather?")
+
+agent2 = Agent(tools=[tool3, tool4])
+agent2.load_snapshot(snapshot)
+result2 = agent2("What is the weather?")
+# Compare result1 and result2
+```
+
+### Advanced Context Management
+
+```python
+agent = Agent(conversation_manager=CompactingConversationManager())
+snapshot = agent.save_snapshot(metadata={"checkpoint": "before_long_task"})
+
+# ... later ...
+later_agent = Agent(conversation_manager=CompactingConversationManager())
+later_agent.load_snapshot(snapshot)
+```
+
+### Persisting Snapshots
+
+```python
+import json
+from dataclasses import asdict
+
+agent = Agent(tools=[tool1, tool2])
+agent("Remember that my favorite color is orange.")
+
+# Save to file
+snapshot = agent.save_snapshot(metadata={"user_id": "123"})
+with open("checkpoint.json", "w") as f:
+    json.dump(asdict(snapshot), f)
+
+# Later, restore from file
+with open("checkpoint.json", "r") as f:
+    data = json.load(f)
+snapshot = Snapshot(**data)
+
+agent = Agent(tools=[tool1, tool2])
+agent.load_snapshot(snapshot)
+agent("What is my favorite color?")  # "Your favorite color is orange."
+```
+
+### Edge cases
+
+Restoring runtime behavior (e.g., tools) is explicitly not supported:
+
+```python
+agent1 = Agent(tools=[tool1, tool2])
+snapshot = agent1.save_snapshot()
+agent_no = Agent(snapshot)  # tools are NOT restored
+```
+
+## Consequences
+
+**Easier:**
+- Building evaluation frameworks with rewind/replay capabilities
+- Implementing custom session management strategies
+- Creating checkpoints during long-running agent tasks
+- Cloning agents (load the same snapshot into multiple agent instances)
+- Resetting agents to a known state (we do this manually for Graphs)
+
+**More difficult:**
+- N/A — this is an additive API
+
+## Willingness to Implement
+
+Yes

From 1f68c5a041a46219a4d77b9c84d500605965f8a4 Mon Sep 17 00:00:00 2001
From: Mackenzie Zastrow <zastrowm@users.noreply.github.com>
Date: Wed, 28 Jan 2026 16:33:26 -0500
Subject: [PATCH 2/4] Update file naming

---
 designs/snapshot-api.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/designs/snapshot-api.md b/designs/snapshot-api.md
index 8424a086..46277c48 100644
--- a/designs/snapshot-api.md
+++ b/designs/snapshot-api.md
@@ -95,11 +95,11 @@ agent("Remember that my favorite color is orange.")
 
 # Save to file
 snapshot = agent.save_snapshot(metadata={"user_id": "123"})
-with open("checkpoint.json", "w") as f:
+with open("snapshot.json", "w") as f:
     json.dump(asdict(snapshot), f)
 
 # Later, restore from file
-with open("checkpoint.json", "r") as f:
+with open("snapshot.json", "r") as f:
     data = json.load(f)
 snapshot = Snapshot(**data)
 

From b8268ace0fecba8d6858b14a098bb7aea4496df0 Mon Sep 17 00:00:00 2001
From: Mackenzie Zastrow <zastrowm@users.noreply.github.com>
Date: Thu, 29 Jan 2026 10:55:02 -0500
Subject: [PATCH 3/4] fix: clarify design

---
 designs/snapshot-api.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/designs/snapshot-api.md b/designs/snapshot-api.md
index 46277c48..32720825 100644
--- a/designs/snapshot-api.md
+++ b/designs/snapshot-api.md
@@ -47,13 +47,13 @@ The intent is that anything stored or restored by session-management would be st
 
 ### Contract
 
-- **`metadata`** — Caller-owned. Strands does not read, modify, or manage this field. Use it to store checkpoint labels, timestamps, or any application-specific data.
+- **`metadata`** — Application-owned. Strands does not read, modify, or manage this field. Use it to store checkpoint labels, timestamps, or any application-specific data without the need for a separate/standalone object/datastore.
 - **`type` and `state`** — Strands-owned. These fields are managed internally and should be treated as opaque. The format of `state` is subject to change; do not modify or depend on its structure.
 - **Serialization** — Strands guarantees that `type` and `state` will only contain JSON-serializable values.
 
 ### Future Concerns
 
-- Snapshotting for MultiAgent constructs: This proposal would 
+- Snapshotting for MultiAgent constructs - Snapshot is designed in a way that the snapshot could be reused for multi-agent with a similar api
 - Providing a storage API for snapshot CRUD operations (save to disk, database, etc.)
 - Providing APIs to customize serialization formats
 

From d1602a54a0a1e3ffd4eb8d51d65a4d31ed66bf45 Mon Sep 17 00:00:00 2001
From: Mackenzie Zastrow <zastrowm@users.noreply.github.com>
Date: Thu, 29 Jan 2026 14:40:20 -0500
Subject: [PATCH 4/4] Clarify proposed api/behavior

---
 designs/snapshot-api.md | 51 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 45 insertions(+), 6 deletions(-)

diff --git a/designs/snapshot-api.md b/designs/snapshot-api.md
index 32720825..9db99e8b 100644
--- a/designs/snapshot-api.md
+++ b/designs/snapshot-api.md
@@ -6,6 +6,19 @@
 
 **Issue**: https://github.com/strands-agents/sdk-python/issues/1138
 
+## Motivation
+
+Today, developers who want to manually snapshot and restore agent state can *almost* do so by saving and loading these properties directly:
+
+- `Agent.messages` — the conversation history
+- `Agent.state` — custom application state
+- `Agent._interrupt_state` — internal state for interrupt handling
+- Conversation manager internal state — state held by the conversation manager (e.g., sliding window position)
+
+However, this approach is fragile: it requires knowledge of internal implementation details, and the set of properties may change between versions. This proposal introduces a stable, convenient API to accomplish the same thing without relying on internals.
+
+**This API does not change agent behavior** — it simply provides a clean way to serialize and restore the existing state that already exists on the agent.
+
 ## Context
 
 Developers need a way to preserve and restore the exact state of an agent at a specific point in time. The existing SessionManagement doesn't address this:
@@ -21,7 +34,7 @@ Add a low-level, explicit snapshot API as an alternative to automatic session-ma
 ### API Changes
 
 ```python
-class Snapshot:
+class Snapshot(TypedDict):
     type: str              # the type of data stored (e.g., "agent")
     state: dict[str, Any]  # opaque; do not modify — format subject to change
     metadata: dict         # user-provided data to be stored with the snapshot
@@ -67,10 +80,14 @@ snapshot = agent.save_snapshot()
 
 result1 = agent("What is the weather?")
 
+# ...
+
 agent2 = Agent(tools=[tool3, tool4])
 agent2.load_snapshot(snapshot)
 result2 = agent2("What is the weather?")
-# Compare result1 and result2
+# ... 
+# Human/manual evaluation if one outcome was better than the other
+# ...
 ```
 
 ### Advanced Context Management
@@ -88,7 +105,6 @@ later_agent.load_snapshot(snapshot)
 
 ```python
 import json
-from dataclasses import asdict
 
 agent = Agent(tools=[tool1, tool2])
 agent("Remember that my favorite color is orange.")
@@ -96,12 +112,11 @@ agent("Remember that my favorite color is orange.")
 # Save to file
 snapshot = agent.save_snapshot(metadata={"user_id": "123"})
 with open("snapshot.json", "w") as f:
-    json.dump(asdict(snapshot), f)
+    json.dump(snapshot, f)
 
 # Later, restore from file
 with open("snapshot.json", "r") as f:
-    data = json.load(f)
-snapshot = Snapshot(**data)
+    snapshot: Snapshot = json.load(f)
 
 agent = Agent(tools=[tool1, tool2])
 agent.load_snapshot(snapshot)
@@ -118,6 +133,30 @@ snapshot = agent1.save_snapshot()
 agent_no = Agent(snapshot)  # tools are NOT restored
 ```
 
+## Up for Debate
+
+### What state should be included in a snapshot?
+
+The current proposal includes:
+
+- **messages** — conversation history
+- **interrupt state** — internal state for paused/resumed interrupts
+- **agent state** — custom application state (`agent.state`)
+- **conversation manager state** — internal state of the conversation manager (but not the conversation manager itself)
+
+This draws a distinction between "evolving state" (data that changes as the agent runs) and "agent definition" (configuration that defines what the agent *is*):
+
+| Evolving State (snapshotted) | Agent Definition (not snapshotted) |
+|------------------------------|-----------------------------------|
+| messages | system_prompt |
+| interrupt state | tools |
+| agent state | model |
+| conversation manager state | conversation_manager |
+
+Further justification: these three properties are also what SessionManagement persists today, so this API aligns with existing behavior.
+
+**Open question:** Is this the right boundary? Are there other properties that should be considered "evolving state"?
+
 ## Consequences
 
 **Easier:**