Propose low-level snapshot api #497

zastrowm · 2026-01-28T21:18:18Z

Description

Unblock customers who would like to explicitly control snapshots of an agent. Related issue: strands-agents/sdk-python#1138

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

strands-agent · 2026-01-28T21:22:55Z

Documentation Deployment Complete

Your documentation preview has been successfully deployed!

Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-497/

JackYPCOnline · 2026-01-28T21:27:05Z

designs/snapshot-api.md

+
+# Save to file
+snapshot = agent.save_snapshot(metadata={"user_id": "123"})
+with open("checkpoint.json", "w") as f:


We should keep our terminology consistent. I'm fine using checkpoint.json to match yours, but would you be open to snapshot.json instead?

Snapshot: Replica of the serialized version of an agent / multi-agent

Transcript: Append-only historical Message history (Extendable in future)

Checkpoint: Abstract concept represents the lifecycle moment where we create a Snapshot artifact

Yep; this should have been snapshot.json; updated.

In this case the filename is just an example and would be caller determined; but the example should match the api name so updated to snapshot.json

strands-agent · 2026-01-28T21:38:13Z

Documentation Deployment Complete

Your documentation preview has been successfully deployed!

Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-497/

Unshure · 2026-01-29T15:32:58Z

designs/snapshot-api.md

+
+### Future Concerns
+
+- Snapshotting for MultiAgent constructs: This proposal would 


Seems like this line is unfinished

Unshure · 2026-01-29T15:36:03Z

designs/snapshot-api.md

+
+### Behavior
+
+Snapshots capture **agent state** (data), not **runtime behavior** (code):


I agree with this, it is not easy to capture and persist code, and I dont think strands should try to do this.

However, we should explore how one would restore an agent from a snapshot, and load lets say tools back into the agent after persisting it. I would like to see an example devex of what this looks like.

I view the tool state as a feature that we'd be adding to the agent to make "enabled" tools into a state on the agent. So, if we had that I imagine it would be something like:

agent = Agent(tools=[tool1, tool2, tool3, tool4], enabled_tools=["tool1"])

Where only tool1 would be enabled/available on the agent. Then to enable other tools something would eventually trigger:

agent.enabled_tools = ["tool1", "tool3"]

and for restoring an agent with specific tools, it would be the same as

agent2 = Agent(tools=[tool3, tool4]) agent2.load_snapshot(snapshot)

and the snapshot would be restoring the enabled_tools state back into the agent.

+1 the term "snapshot" makes me think of a disk snapshot - literally everything. I would like to see this incorporate tools etc in the future.

Unshure · 2026-01-29T15:37:04Z

designs/snapshot-api.md

+
+### Contract
+
+- **`metadata`** — Caller-owned. Strands does not read, modify, or manage this field. Use it to store checkpoint labels, timestamps, or any application-specific data.


I guess this could be used to store metadata about certain tools that were attached to an agent before persisting, and then loading those tools back?

It could if the application wanted to do that; it could also be date/time, a "name" for the snapshot, or other application specific metadata.

The intent here is to allow applications to include data that strands isn't managing. So that if they chose to just serialize the session to disk, they wouldn't - for example - need to store another file for metadata associated with it.

strands-agent · 2026-01-29T15:59:39Z

Documentation Deployment Complete

Your documentation preview has been successfully deployed!

Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-497/

mkmeral · 2026-01-29T17:41:00Z

designs/snapshot-api.md

+### API Changes
+
+```python
+class Snapshot:


should we have this as TypedDict so that it can be serialized easily? I saw in DevX that we need to explicitly call asdict

Either TypedDict or implementing json serialization explicitly yeah; will update the example(s)

mkmeral · 2026-01-29T17:44:04Z

designs/snapshot-api.md

+Snapshots capture **agent state** (data), not **runtime behavior** (code):
+
+- **Agent State** — Data persisted as part of session-management: conversation messages, context, and other JSON-serializable data. This is what snapshots save and restore.
+- **Runtime Behavior** — Configuration that defines how the agent operates: model, tools, ConversationManager, etc. These are *not* included in snapshots and must be set separately when creating or restoring an agent.


Do we allow these components to expose "snapshot-able data"? e.g. I am a conv manager developer, I want my data to be restored with snapshots

What's the recommendation? Keeping that data in agent state?

What's the recommendation? Keeping that data in agent state?

Yeah; the recommendation is agent state

Do we allow these components to expose "snapshot-able data"?

It should be Agent State (AgentState directly; or if we're missing something, an equivalent thereof). The idea that I'm trying to get across in this section is "Snapshots do not represent anything other than what already exists in agent state/session-management, it just provides a more direct api to control it".

mkmeral · 2026-01-29T17:53:17Z

designs/snapshot-api.md

+agent2 = Agent(tools=[tool3, tool4])
+agent2.load_snapshot(snapshot)
+result2 = agent2("What is the weather?")
+# Compare result1 and result2


I could use couple more sentences here to see what the expectation is, and maybe do we want to enforce tools?

For example, if you restore a list of messages with a toolset of (1,2) to an agent with toolset of (3,4); you are a lot more likely to get hallucinations. The agent tries to follow the examples in message history, as you essentially turn your context into "few-shot"

I'll update it to better illustrate that it's evaluating the result of result1 and result2.

and maybe do we want to enforce tools?

...if you restore a list of messages with a toolset of (1,2) to an agent with toolset of (3,4); you are a lot more likely to get hallucinations.

ACK that this is a concern, but IMO this is not the goal of the snapshot api. Snapshots are intended to only save/load the agent state - transformation of state or normalizing would be something that could be built on top of the low level primitive. If I "resume" an agent from a snapshot or a session-management, it shouldn't be doing any conversion/munging of behavior on the way in or out.

strands-agent · 2026-01-29T19:45:07Z

Documentation Deployment Complete

Your documentation preview has been successfully deployed!

Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-497/

Unshure · 2026-01-29T20:31:31Z

designs/snapshot-api.md

+
+Further justification: these three properties are also what SessionManagement persists today, so this API aligns with existing behavior.
+
+**Open question:** Is this the right boundary? Are there other properties that should be considered "evolving state"?


This might be the right boundary, but I want to understand the devex a bit more for restoring "Agent Definition" after loading a snapshot. I get you can do this:

agent = Agent(tools=[tool1, tool2]) snapshot = agent.save_snapshot() result = agent("What is the weather?") # ... agent = Agent(tools=[tool1, tool2]) agent.load_snapshot(snapshot)

Im thinking about defining custom serializers and deserializers you can pass into save_snapshot and load_snapshot, but I guess that doesnt really make sense since the customer can do that themselves anyway today like this:

snapshot = custom_serializer(agent) agent = custom_deserializer(agent)

Maybe this is where AgentConfig comes in to save the day?

agent = Agent.from_config(config) agent.load_snapshot(snapshot)

cagataycali · 2026-02-02T18:05:10Z

designs/snapshot-api.md

+
+```python
+class Snapshot(TypedDict):
+    type: str              # the type of data stored (e.g., "agent")


i'd want to see a timestamp of snapshot so we can go back in time.

+1, Probably it can be in the sate / metadata.

I notice that otel trace / span properties are good to have in many cases. I wonder if some of them can be added into state or metadata.

mkmeral · 2026-02-02T18:06:44Z

designs/snapshot-api.md

+    state: dict[str, Any]  # opaque; do not modify — format subject to change
+    metadata: dict         # user-provided data to be stored with the snapshot
+
+class Agent:


Do we want specific agent methods, or Snapshottable (wip name) interface?

Having an interface like below, would make implementation a lot easier. And it would allow us to extend to other types (multi-agent, etc)

Snapshottable: def save(): def load():

This is similar to @JackYPCOnline 's idea on SessionAwarebut different naming pretty much

afarntrog · 2026-02-02T18:06:55Z

designs/snapshot-api.md

+    def save_snapshot(self, metadata: dict | None = None) -> Snapshot:
+        """Capture the current agent state as a snapshot."""
+        ...
+
+    def load_snapshot(self, snapshot: Snapshot) -> None:
+        """Restore agent state from a snapshot."""
+        ...


Great idea! I would consider moving these methods to their own class and inject it in the agent class agent.snapshot.load()

I am picky on naming, what it actually does is to_dict() and from_dict(), we are not saving anything

cagataycali · 2026-02-02T18:07:00Z

designs/snapshot-api.md

+## Consequences
+
+**Easier:**
+- Building evaluation frameworks with rewind/replay capabilities


seems like timestamps are missing for rewind ^^

afarntrog · 2026-02-02T18:09:39Z

designs/snapshot-api.md

+agent2 = Agent(tools=[tool3, tool4])
+agent2.load_snapshot(snapshot)


We should consider allowing for passing in the snapshot in the Agent init (as well)

mkmeral · 2026-02-02T18:11:06Z

designs/snapshot-api.md

+- **agent state** — custom application state (`agent.state`)
+- **conversation manager state** — internal state of the conversation manager (but not the conversation manager itself)
+
+This draws a distinction between "evolving state" (data that changes as the agent runs) and "agent definition" (configuration that defines what the agent *is*):


I'd argue system prompt is also evolving state. You can inject context at runtime to update system prompt, same with tools and conversation manager to be honest.

Thinking further, I don't think I like this distinction. Given meta-agents, skills, context maangement, etc. I'd argue this distinction is pretty blurry

I think we should snapshot all the info that we list below.

I'd do something like agent code is the superset, and snapshot is current data.

E.g. You might have tools in python definition of the code

Agent(tools=[a,b,c,d,e])

but your snapshot might include

tools: [d,e,f]

then the loaded agent should be tools=[d,e] (f would load because we dont know the definition)

afarntrog · 2026-02-02T18:13:15Z

designs/snapshot-api.md

+with open("snapshot.json", "w") as f:
+    json.dump(snapshot, f)


I think we can have these as methods or as different snapshot implementations such as S3Snapshot etc and then users call snapshot.persist()

dbschmigelski · 2026-02-02T18:09:39Z

designs/snapshot-api.md

+
+- **`metadata`** — Application-owned. Strands does not read, modify, or manage this field. Use it to store checkpoint labels, timestamps, or any application-specific data without the need for a separate/standalone object/datastore.


nit: naming is hard but I think describing this as metadata is a stretch. If it is just store checkpoint labels, timestamps I agree but the any application-specific data part seems like this is broader

dbschmigelski · 2026-02-02T18:11:16Z

designs/snapshot-api.md

+Snapshots capture **agent state** (data), not **runtime behavior** (code):
+
+- **Agent State** — Data persisted as part of session-management: conversation messages, context, and other JSON-serializable data. This is what snapshots save and restore.
+- **Runtime Behavior** — Configuration that defines how the agent operates: model, tools, ConversationManager, etc. These are *not* included in snapshots and must be set separately when creating or restoring an agent.


Are we saying that we don't believe these things should be part of a snapshot or are we just saying that we are not trying to expand the scoep by limiting to the current capabilities of Session Management.

For example I could see the following being important

models: what happens if I change this as the agent runs

tools: what happens if I add a tool and want to revert back to a time when I did not

dbschmigelski · 2026-02-02T18:13:10Z

designs/snapshot-api.md

+# ...
+
+agent2 = Agent(tools=[tool3, tool4])
+agent2.load_snapshot(snapshot)


if this is the intended flow where we are creating a new instance, did you consider pros/cons of acting on the constructor?

Agent(snapshot=...)

dbschmigelski · 2026-02-02T18:15:38Z

designs/snapshot-api.md

+
+This draws a distinction between "evolving state" (data that changes as the agent runs) and "agent definition" (configuration that defines what the agent *is*):
+
+| Evolving State (snapshotted) | Agent Definition (not snapshotted) |


Mentioned above, but I think this makes the assumption that all of these things are static when we have seen dynamic use cases for each of these (excluding conversation manager)

poshinchen · 2026-02-02T18:18:46Z

designs/snapshot-api.md

+
+```python
+agent = Agent(tools=[tool1, tool2])
+snapshot = agent.save_snapshot()


QQ: what happens if an user calls save_snapshot() twice?

mehtarac · 2026-02-02T18:19:06Z

designs/snapshot-api.md

+agent = Agent(tools=[tool1, tool2])
+snapshot = agent.save_snapshot()
+
+result1 = agent("What is the weather?")
+
+# ...
+
+agent2 = Agent(tools=[tool3, tool4])


I'm a little unclear here. agent has tools tool1 and tool2. agent2, is initialized with tool3 and tool4. When agent2 loads the snapshot, does it have all the tools or just the tool1 and tool2

pgrayy · 2026-02-02T18:07:58Z

designs/snapshot-api.md

+Developers need a way to preserve and restore the exact state of an agent at a specific point in time. The existing SessionManagement doesn't address this:
+
+- SessionManager works in the background, incrementally recording messages rather than full state. This means it's not possible to restore to arbitrary points in time.
+- After a message is saved, there is no way to modify it and have it recorded in session-management, preventing more advance context-management strategies while being able to pause & restore agents.


Do we have a use case for wanting to modify a past message?

pgrayy · 2026-02-02T18:13:38Z

designs/snapshot-api.md

+Snapshots capture **agent state** (data), not **runtime behavior** (code):
+
+- **Agent State** — Data persisted as part of session-management: conversation messages, context, and other JSON-serializable data. This is what snapshots save and restore.
+- **Runtime Behavior** — Configuration that defines how the agent operates: model, tools, ConversationManager, etc. These are *not* included in snapshots and must be set separately when creating or restoring an agent.


So does that mean a configuration like model_id wouldn't be stored in the snapshot? Is there a specific reason why? May have missed it.

pgrayy · 2026-02-02T18:17:01Z

designs/snapshot-api.md

+# ...
+
+agent2 = Agent(tools=[tool3, tool4])
+agent2.load_snapshot(snapshot)


Would we want to give customers the option to override specific data in a snapshot? So keep most things the same but try tweaking one value to see how the agent behaves.

pgrayy · 2026-02-02T18:19:12Z

designs/snapshot-api.md

+- Providing a storage API for snapshot CRUD operations (save to disk, database, etc.)
+- Providing APIs to customize serialization formats
+
+## Developer Experience


Do we want to allow users to snapshot in hooks on life cycle events?

cagataycali · 2026-02-02T18:21:33Z

designs/snapshot-api.md

+
+```python
+class Snapshot(TypedDict):
+    type: str              # the type of data stored (e.g., "agent")


Adding SHA is also crucial here ^^

poshinchen · 2026-02-02T18:26:56Z

designs/snapshot-api.md

+- **`type` and `state`** — Strands-owned. These fields are managed internally and should be treated as opaque. The format of `state` is subject to change; do not modify or depend on its structure.
+- **Serialization** — Strands guarantees that `type` and `state` will only contain JSON-serializable values.
+
+### Future Concerns


Good to have: We can enable a traces for snapshot actions

dbschmigelski · 2026-02-02T18:48:46Z

designs/snapshot-api.md

+|------------------------------|-----------------------------------|
+| messages | system_prompt |
+| interrupt state | tools |
+| agent state | model |


note for if we do model, we should be persisting model configuration probably be a ModelSnapshot not an str

design: Propose low-level snapshot api

1f01d1a

zastrowm temporarily deployed to auto-approve January 28, 2026 21:18 — with GitHub Actions Inactive

JackYPCOnline reviewed Jan 28, 2026

View reviewed changes

Update file naming

1f68c5a

zastrowm temporarily deployed to auto-approve January 28, 2026 21:33 — with GitHub Actions Inactive

zastrowm mentioned this pull request Jan 28, 2026

[FEATURE] Agent State Management - Snapshot, Pause, and Resume strands-agents/sdk-python#1138

Open

JackYPCOnline previously approved these changes Jan 28, 2026

View reviewed changes

Unshure reviewed Jan 29, 2026

View reviewed changes

fix: clarify design

b8268ac

zastrowm dismissed JackYPCOnline’s stale review via b8268ac January 29, 2026 15:55

zastrowm temporarily deployed to auto-approve January 29, 2026 15:55 — with GitHub Actions Inactive

mkmeral reviewed Jan 29, 2026

View reviewed changes

Clarify proposed api/behavior

d1602a5

zastrowm temporarily deployed to auto-approve January 29, 2026 19:40 — with GitHub Actions Inactive

Unshure reviewed Jan 29, 2026

View reviewed changes

cagataycali reviewed Feb 2, 2026

View reviewed changes

mkmeral reviewed Feb 2, 2026

View reviewed changes

afarntrog reviewed Feb 2, 2026

View reviewed changes

cagataycali reviewed Feb 2, 2026

View reviewed changes

afarntrog reviewed Feb 2, 2026

View reviewed changes

mkmeral reviewed Feb 2, 2026

View reviewed changes

afarntrog reviewed Feb 2, 2026

View reviewed changes

dbschmigelski reviewed Feb 2, 2026

View reviewed changes

poshinchen reviewed Feb 2, 2026

View reviewed changes

mehtarac reviewed Feb 2, 2026

View reviewed changes

pgrayy reviewed Feb 2, 2026

View reviewed changes

cagataycali reviewed Feb 2, 2026

View reviewed changes

poshinchen reviewed Feb 2, 2026

View reviewed changes

dbschmigelski reviewed Feb 2, 2026

View reviewed changes


		### Future Concerns

		- Snapshotting for MultiAgent constructs: This proposal would


		### Behavior

		Snapshots capture agent state (data), not runtime behavior (code):


		### Contract

		- `metadata` — Caller-owned. Strands does not read, modify, or manage this field. Use it to store checkpoint labels, timestamps, or any application-specific data.


		Further justification: these three properties are also what SessionManagement persists today, so this API aligns with existing behavior.

		Open question: Is this the right boundary? Are there other properties that should be considered "evolving state"?

		agent2 = Agent(tools=[tool3, tool4])
		agent2.load_snapshot(snapshot)


		- `metadata` — Application-owned. Strands does not read, modify, or manage this field. Use it to store checkpoint labels, timestamps, or any application-specific data without the need for a separate/standalone object/datastore.


		This draws a distinction between "evolving state" (data that changes as the agent runs) and "agent definition" (configuration that defines what the agent is):

		\| Evolving State (snapshotted) \| Agent Definition (not snapshotted) \|

Propose low-level snapshot api #497

Are you sure you want to change the base?

Propose low-level snapshot api #497

Conversation

zastrowm commented Jan 28, 2026

Description

Uh oh!

strands-agent commented Jan 28, 2026

Documentation Deployment Complete

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

strands-agent commented Jan 28, 2026

Documentation Deployment Complete

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

strands-agent commented Jan 29, 2026

Documentation Deployment Complete

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

strands-agent commented Jan 29, 2026

Documentation Deployment Complete

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JackYPCOnline Feb 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

poshinchen Feb 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

JackYPCOnline Feb 2, 2026 •

edited

Loading

poshinchen Feb 2, 2026 •

edited

Loading

mehtarac Feb 2, 2026 •

edited

Loading