feat: add ToolSimulator for tool response simulation #111


Open

ybdarrenwang wants to merge 14 commits into strands-agents:main from ybdarrenwang:feature/tool-simulator

Collaborator

ybdarrenwang commented Jan 29, 2026

Description

Introduces the ToolSimulator framework for simulating realistic tool responses during agent evaluation without calling production APIs. It enables systematic testing of agents that use API-based, Python function-based, and MCP-based tools through LLM-powered dynamic simulation.

Key capabilities:

Three simulation modes: Dynamic (LLM-generated), static (predefined responses), and mock (custom functions)
Shared state management across multiple tools via share_state_id for stateful testing scenarios
Decorator-based registration for function tools (@ToolSimulator.function_tool), MCP tools (@ToolSimulator.mcp_tool), and API tools (@ToolSimulator.api_tool)
Integration with the Strands Evals workflow, including Experiment, Case, and evaluators
Multi-agent support with tool simulation across sub-agents (agent-as-tool pattern)
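To make the three modes concrete, here is a minimal, self-contained sketch of decorator-based registration and per-mode dispatch. It is hypothetical: `SketchSimulator`, its `tool()` decorator, and the `llm` callable are illustrative stand-ins, not the actual ToolSimulator API from this PR, and dynamic mode here calls a plain function instead of an LLM-backed agent.

```python
from __future__ import annotations

import functools
from collections.abc import Callable
from typing import Any

MODES = ("dynamic", "static", "mock")


class SketchSimulator:
    """Hypothetical stand-in for a tool simulator; not the PR's real API."""

    def __init__(self, llm: Callable[[str], Any]) -> None:
        # `llm` stands in for the LLM-backed agent used by dynamic mode.
        self._llm = llm
        self._tools: dict[str, dict[str, Any]] = {}

    def tool(
        self,
        mode: str = "dynamic",
        response: Any = None,
        mock_fn: Callable[..., Any] | None = None,
    ) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Decorator that registers a function and replaces it with a simulated call."""
        if mode not in MODES:
            raise ValueError(f"Unknown mode: {mode!r}")
        if mode == "mock" and mock_fn is None:
            raise ValueError("mock mode requires mock_fn")

        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            self._tools[fn.__name__] = {"mode": mode, "response": response, "mock_fn": mock_fn}

            @functools.wraps(fn)
            def wrapper(**params: Any) -> Any:
                # The decorated function's body never runs: no production API is called.
                return self.call(fn.__name__, **params)

            return wrapper

        return decorator

    def call(self, name: str, **params: Any) -> Any:
        tool = self._tools.get(name)
        if tool is None:
            raise ValueError(f"Tool '{name}' not registered")
        if tool["mode"] == "static":
            return tool["response"]  # predefined response
        if tool["mode"] == "mock":
            return tool["mock_fn"](**params)  # user-supplied function
        # dynamic: ask the (stand-in) model for a plausible response
        return self._llm(f"Simulate tool {name} with parameters {params}")
```

Static mode ignores the parameters entirely, mock mode forwards them to a user function, and dynamic mode is the only path that would involve a model.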

Design principles:

Centralized registry for tool management and state tracking
Context-aware response generation using initial state descriptions and conversation history
Seamless integration with existing Strands tool decorator patterns
Comprehensive unit test coverage for all simulation modes
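A centralized registry with shared state could look roughly like the sketch below. `SketchStateRegistry`, `init_state`, `record`, and `get_state` are hypothetical names; only the ideas of a shared state id and `max_tool_call_cache_size` come from the PR, and modeling the cache as a bounded per-key call history is an assumption of this sketch.

```python
from __future__ import annotations

from collections import deque
from typing import Any


class SketchStateRegistry:
    """Hypothetical centralized registry keyed by a shared state id.

    Tools registered with the same share_state_id read from and append to the
    same history, so later calls can see earlier ones. Only the most recent
    max_tool_call_cache_size calls are kept per key.
    """

    def __init__(self, max_tool_call_cache_size: int = 20) -> None:
        self._max = max_tool_call_cache_size
        self._history: dict[str, deque[dict[str, Any]]] = {}
        self._initial: dict[str, str] = {}

    def _calls(self, state_key: str) -> deque[dict[str, Any]]:
        return self._history.setdefault(state_key, deque(maxlen=self._max))

    def init_state(self, state_key: str, initial_state_description: str) -> None:
        self._initial[state_key] = initial_state_description
        self._calls(state_key)

    def record(self, state_key: str, tool_name: str, parameters: dict[str, Any], response: Any) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._calls(state_key).append({"tool_name": tool_name, "parameters": parameters, "response": response})

    def get_state(self, state_key: str) -> dict[str, Any]:
        return {
            "initial_state_description": self._initial.get(state_key, ""),
            "previous_calls": list(self._history.get(state_key, ())),
        }
```

Because two tools that share a key see the same `previous_calls`, a simulated `checkout` can stay consistent with an earlier simulated `add_to_cart`.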

Related Issues

Documentation PR

strands-agents/docs#500

Type of Change

New feature

Testing

I ran hatch run prepare

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

ybdarrenwang added 9 commits

January 23, 2026 22:33

init tool simulator pr (86159c2)
fix tool init state registry and prompt (be24b11)
support static and mock modes (1d507f0)
unit test tool simulator (7b82fc8)
remove override and simplify pr (cd0365a)
replace llm call with agent; simplify error raise (4d57a53)
refactor and address mypy errors (15d3fcd)
fix tool simulator integration with strands tool decorator (9701bbe)
update test (e6702f5)

ybdarrenwang requested a deployment to manual-approval with GitHub Actions on January 29, 2026 22:15 (waiting)

fix test (367c637)

ybdarrenwang requested a deployment to manual-approval with GitHub Actions on January 29, 2026 23:52 (waiting)

poshinchen reviewed


src/strands_evals/simulation/tool_simulator.py Outdated

+              import logging
+              import warnings
+              from datetime import datetime
+              from typing import Any, Callable, Dict, List, Optional

Contributor

poshinchen Feb 2, 2026

Can you use Python built-in collection types?

Collaborator Author

ybdarrenwang Feb 2, 2026

Done in the latest commit


utilize built-in collection types; improve readability (087ce6e)

ybdarrenwang requested a deployment to manual-approval with GitHub Actions on February 2, 2026 19:09 (waiting)

poshinchen reviewed


src/strands_evals/simulation/tool_simulator.py Outdated

+                      # Store framework selection
+                      self.framework = framework
+                      # Store model configuration for creating internal agents
+                      self.model_id = model

Contributor

poshinchen Feb 2, 2026

nit: we could keep it as model instead of model_id.

Collaborator Author

ybdarrenwang Feb 3, 2026

Addressed in the latest commit.

poshinchen reviewed


src/strands_evals/simulation/tool_simulator.py Outdated

+                      self.model_id = model
+                      # Set custom prompts or use defaults
+                      if function_tool_prompt is None:

Contributor

poshinchen Feb 2, 2026

I'm fine with static loading so the code can be cleaner.

Collaborator Author

ybdarrenwang Feb 3, 2026

Addressed in the latest commit.

poshinchen reviewed


src/strands_evals/simulation/tool_simulator.py Outdated

Comment on lines 232 to 235

+                      if state_registry:
+                          self._state_registry = state_registry
+                      elif self._state_registry is None:
+                          self._state_registry = StateRegistry(max_tool_call_cache_size=max_tool_call_cache_size)

Contributor

poshinchen Feb 2, 2026

This should be equivalent to:

self._state_registry = state_registry or StateRegistry(max_tool_call_cache_size=max_tool_call_cache_size)

provided self._state_registry isn't initialized anywhere earlier.

Collaborator Author

ybdarrenwang Feb 3, 2026

Addressed in the latest commit.


clean loading and init (c084aea)

ybdarrenwang requested a deployment to manual-approval with GitHub Actions on February 3, 2026 17:46 (waiting)

poshinchen reviewed


src/strands_evals/simulation/tool_simulator.py Outdated

+                          raise ValueError(f"Tool '{tool_name}' not registered")
+                      # Handle different simulation modes
+                      if registered_tool.mode == "static":

Contributor

poshinchen Feb 3, 2026

Since we support Python 3.10+, could we use match instead?

poshinchen reviewed


src/strands_evals/simulation/tool_simulator.py Outdated

+                              callback_handler=None,
+                          )
+                          result = agent(prompt, structured_output_model=MCPToolResponse)
+                          if result.structured_output:

Contributor

poshinchen Feb 3, 2026

Is this check required? And is the dump required?

The Strands Agents SDK should already handle this logic: https://github.com/strands-agents/sdk-python/blob/7db79bbeb53847006c1b6caad84dc6862e836477/src/strands/agent/agent_result.py#L52-L53

Collaborator Author

ybdarrenwang Feb 3, 2026

The logic at agent_result.py#L52-L53 seems to handle cases where the agent may return either structured output or natural-language output (and prioritizes the former if present). I think our case is different, because Tool Simulator requires the response to follow the predetermined format; otherwise it's treated as a failed simulation.

But you're right that the check and dump can be simplified given agent_result's support.

poshinchen reviewed


src/strands_evals/simulation/tool_simulator.py Outdated

+                          # Parse JSON response for function tools since they vary based on function signature
+                          response_text = (
+                              getattr(result, "response", None) or str(result.content) if hasattr(result, "content") else str(result)

Contributor

poshinchen Feb 3, 2026

The result should be an AgentResult, so you can check message.content. It should be possible to simplify this logic a bit.

https://github.com/strands-agents/sdk-python/blob/7db79bbeb53847006c1b6caad84dc6862e836477/src/strands/agent/agent_result.py#L32

poshinchen reviewed


src/strands_evals/simulation/tool_simulator.py

+                              previous_responses=json.dumps(current_state, indent=2) or "{}",
+                          )
+                          # Create agent and generate response with structured output

Contributor

poshinchen Feb 3, 2026 (edited)

Create agent and generate response with structured output
It says "structured output", but below it sets structured_output_model=None. Is that expected? Is that why you need json.loads below?

Collaborator Author

ybdarrenwang Feb 3, 2026

For function tools we don't have a specific data model, unlike MCP and API tools. That's why for function tools we simply attempt to parse the text response into a JSON dict. (I tried dynamically creating a base class from the Python function signature, but the code seemed long and error-prone to me.)
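A minimal sketch of the parse-into-a-dict approach for function tools. `parse_function_tool_response` is a hypothetical helper; stripping an optional Markdown code fence around the JSON is an assumption about how models sometimes format replies, not something the PR states. Raising `ValueError` for anything other than a JSON object follows the rule, stated earlier in this thread, that off-format responses count as failed simulations.

```python
import json
import re
from typing import Any

# Optional surrounding code fence: three backticks, an optional "json" tag, then the payload.
_FENCE = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", re.DOTALL)


def parse_function_tool_response(text: str) -> dict[str, Any]:
    """Hypothetical helper: turn an LLM text reply into a JSON object (dict)."""
    candidate = text.strip()
    fenced = _FENCE.match(candidate)
    if fenced:
        candidate = fenced.group(1)
    try:
        value = json.loads(candidate)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Simulated response is not valid JSON: {exc}") from exc
    if not isinstance(value, dict):
        raise ValueError(f"Expected a JSON object, got {type(value).__name__}")
    return value
```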

poshinchen reviewed


src/strands_evals/simulation/tool_simulator.py Outdated

+                          prompt = self.function_tool_prompt.format(
+                              tool_name=tool_name,
+                              parameters=json.dumps(parameters, indent=2) if parameters else "{}",

Contributor

poshinchen Feb 3, 2026

The if parameters else "{}" part seems to be redundant.

poshinchen reviewed


src/strands_evals/simulation/tool_simulator.py Outdated

+                              tool_name=tool_name,
+                              parameters=json.dumps(parameters, indent=2) if parameters else "{}",
+                              initial_state_description=initial_state_description,
+                              previous_responses=json.dumps(current_state, indent=2) or "{}",

Contributor

poshinchen Feb 3, 2026 (edited)

The or "{}" part seems to be redundant.
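`json.dumps` never returns an empty string (the shortest outputs are things like `"0"`, `"[]"`, or `"null"`), so an `or "{}"` fallback after it can never fire. The snippet checks a few falsy inputs; note that a `None` state would render as `"null"`, not `"{}"`. `fallback_fires` is a hypothetical helper for the check.

```python
import json
from typing import Any


def fallback_fires(value: Any) -> bool:
    """Return True only if `json.dumps(value) or "{}"` would use the fallback."""
    return not json.dumps(value, indent=2)


# Falsy Python values still serialize to non-empty JSON text.
SAMPLES = ({}, [], None, "", 0, False)
print([json.dumps(v, indent=2) for v in SAMPLES])
# ['{}', '[]', 'null', '""', '0', 'false']
```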

poshinchen reviewed


src/strands_evals/simulation/tool_simulator.py Outdated

+                          prompt = self.mcp_tool_prompt.format(
+                              tool_name=tool_name,
+                              mcp_payload=json.dumps(input_mcp_payload, indent=2) if input_mcp_payload else "{}",

Contributor

poshinchen Feb 3, 2026

The if input_mcp_payload else "{}" part is redundant.

poshinchen reviewed


src/strands_evals/simulation/tool_simulator.py Outdated

+                              tool_name=tool_name,
+                              mcp_payload=json.dumps(input_mcp_payload, indent=2) if input_mcp_payload else "{}",
+                              initial_state_description=initial_state_description,
+                              previous_responses=json.dumps(current_state, indent=2) or "{}",

Contributor

poshinchen Feb 3, 2026

Same here.

poshinchen reviewed


src/strands_evals/simulation/tool_simulator.py Outdated

+                          logger.error(f"Error generating MCP response: {e}")
+                          return {"isError": True, "content": [{"type": "text", "text": f"Error generating response: {str(e)}"}]}
+                  def _handle_api_tool(self, input_data: Dict[str, Any], state_key: str) -> Dict[str, Any]:

Contributor

poshinchen Feb 3, 2026

Same comments here as for the previous functions.


use python3.10 types and match case, simplify response validation (33c1b40)

ybdarrenwang requested a deployment to manual-approval with GitHub Actions on February 3, 2026 23:42 (waiting)

fix forward referencing with Union (f8c5cd3)

ybdarrenwang requested a deployment to manual-approval with GitHub Actions on February 3, 2026 23:54 (waiting)


Labels: None yet