Skip to content

Conversation

@ybdarrenwang
Copy link
Collaborator

Description

Introduces ToolSimulator framework for simulating realistic tool responses during agent evaluation without calling production APIs. Enables systematic testing of agents with API-based, Python function-based, and MCP-based tools through LLM-powered dynamic simulation.

Key capabilities:

  • Three simulation modes: Dynamic (LLM-generated), static (predefined responses), and mock (custom functions)
  • Shared state management across multiple tools via share_state_id for stateful testing scenarios
  • Decorator-based registration for function tools (@ToolSimulator.function_tool), MCP tools (@ToolSimulator.mcp_tool), and API tools (@ToolSimulator.api_tool)
  • Integration with Strands Evals workflow including Experiment, Case, and evaluators
  • Multi-agent support with tool simulation across sub-agents (agent-as-tool pattern)

Design principles:

  • Centralized registry for tool management and state tracking
  • Context-aware response generation using initial state descriptions and conversation history
  • Seamless integration with existing Strands tool decorator patterns
  • Comprehensive unit test coverage for all simulation modes

Related Issues

#93

Documentation PR

strands-agents/docs#500

Type of Change

New feature

Testing

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

import logging
import warnings
from datetime import datetime
from typing import Any, Callable, Dict, List, Optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use Python built-in collection types?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the latest commit

# Store framework selection
self.framework = framework
# Store model configuration for creating internal agents
self.model_id = model
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we could keep it as model instead of model_id.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the latest commit.

self.model_id = model

# Set custom prompts or use defaults
if function_tool_prompt is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with static loading so the code can be cleaner.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the latest commit.

Comment on lines 232 to 235
if state_registry:
self._state_registry = state_registry
elif self._state_registry is None:
self._state_registry = StateRegistry(max_tool_call_cache_size=max_tool_call_cache_size)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be the same as :

self._state_registry = state_registry or StateRegistry(max_tool_call_cache_size=max_tool_call_cache_size)

If self._state_registry isn't initialized anywhere before.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the latest commit.

raise ValueError(f"Tool '{tool_name}' not registered")

# Handle different simulation modes
if registered_tool.mode == "static":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we support 3.10+, let's use match instead?

callback_handler=None,
)
result = agent(prompt, structured_output_model=MCPToolResponse)
if result.structured_output:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check required? And is the dump required?

Strands-agent should handle this logic: https://github.com/strands-agents/sdk-python/blob/7db79bbeb53847006c1b6caad84dc6862e836477/src/strands/agent/agent_result.py#L52-L53

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic at agent_result.py#L52-L53 seems to handle cases when the agent could return structured output or NL output (and prioritize the former if present). I think it's different from our case cuz for Tool Simulator we mandate the response to follow the predetermined format, otherwise it's seen as a failed simulation.

But you're right that the check and dump can be simplified given agent_result's support.


# Parse JSON response for function tools since they vary based on function signature
response_text = (
getattr(result, "response", None) or str(result.content) if hasattr(result, "content") else str(result)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class of the result should be AgentResult, which you can check message.content? It should be possible to simplify this logic a bit.

https://github.com/strands-agents/sdk-python/blob/7db79bbeb53847006c1b6caad84dc6862e836477/src/strands/agent/agent_result.py#L32

previous_responses=json.dumps(current_state, indent=2) or "{}",
)

# Create agent and generate response with structured output
Copy link
Contributor

@poshinchen poshinchen Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create agent and generate response with structured output
It says structured_output but below it sets structured_output_model=None. Is it expected? Probably this is the reason why you need json loads below?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For function tool we don't have specific data model, unlike MCP and API tools. That's why for function tools we simply attempt to parse the text response into json dict. (I tried dynamically create base class based on python function signature, but the codes seems long and error prone to me.)


prompt = self.function_tool_prompt.format(
tool_name=tool_name,
parameters=json.dumps(parameters, indent=2) if parameters else "{}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if parameters else "{}", seems to be redundant.

tool_name=tool_name,
parameters=json.dumps(parameters, indent=2) if parameters else "{}",
initial_state_description=initial_state_description,
previous_responses=json.dumps(current_state, indent=2) or "{}",
Copy link
Contributor

@poshinchen poshinchen Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or "{}" seems to be redundant


prompt = self.mcp_tool_prompt.format(
tool_name=tool_name,
mcp_payload=json.dumps(input_mcp_payload, indent=2) if input_mcp_payload else "{}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if input_mcp_payload else "{}", is redundant.

tool_name=tool_name,
mcp_payload=json.dumps(input_mcp_payload, indent=2) if input_mcp_payload else "{}",
initial_state_description=initial_state_description,
previous_responses=json.dumps(current_state, indent=2) or "{}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

logger.error(f"Error generating MCP response: {e}")
return {"isError": True, "content": [{"type": "text", "text": f"Error generating response: {str(e)}"}]}

def _handle_api_tool(self, input_data: Dict[str, Any], state_key: str) -> Dict[str, Any]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments here from previous functions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants