feat: add ToolSimulator for tool response simulation #111
| import logging | ||
| import warnings | ||
| from datetime import datetime | ||
| from typing import Any, Callable, Dict, List, Optional |
Can you use Python built-in collection types?
Done in the latest commit
| # Store framework selection | ||
| self.framework = framework | ||
| # Store model configuration for creating internal agents | ||
| self.model_id = model |
nit: we could keep it as model instead of model_id.
Addressed in the latest commit.
| self.model_id = model | ||
|
|
||
| # Set custom prompts or use defaults | ||
| if function_tool_prompt is None: |
I'm fine with static loading so the code can be cleaner.
Addressed in the latest commit.
| if state_registry: | ||
| self._state_registry = state_registry | ||
| elif self._state_registry is None: | ||
| self._state_registry = StateRegistry(max_tool_call_cache_size=max_tool_call_cache_size) |
This should be equivalent to:
self._state_registry = state_registry or StateRegistry(max_tool_call_cache_size=max_tool_call_cache_size)
provided self._state_registry isn't initialized anywhere before this.
Addressed in the latest commit.
| raise ValueError(f"Tool '{tool_name}' not registered") | ||
|
|
||
| # Handle different simulation modes | ||
| if registered_tool.mode == "static": |
Since we support 3.10+, let's use match instead?
| callback_handler=None, | ||
| ) | ||
| result = agent(prompt, structured_output_model=MCPToolResponse) | ||
| if result.structured_output: |
Is this check required? And is the dump required?
Strands-agent should handle this logic: https://github.com/strands-agents/sdk-python/blob/7db79bbeb53847006c1b6caad84dc6862e836477/src/strands/agent/agent_result.py#L52-L53
The logic at agent_result.py#L52-L53 seems to handle cases where the agent may return either structured output or natural-language output (prioritizing the former if present). I think our case is different because the Tool Simulator requires the response to follow the predetermined format; otherwise it is treated as a failed simulation.
But you're right that the check and dump can be simplified given agent_result's support.
|
|
||
| # Parse JSON response for function tools since they vary based on function signature | ||
| response_text = ( | ||
| getattr(result, "response", None) or str(result.content) if hasattr(result, "content") else str(result) |
The result should be an AgentResult, so you can check message.content. It should be possible to simplify this logic a bit.
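One more reason to simplify: the expression in the diff mixes `or` with a conditional expression, and Python parses `A or B if C else D` as `(A or B) if C else D`. The sketch below uses a plain `SimpleNamespace` as a stand-in for the result object (not the real AgentResult) to show how the two groupings differ.

```python
from types import SimpleNamespace

# Stand-in result that has .response but no .content attribute.
result = SimpleNamespace(response="from response")

# As written in the diff: parsed as (A or B) if C else D,
# so a missing .content skips .response entirely.
as_written = (
    getattr(result, "response", None) or str(result.content) if hasattr(result, "content") else str(result)
)

# Explicit grouping: A or (B if C else D).
grouped = getattr(result, "response", None) or (
    str(result.content) if hasattr(result, "content") else str(result)
)
```

Here `as_written` ends up as `str(result)` while `grouped` is `"from response"`. Whichever behavior was intended, reading a single known attribute of the result avoids the ambiguity.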
| previous_responses=json.dumps(current_state, indent=2) or "{}", | ||
| ) | ||
|
|
||
| # Create agent and generate response with structured output |
"Create agent and generate response with structured output"
The comment says structured output, but the code below sets structured_output_model=None. Is that expected? Is this why you need json.loads below?
For function tools we don't have a specific data model, unlike MCP and API tools. That's why for function tools we simply attempt to parse the text response into a JSON dict. (I tried dynamically creating a base class from the Python function signature, but the code seemed long and error-prone to me.)
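A minimal sketch of the "parse the text response into a JSON dict" approach described above. The helper name and the shape of the error dict are hypothetical; the PR's actual error format is not shown in this thread.

```python
import json
from typing import Any


def parse_function_tool_response(text: str) -> dict[str, Any]:
    """Hypothetical helper: parse model text as a JSON object, or report a failed simulation."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        # The text was not JSON at all.
        return {"status": "error", "message": f"Response is not valid JSON: {exc.msg}"}
    if not isinstance(parsed, dict):
        # Valid JSON, but a list, string, or number rather than an object.
        return {"status": "error", "message": "Response JSON is not an object"}
    return parsed
```

Checking the parsed type as well as catching `JSONDecodeError` matters because `json.loads` happily accepts bare strings, numbers, and lists.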
|
|
||
| prompt = self.function_tool_prompt.format( | ||
| tool_name=tool_name, | ||
| parameters=json.dumps(parameters, indent=2) if parameters else "{}", |
if parameters else "{}", seems to be redundant.
| tool_name=tool_name, | ||
| parameters=json.dumps(parameters, indent=2) if parameters else "{}", | ||
| initial_state_description=initial_state_description, | ||
| previous_responses=json.dumps(current_state, indent=2) or "{}", |
or "{}" seems to be redundant
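A quick check of why these fallbacks are mostly redundant, using only the standard `json` module. The helper name `with_conditional` is made up for this sketch.

```python
import json

# json.dumps always returns a non-empty string, even for empty or None input.
empty = json.dumps({}, indent=2)
none_value = json.dumps(None, indent=2)

# Because the result is never falsy, `or "{}"` can never take effect.
with_or = json.dumps({}, indent=2) or "{}"


def with_conditional(parameters):
    """The conditional form from the diff, isolated for comparison."""
    return json.dumps(parameters, indent=2) if parameters else "{}"
```

For an empty dict, both forms give `"{}"`, the same as plain `json.dumps`. The one difference is a `None` input: plain `json.dumps` gives `"null"` while the conditional gives `"{}"`, so the conditional is only redundant if the argument is always a dict.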
|
|
||
| prompt = self.mcp_tool_prompt.format( | ||
| tool_name=tool_name, | ||
| mcp_payload=json.dumps(input_mcp_payload, indent=2) if input_mcp_payload else "{}", |
if input_mcp_payload else "{}", is redundant.
| tool_name=tool_name, | ||
| mcp_payload=json.dumps(input_mcp_payload, indent=2) if input_mcp_payload else "{}", | ||
| initial_state_description=initial_state_description, | ||
| previous_responses=json.dumps(current_state, indent=2) or "{}", |
same here
| logger.error(f"Error generating MCP response: {e}") | ||
| return {"isError": True, "content": [{"type": "text", "text": f"Error generating response: {str(e)}"}]} | ||
|
|
||
| def _handle_api_tool(self, input_data: Dict[str, Any], state_key: str) -> Dict[str, Any]: |
Same comments here from previous functions
Description
Introduces ToolSimulator framework for simulating realistic tool responses during agent evaluation without calling production APIs. Enables systematic testing of agents with API-based, Python function-based, and MCP-based tools through LLM-powered dynamic simulation.
Key capabilities:
Design principles:
Related Issues
#93
Documentation PR
strands-agents/docs#500
Type of Change
New feature
Testing
hatch run prepare
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.