# Tool Simulation

## Overview

Tool simulation enables controlled agent evaluation by replacing real tool execution with LLM-powered responses. Using the `ToolSimulator` class, you register tools with a decorator, define output schemas, and optionally share state across related tools. When the agent calls a simulated tool, an LLM generates a realistic, schema-validated response instead of executing the real function.
This is useful when:
- Real tools require live infrastructure (APIs, databases, hardware)
- You need controllable tool behavior for evaluation
- You want to test agent tool-use patterns without side effects
- Tools are still under development or unavailable in the test environment
```python
from typing import Any
from pydantic import BaseModel, Field
from strands import Agent
from strands_evals.simulation.tool_simulator import ToolSimulator

tool_simulator = ToolSimulator()

class WeatherResponse(BaseModel):
    temperature: float = Field(..., description="Temperature in Fahrenheit")
    conditions: str = Field(..., description="Weather conditions")

@tool_simulator.tool(output_schema=WeatherResponse)
def get_weather(city: str) -> dict[str, Any]:
    """Get current weather for a city."""
    pass

weather_tool = tool_simulator.get_tool("get_weather")
agent = Agent(tools=[weather_tool], callback_handler=None)
response = agent("What's the weather in Seattle?")
```

## How It Works
- **Tool Registration**: The `@tool_simulator.tool()` decorator captures function metadata (name, docstring, type hints) via Strands' `FunctionToolMetadata`. The function body is never executed.
- **Simulation Wrapper**: When retrieved via `get_tool()`, the real function is replaced with an LLM-backed wrapper that can be passed to a Strands `Agent`.
- **LLM Invocation**: On each call, the wrapper builds a prompt containing the tool's input schema, output schema, user parameters, and current state context, then invokes an `Agent` to generate a response.
- **State Tracking**: A `StateRegistry` records call history and shared state across tools, providing the LLM with context for consistent responses.
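The registration step can be illustrated with a simplified, pure-Python analogue (not the actual `ToolSimulator` internals): the decorator records the function's name, docstring, and type hints, and stores the function object without ever calling its body.

```python
import inspect
from typing import Any, Callable

# Hypothetical, simplified registry illustrating the registration step:
# the decorator stores metadata and the function object, never calling the body.
_registry: dict[str, dict[str, Any]] = {}

def register_tool(func: Callable) -> Callable:
    _registry[func.__name__] = {
        "name": func.__name__,
        "doc": inspect.getdoc(func),
        "hints": dict(func.__annotations__),
    }
    return func  # the original body is kept, not executed

@register_tool
def get_weather(city: str) -> dict[str, Any]:
    """Get current weather for a city."""
    raise RuntimeError("never executed during registration")

meta = _registry["get_weather"]
print(meta["name"])           # get_weather
print(meta["doc"])            # Get current weather for a city.
print(sorted(meta["hints"]))  # ['city', 'return']
```

Because only metadata is captured, the `RuntimeError` in the body is never raised; the same property lets `ToolSimulator` register stub functions whose bodies are just `pass`.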
## Basic Usage

### Registering a Tool

Define a function with type hints and a docstring, then decorate it with `@tool_simulator.tool()`. Provide an `output_schema` to control the response structure; the tool can then be retrieved and passed to a Strands agent.
```python
from typing import Any
from pydantic import BaseModel, Field
from strands import Agent
from strands_evals.simulation.tool_simulator import ToolSimulator

tool_simulator = ToolSimulator()

class OrderStatus(BaseModel):
    order_id: str = Field(..., description="Order identifier")
    status: str = Field(..., description="Current order status")
    estimated_delivery: str = Field(..., description="Estimated delivery date")

@tool_simulator.tool(output_schema=OrderStatus)
def check_order(order_id: str) -> dict[str, Any]:
    """Check the current status of a customer order."""
    pass

order_tool = tool_simulator.get_tool("check_order")
agent = Agent(
    system_prompt="You are a customer service assistant.",
    tools=[order_tool],
    callback_handler=None,
)
response = agent("Where is my order #12345?")
```

### Custom Tool Names
Override the default function name:

```python
@tool_simulator.tool(name="lookup_order", output_schema=OrderStatus)
def check_order(order_id: str) -> dict[str, Any]:
    """Check the current status of a customer order."""
    pass

# Retrieved by custom name
tool = tool_simulator.get_tool("lookup_order")
```

## Shared State
Tools that operate on the same environment can share state via `share_state_id`. When multiple tools share a state key, the LLM sees call history from all of them, enabling consistent behavior across related tools.
```python
from enum import Enum
from pydantic import BaseModel, Field

tool_simulator = ToolSimulator()

class HVACMode(str, Enum):
    HEAT = "heat"
    COOL = "cool"
    AUTO = "auto"
    OFF = "off"

class HVACResponse(BaseModel):
    temperature: float = Field(..., description="Target temperature in Fahrenheit")
    mode: HVACMode = Field(..., description="HVAC mode")
    status: str = Field(default="success", description="Operation status")

class SensorResponse(BaseModel):
    temperature: float = Field(..., description="Current temperature in Fahrenheit")
    humidity: float = Field(..., description="Current humidity percentage")

@tool_simulator.tool(
    share_state_id="room_environment",
    initial_state_description="Room environment: temperature 68F, humidity 45%, HVAC off",
    output_schema=HVACResponse,
)
def hvac_controller(temperature: float, mode: str) -> dict:
    """Control heating/cooling system that affects room temperature and humidity."""
    pass

@tool_simulator.tool(
    share_state_id="room_environment",
    output_schema=SensorResponse,
)
def room_sensor() -> dict:
    """Read current room temperature and humidity."""
    pass

# Both tools share the "room_environment" state
hvac_tool = tool_simulator.get_tool("hvac_controller")
sensor_tool = tool_simulator.get_tool("room_sensor")
agent = Agent(tools=[hvac_tool, sensor_tool], callback_handler=None)
```

### Initial State Description
The `initial_state_description` parameter provides the LLM with baseline context about the environment. This is included in every prompt so the LLM can generate responses consistent with the starting conditions:
```python
@tool_simulator.tool(
    initial_state_description="Database contains users: alice (admin), bob (viewer). No pending invitations.",
    output_schema=UserLookupResponse,
)
def lookup_user(username: str) -> dict:
    """Look up a user in the system."""
    pass
```

## Integration with Experiments
Use `ToolSimulator` within an `Experiment` to evaluate agent tool-use behavior end-to-end:
```python
from pydantic import BaseModel, Field
from strands import Agent
from strands_evals import Case, Experiment
from strands_evals.evaluators import GoalSuccessRateEvaluator
from strands_evals.simulation.tool_simulator import ToolSimulator
from strands_evals.mappers import StrandsInMemorySessionMapper
from strands_evals.telemetry import StrandsEvalsTelemetry

# Set up telemetry
telemetry = StrandsEvalsTelemetry().setup_in_memory_exporter()
memory_exporter = telemetry.in_memory_exporter

tool_simulator = ToolSimulator()

class HVACResponse(BaseModel):
    temperature: float = Field(..., description="Target temperature in Fahrenheit")
    mode: str = Field(..., description="HVAC mode")
    status: str = Field(default="success", description="Operation status")

@tool_simulator.tool(
    share_state_id="room_environment",
    initial_state_description="Room: 68F, humidity 45%, HVAC off",
    output_schema=HVACResponse,
)
def hvac_controller(temperature: float, mode: str) -> dict:
    """Control heating/cooling system."""
    pass

def task_function(case: Case) -> dict:
    hvac_tool = tool_simulator.get_tool("hvac_controller")
    agent = Agent(
        trace_attributes={
            "gen_ai.conversation.id": case.session_id,
            "session.id": case.session_id,
        },
        system_prompt="You are an HVAC control assistant.",
        tools=[hvac_tool],
        callback_handler=None,
    )
    response = agent(case.input)

    spans = memory_exporter.get_finished_spans()
    mapper = StrandsInMemorySessionMapper()
    session = mapper.map_to_session(spans, session_id=case.session_id)

    return {"output": str(response), "trajectory": session}

test_cases = [
    Case(name="heat_control", input="Turn on the heat to 72 degrees"),
    Case(name="cool_down", input="It's too hot, cool the room to 65 degrees"),
]

evaluators = [GoalSuccessRateEvaluator()]
experiment = Experiment(cases=test_cases, evaluators=evaluators)
reports = experiment.run_evaluations(task_function)
reports[0].run_display()
```

## API Reference
### ToolSimulator

| Method | Description |
|---|---|
| `tool(output_schema, name, share_state_id, initial_state_description)` | Decorator to register a simulated tool |
| `get_tool(tool_name)` | Retrieve a simulation-wrapped tool by name |
| `get_state(state_key)` | Get current state for a tool or shared state group |
| `list_tools()` | List all registered tool names |
| `clear_tools()` | Clear all registered tools |
### StateRegistry

| Method | Description |
|---|---|
| `initialize_state_via_description(description, state_key)` | Pre-seed state with context |
| `get_state(state_key)` | Retrieve state dict for a tool or shared group |
| `cache_tool_call(tool_name, state_key, response_data, parameters)` | Record a tool call |
| `clear_state(state_key)` | Clear state for a specific key |
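As a rough illustration of the documented semantics (not the library's real implementation), a minimal registry with the same method names might keep a first-writer-wins initial description plus a bounded FIFO call history per state key:

```python
from collections import defaultdict, deque
from typing import Any

# Hypothetical mini-registry sketching the documented StateRegistry behavior.
class MiniStateRegistry:
    def __init__(self, max_tool_call_cache_size: int = 20) -> None:
        self._initial: dict[str, str] = {}
        self._calls: dict[str, deque] = defaultdict(
            lambda: deque(maxlen=max_tool_call_cache_size)
        )

    def initialize_state_via_description(self, description: str, state_key: str) -> None:
        self._initial.setdefault(state_key, description)  # first writer wins

    def cache_tool_call(self, tool_name: str, state_key: str,
                        response_data: Any, parameters: dict) -> None:
        self._calls[state_key].append(
            {"tool_name": tool_name, "parameters": parameters, "response": response_data}
        )

    def get_state(self, state_key: str) -> dict:
        return {
            "initial_state": self._initial.get(state_key),
            "previous_calls": list(self._calls[state_key]),
        }

    def clear_state(self, state_key: str) -> None:
        self._initial.pop(state_key, None)
        self._calls.pop(state_key, None)

registry = MiniStateRegistry(max_tool_call_cache_size=2)
registry.initialize_state_via_description("Room: 68F", "room_environment")
registry.cache_tool_call("room_sensor", "room_environment", {"temperature": 68.0}, {})
state = registry.get_state("room_environment")
print(state["initial_state"])        # Room: 68F
print(len(state["previous_calls"]))  # 1
```

The internals here are invented for illustration; only the method signatures follow the table above.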
### Data Models

`RegisteredTool`:

```python
class RegisteredTool(BaseModel):
    name: str                              # Tool name
    function: Callable | None              # Underlying DecoratedFunctionTool
    output_schema: type[BaseModel] | None  # Pydantic output schema
    initial_state_description: str | None  # Initial state context
    share_state_id: str | None             # Shared state key
```

`DefaultToolResponse`:

```python
class DefaultToolResponse(BaseModel):
    response: str  # Default response when no output_schema is provided
```

## Advanced Usage and Configurations
### Inspecting State

Use `get_state()` to examine call history and initial state for debugging:
```python
# Before agent invocation
initial_state = tool_simulator.get_state("room_environment")
print(f"Initial state: {initial_state.get('initial_state')}")
print(f"Previous calls: {initial_state.get('previous_calls', [])}")

# After agent invocation
final_state = tool_simulator.get_state("room_environment")
for call in final_state["previous_calls"]:
    print(f"  {call['tool_name']}: {call['parameters']} -> {call['response']}")
```

Each call record contains:

- `tool_name`: Name of the tool that was called
- `parameters`: The parameters passed to the tool
- `response`: The LLM-generated response
- `timestamp`: When the call was made
### Configuration

#### Custom Model

Specify a different model for simulation inference:

```python
# Via model ID string (Bedrock)
tool_simulator = ToolSimulator(model="anthropic.claude-3-5-sonnet-20241022-v2:0")

# Via Strands Model provider
from strands.models import BedrockModel

model = BedrockModel(model_id="anthropic.claude-3-5-sonnet-20241022-v2:0")
tool_simulator = ToolSimulator(model=model)
```

#### Cache Size
Control how many tool calls are retained per state key:

```python
# Default: 20 calls per state key
tool_simulator = ToolSimulator(max_tool_call_cache_size=20)

# Increase for long-running evaluations
tool_simulator = ToolSimulator(max_tool_call_cache_size=50)
```

When the cache is full, the oldest calls are evicted (FIFO).
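The FIFO eviction policy can be sketched with a stdlib analogue (not the library's actual cache): a bounded `collections.deque` silently drops the oldest record once the limit is reached.

```python
from collections import deque

# Hypothetical stand-in for the per-state-key call cache:
# deque(maxlen=N) evicts the oldest record when a new one arrives.
call_cache: deque = deque(maxlen=3)  # small limit for illustration

for i in range(5):
    call_cache.append({"tool_name": "hvac_controller", "call_no": i})

# Only the 3 most recent calls remain; calls 0 and 1 were evicted FIFO.
print([c["call_no"] for c in call_cache])  # [2, 3, 4]
```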
#### Custom State Registry

Provide your own `StateRegistry` for advanced state management:

```python
from strands_evals.simulation.tool_simulator import StateRegistry, ToolSimulator

registry = StateRegistry(max_tool_call_cache_size=100)
tool_simulator = ToolSimulator(state_registry=registry)
```

### Running Independent Simulator Instances
Section titled “Running Independent Simulator Instances”You can create multiple ToolSimulator instances side by side. Each instance maintains its own tool registry and state, so you can run parallel experiment configurations in the same codebase:
simulator_a = ToolSimulator()simulator_b = ToolSimulator()
# Each instance has an independent tool registry and state --# ideal for comparing agent behavior across different tool setups.This is useful when you want to A/B test different tool configurations, output schemas, or initial state descriptions against the same agent.
### Seeding State from Real Data

Because `initial_state_description` accepts natural language, you can get creative with how you seed context. For tools that interact with tabular data, use a `DataFrame.describe()` call to generate statistical summaries and pass those statistics directly as the state description. `ToolSimulator` will generate responses that reflect realistic data distributions, without ever accessing the actual data:
```python
import pandas as pd

df = pd.read_csv("sales_data.csv")
stats_summary = df.describe().to_string()

@tool_simulator.tool(
    initial_state_description=f"Sales database statistics:\n{stats_summary}",
    output_schema=SalesQueryResponse,
)
def query_sales(region: str, quarter: str) -> dict:
    """Query sales data by region and quarter."""
    pass
```

This approach lets you ground simulated responses in real data characteristics while keeping the actual data out of the evaluation loop.
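If pandas is not available, a describe-style summary can be built from the standard library. This hypothetical helper (the `summarize` function and sample data are illustrative, not part of the library) produces a compact text block suitable for `initial_state_description`:

```python
import statistics

# Hypothetical stdlib alternative to DataFrame.describe() for seeding state:
# summarize numeric columns as count/mean/min/max text.
def summarize(columns: dict[str, list[float]]) -> str:
    lines = []
    for name, values in columns.items():
        lines.append(
            f"{name}: count={len(values)}, mean={statistics.fmean(values):.2f}, "
            f"min={min(values)}, max={max(values)}"
        )
    return "\n".join(lines)

sales = {"units_sold": [120.0, 95.0, 143.0, 88.0], "revenue": [1200.0, 950.0, 1430.0, 880.0]}
stats_summary = summarize(sales)
print(stats_summary)
```

The resulting `stats_summary` string can be interpolated into `initial_state_description` exactly as in the pandas example above.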
## Troubleshooting

### Issue: Tool Not Found

`get_tool()` returns `None` if the tool name doesn't match:

```python
tool = tool_simulator.get_tool("my_tool")
if tool is None:
    print(f"Available tools: {tool_simulator.list_tools()}")
```

### Issue: Inconsistent Responses Across Calls
Ensure related tools share state and that initial state is set:

```python
# Without shared state, each tool has independent context
@tool_simulator.tool(share_state_id="shared_env", initial_state_description="...", output_schema=...)
def tool_a(...): ...

@tool_simulator.tool(share_state_id="shared_env", output_schema=...)
def tool_b(...): ...
```

### Issue: State Re-initialization Warning
If you see a warning about state already being initialized, it means two tools with the same `share_state_id` both provide `initial_state_description`. Only the first one takes effect:

```python
# First tool initializes state
@tool_simulator.tool(
    share_state_id="env",
    initial_state_description="Starting state",  # This takes effect
    output_schema=...,
)
def tool_a(...): ...

# Second tool's initial_state_description is ignored with a warning
@tool_simulator.tool(
    share_state_id="env",
    initial_state_description="Different state",  # Ignored
    output_schema=...,
)
def tool_b(...): ...
```

## Related Documentation
- Simulators Overview: Overview of the simulator framework
- User Simulation: Simulate multi-turn user conversations
- Quickstart Guide: Get started with Strands Evals
- Goal Success Rate Evaluator: Assess goal completion