# Deterministic Evaluators

## Overview

Deterministic evaluators provide fast, code-based evaluation without LLM judges. They perform exact checks on outputs, trajectories, and environment state, making them ideal for regression testing, CI/CD pipelines, and cases where evaluation criteria are objective and well-defined.
## Key Features

- **No LLM Required**: Pure code-based evaluation, fast and free
- **Deterministic Results**: Same input always produces the same score
- **Multiple Check Types**: Output matching, tool call verification, and state comparison
- **Async Support**: All evaluators support both sync and async evaluation
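To illustrate how a deterministic check can expose both a sync and an async interface, here is a plain-Python sketch. The class and method names (`ContainsCheck`, `evaluate`, `evaluate_async`) are hypothetical illustrations, not the `strands_evals` internals:

```python
import asyncio


class ContainsCheck:
    """Hypothetical deterministic check: scores 1.0 if a substring appears in the output."""

    def __init__(self, value: str, case_sensitive: bool = True):
        self.value = value
        self.case_sensitive = case_sensitive

    def evaluate(self, actual_output: str) -> float:
        haystack, needle = actual_output, self.value
        if not self.case_sensitive:
            haystack, needle = haystack.lower(), needle.lower()
        return 1.0 if needle in haystack else 0.0

    async def evaluate_async(self, actual_output: str) -> float:
        # Pure code checks do no I/O, so the async path can just wrap the sync one.
        return self.evaluate(actual_output)


check = ContainsCheck("Paris", case_sensitive=False)
print(check.evaluate("The capital of France is paris."))    # 1.0
print(asyncio.run(check.evaluate_async("No match here.")))  # 0.0
```

Because the check is pure code, the same input always yields the same score, and the async variant adds no latency.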
## Available Evaluators

### Output Evaluators

#### Equals

Checks if `actual_output` equals an expected value.

```python
from strands_evals.evaluators import Equals

# Compare against explicit value
evaluator = Equals(value="Paris")

# Or compare against case's expected_output (when value is None)
evaluator = Equals()
```

Parameters:

- `value` (optional): The expected value. If `None`, uses `expected_output` from the evaluation case.
#### Contains

Checks if `actual_output` contains a substring.

```python
from strands_evals.evaluators import Contains

evaluator = Contains(value="Paris", case_sensitive=False)
```

Parameters:

- `value` (required): The substring to search for.
- `case_sensitive` (optional, default `True`): Whether the check is case-sensitive.
#### StartsWith

Checks if `actual_output` starts with a prefix.

```python
from strands_evals.evaluators import StartsWith

evaluator = StartsWith(value="The capital", case_sensitive=False)
```

Parameters:

- `value` (required): The prefix to check.
- `case_sensitive` (optional, default `True`): Whether the check is case-sensitive.
### Trajectory Evaluators

#### ToolCalled

Checks if a specific tool was called in the trajectory. Works with both list-based trajectories and Session objects.

```python
from strands_evals.evaluators import ToolCalled

evaluator = ToolCalled(tool_name="calculator")
```

Parameters:

- `tool_name` (required): Name of the tool to check for.
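Conceptually, this kind of check is a scan over trajectory entries. The sketch below uses a hypothetical list-of-dicts trajectory shape for illustration; the actual `strands_evals` trajectory and Session data models may differ:

```python
def tool_was_called(trajectory: list[dict], tool_name: str) -> bool:
    """Return True if any step in the trajectory invoked the named tool.

    Assumes each step is a dict with an optional "tool_name" key; this is
    an illustrative shape, not the library's real trajectory format.
    """
    return any(step.get("tool_name") == tool_name for step in trajectory)


trajectory = [
    {"role": "assistant", "tool_name": "calculator", "input": "2 + 2"},
    {"role": "tool", "tool_name": "calculator", "output": "4"},
]
print(tool_was_called(trajectory, "calculator"))  # True
print(tool_was_called(trajectory, "web_search"))  # False
```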
### Environment State Evaluators

#### StateEquals

Checks if a named environment state matches an expected value. Useful for verifying that tool-using agents produce the correct side effects.

```python
from strands_evals.evaluators import StateEquals

# Compare against explicit value
evaluator = StateEquals(name="temperature", value=72.0)

# Or compare against case's expected_environment_state
evaluator = StateEquals(name="temperature")
```

Parameters:

- `name` (required): Name of the environment state to check.
- `value` (optional): Expected value. If `None`, uses `expected_environment_state` from the evaluation case.
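At its core this is a keyed lookup-and-compare against the environment's state after the agent has run. The sketch below assumes a plain dict-based state and a hypothetical function name; it is an illustration of the idea, not the library's API:

```python
def state_equals(environment_state: dict, name: str, expected) -> float:
    """Score 1.0 if the named state slot holds the expected value, else 0.0."""
    if name not in environment_state:
        return 0.0  # the agent never set this state
    return 1.0 if environment_state[name] == expected else 0.0


# e.g. a thermostat agent that adjusts temperature through a tool
state_after_run = {"temperature": 72.0, "mode": "heat"}
print(state_equals(state_after_run, "temperature", 72.0))  # 1.0
print(state_equals(state_after_run, "mode", "cool"))       # 0.0
```

Checking side effects this way catches agents that produce a plausible-sounding answer without actually calling the tool that changes the state.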
## Usage Example

```python
from strands import Agent
from strands_evals import Case, Experiment
from strands_evals.evaluators import Equals, Contains, ToolCalled

cases = [
    Case(
        name="capital-check",
        input="What is the capital of France?",
        expected_output="Paris",
    )
]

# Combine multiple deterministic checks
evaluators = [
    Contains(value="Paris", case_sensitive=False),
    Contains(value="France", case_sensitive=False),
]

def get_response(case: Case) -> str:
    agent = Agent(callback_handler=None)
    return str(agent(case.input))

experiment = Experiment(cases=cases, evaluators=evaluators)
reports = experiment.run_evaluations(get_response)
reports[0].run_display()
```

## Combining with LLM Evaluators
Deterministic evaluators work well alongside LLM-based evaluators for comprehensive assessment:

```python
from strands_evals.evaluators import Contains, HelpfulnessEvaluator, CorrectnessEvaluator

evaluators = [
    Contains(value="Paris"),   # Fast deterministic check
    CorrectnessEvaluator(),    # LLM-based correctness
    HelpfulnessEvaluator(),    # LLM-based helpfulness
]
```

## Best Practices
- **Use for regression testing**: Deterministic evaluators are ideal for CI/CD since they're fast and don't require API calls
- **Combine with LLM evaluators**: Use deterministic checks as a first pass, then LLM evaluators for nuanced assessment
- **Case sensitivity**: Use `case_sensitive=False` when exact casing doesn't matter
- **State verification**: Use `StateEquals` when your agent modifies external state through tools
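The "first pass" pattern can be sketched as a simple gate: run the free deterministic checks first and only spend an LLM call when they pass. In this plain-Python illustration, `gated_eval` and `llm_judge` are hypothetical names, with a stand-in judge instead of a real model call:

```python
def gated_eval(output: str, substrings: list[str], llm_judge) -> float:
    """Cheap deterministic gate before an expensive LLM judgment.

    If any required substring is missing (case-insensitive), fail with 0.0
    and skip the LLM call entirely; otherwise defer to the judge's score.
    """
    if not all(s.lower() in output.lower() for s in substrings):
        return 0.0
    return llm_judge(output)


# Stand-in judge for demonstration; a real one would call a model.
fake_judge = lambda output: 0.9
print(gated_eval("Paris is the capital of France.", ["Paris", "France"], fake_judge))  # 0.9
print(gated_eval("I don't know.", ["Paris"], fake_judge))                              # 0.0
```

In a CI pipeline, the deterministic gate alone can fail a build instantly, reserving LLM-judged scoring for outputs that clear the basic checks.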
## Related Evaluators

- `OutputEvaluator`: LLM-based output evaluation with custom rubrics
- `TrajectoryEvaluator`: LLM-based trajectory evaluation
- `CustomEvaluator`: Build your own evaluation logic