Skip to content

Response Relevance Evaluator

The ResponseRelevanceEvaluator evaluates whether an agent’s response is relevant to the user’s question. It assesses if the response addresses what was actually asked, rather than going off-topic or providing unrelated information.

  • Trace-Level Evaluation: Evaluates the most recent turn in the conversation
  • Five-Level Scoring: Granular scale from “Not At All” to “Completely Yes”
  • Async Support: Supports both synchronous and asynchronous evaluation
  • Structured Reasoning: Provides step-by-step reasoning for each evaluation

Use the ResponseRelevanceEvaluator when you need to:

  • Detect off-topic or tangential responses
  • Ensure agents stay focused on the user’s question
  • Identify cases where agents misinterpret user intent
  • Measure response alignment with user queries

This evaluator operates at the TRACE_LEVEL, evaluating the most recent turn in the conversation.

  • Type: Union[Model, str, None]
  • Default: None (uses default Bedrock model)
  • Description: The model to use as the judge.
  • Type: str | None
  • Default: None (uses built-in template)
  • Description: Custom system prompt for the judge model.
  • Type: bool
  • Default: True
  • Description: Whether to include the input prompt in the evaluation context.
  • Type: str
  • Default: "v0"
  • Description: Prompt template version.

| Rating | Score | Description | |--------|-------|-------------| | Not At All | 0.0 | Response is completely unrelated to the question | | Not Generally | 0.25 | Response is mostly off-topic with minor relevance | | Neutral/Mixed | 0.5 | Response partially addresses the question | | Generally Yes | 0.75 | Response is mostly relevant with minor tangents | | Completely Yes | 1.0 | Response directly and fully addresses the question |

A response passes the evaluation if the score is >= 0.5.

from strands import Agent
from strands_evals import Case, Experiment
from strands_evals.evaluators import ResponseRelevanceEvaluator
from strands_evals.mappers import StrandsInMemorySessionMapper
from strands_evals.telemetry import StrandsEvalsTelemetry
telemetry = StrandsEvalsTelemetry().setup_in_memory_exporter()
def task_function(case: Case) -> dict:
telemetry.in_memory_exporter.clear()
agent = Agent(
trace_attributes={"session.id": case.session_id},
callback_handler=None
)
response = agent(case.input)
spans = telemetry.in_memory_exporter.get_finished_spans()
mapper = StrandsInMemorySessionMapper()
session = mapper.map_to_session(spans, session_id=case.session_id)
return {"output": str(response), "trajectory": session}
cases = [
Case(name="relevance-check", input="How do I reset my password?")
]
experiment = Experiment(cases=cases, evaluators=[ResponseRelevanceEvaluator()])
reports = experiment.run_evaluations(task_function)
reports[0].run_display()