Multimodal Instruction Following Evaluator
Overview
The MultimodalInstructionFollowingEvaluator assesses whether an agent response satisfies the explicit constraints in the user’s instruction (count, format, scope, order, completeness, and style), independently of factual accuracy.
Key Features
- Output-Level Evaluation: Scores a single agent response per case
- Binary Scoring: 1.0 if all constraints are satisfied, 0.0 if any constraint is violated
- Constraint-Focused: Evaluates compliance with directives, not overall correctness or quality
- Image-Aware: Verifies image-referential constraints (e.g., “describe only the background”)
When to Use
Use the MultimodalInstructionFollowingEvaluator when you need to:
- Verify that responses respect format constraints (bullet vs. numbered list, paragraph, JSON)
- Check count constraints (“exactly N sentences”, “in one paragraph”)
- Assess scope constraints (“describe only the foreground”, “do not mention people”)
- Validate order constraints (“left to right”, “largest to smallest”)
- Evaluate instruction compliance independently from factual correctness
Evaluation Level
This evaluator operates at the OUTPUT_LEVEL, scoring a single agent response per case.
Parameters
Section titled “Parameters”rubric (optional)
Section titled “rubric (optional)”- Type:
str | None - Default:
INSTRUCTION_FOLLOWING_RUBRIC_V0 - Description: Custom rubric. Leave unset to use the default rubric.
model (optional)
- Type: Model | str | None
- Default: None (uses the default Bedrock model)
- Description: Multimodal judge model.
include_inputs (optional)
- Type: bool
- Default: True
system_prompt (optional)
- Type: str | None
- Default: None (uses the built-in MLLM_JUDGE_SYSTEM_PROMPT)
reference_suffix (optional)
- Type: str | None
- Default: None (uses the built-in default suffix)
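Putting these parameters together, a minimal configuration sketch might look like the following. Only the parameter names come from the reference above; the rubric text and model identifier are illustrative placeholders, not library defaults:

```python
from strands_evals.evaluators import MultimodalInstructionFollowingEvaluator

# The keyword arguments below are the documented constructor parameters.
# The rubric string and model id are placeholders for illustration only.
evaluator = MultimodalInstructionFollowingEvaluator(
    rubric="Score 1.0 only if every explicit constraint is satisfied.",
    model="your-bedrock-model-id",  # any multimodal judge model
    include_inputs=True,  # pass the case inputs to the judge
)
```

Leaving every argument unset falls back to the defaults listed above.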
Scoring System
| Score | Label | Meaning |
|---|---|---|
| 1.0 | Following | All explicit constraints are satisfied |
| 0.0 | Not Following | One or more constraints are violated |
A response passes only if the score is 1.0.
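The all-or-nothing scoring contract can be illustrated with a toy, deterministic checker for a single bullet-count constraint. This is only a sketch of the binary semantics; the evaluator itself uses a multimodal judge model, and check_bullet_count is a hypothetical helper, not part of the library:

```python
import re

def check_bullet_count(response: str, expected: int) -> float:
    """Toy illustration of binary constraint scoring (not the library's judge)."""
    lines = [ln for ln in response.strip().splitlines() if ln.strip()]
    # Every non-empty line must be a bullet item ("-" or "*").
    all_bullets = all(re.match(r"^\s*[-*]\s+\S", ln) for ln in lines)
    # 1.0 only if the format AND the count constraint both hold; any
    # violation (wrong format, too few, too many) scores 0.0.
    return 1.0 if all_bullets and len(lines) == expected else 0.0

print(check_bullet_count("- tree\n- bench\n- lamppost", 3))  # 1.0
print(check_bullet_count("1. tree\n2. bench\n3. lamppost", 3))  # 0.0 (numbered, not bullets)
```

There is no partial credit: a response with the right content in the wrong format fails, just like a response with the wrong count.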
Basic Usage
```python
from strands_evals import Case, Experiment
from strands_evals.evaluators import MultimodalInstructionFollowingEvaluator
from strands_evals.types import MultimodalInput
from strands_evals.types.evaluation_report import EvaluationReport

def task_function(case: Case) -> str:
    # Replace with your multimodal agent invocation.
    return "- tree\n- bench\n- lamppost"

cases = [
    Case(
        name="bullet-format",
        input=MultimodalInput(
            media="/path/to/park.jpg",
            instruction="List exactly three objects visible in the background as bullet points.",
        ),
    ),
]

experiment = Experiment(
    cases=cases,
    evaluators=[MultimodalInstructionFollowingEvaluator()],
)
reports = experiment.run_evaluations(task_function)
EvaluationReport.flatten(reports).run_display()
```

Combining with Other Evaluators
Pair with correctness and faithfulness to assess different failure modes separately. Experiment.run_evaluations returns one report per evaluator, so use EvaluationReport.flatten to view them together:
```python
from strands_evals import Experiment
from strands_evals.evaluators import (
    MultimodalCorrectnessEvaluator,
    MultimodalFaithfulnessEvaluator,
    MultimodalInstructionFollowingEvaluator,
)
from strands_evals.types.evaluation_report import EvaluationReport

evaluators = [
    MultimodalInstructionFollowingEvaluator(),  # Did it follow the instruction constraints?
    MultimodalCorrectnessEvaluator(),  # Are the listed objects correct?
    MultimodalFaithfulnessEvaluator(),  # Are they actually in the image?
]

experiment = Experiment(cases=cases, evaluators=evaluators)
reports = experiment.run_evaluations(task_function)
EvaluationReport.flatten(reports).run_display()
```

Related Evaluators
- MultimodalOutputEvaluator: Parent class with full parameter reference
- InstructionFollowingEvaluator: Text-only counterpart