strands.experimental.checkpoint.checkpoint
Checkpoint system for durable agent execution.
Checkpoints enable crash-resilient agent workflows by capturing agent state at cycle boundaries in the agent loop. A durability provider (e.g. Temporal) can persist checkpoints and resume from them after failures.
Two checkpoint positions per ReAct cycle:
- after_model: model call completed, tools not yet executed.
- after_tools: all tools executed, next model call pending.
Per-tool granularity is handled by the ToolExecutor abstraction (e.g. TemporalToolExecutor routes each tool to a separate Temporal activity). The SDK checkpoint operates at cycle boundaries.
User-facing pattern (same as interrupts):
- Pause via stop_reason="checkpoint" on AgentResult
- State via AgentResult.checkpoint field
- Resume via checkpointResume content block in next agent() call
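The pause/resume pattern above can be sketched as a driver loop. This is a minimal illustration, not the SDK's implementation: StubResult and make_stub_agent are hypothetical stand-ins so the snippet runs without the SDK, and the inner shape of the checkpointResume block (a "checkpoint" key holding the serialized checkpoint) is an assumption. With a real agent, AgentResult.checkpoint holds a Checkpoint rather than a plain dict.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class StubResult:
    """Hypothetical stand-in for AgentResult with only the two fields used here."""

    stop_reason: str
    checkpoint: Optional[dict[str, Any]] = None


def make_stub_agent(pauses: int) -> Callable[[Any], StubResult]:
    """Build a fake agent that pauses `pauses` times, then finishes."""
    calls = 0

    def agent(_prompt: Any) -> StubResult:
        nonlocal calls
        calls += 1
        if calls <= pauses:
            # Alternate between the two checkpoint positions of a cycle.
            position = "after_model" if calls % 2 == 1 else "after_tools"
            data = {"position": position, "cycle_index": (calls - 1) // 2}
            return StubResult("checkpoint", data)
        return StubResult("end_turn")

    return agent


def run_until_done(
    agent: Callable[[Any], StubResult],
    prompt: Any,
    persist: Callable[[dict[str, Any]], None],
) -> StubResult:
    """Call the agent, persisting and resuming at every checkpoint pause."""
    result = agent(prompt)
    while result.stop_reason == "checkpoint":
        # Persist first, so a crash after this point can resume from here.
        persist(result.checkpoint)
        # Assumed block shape; check the SDK for the actual checkpointResume schema.
        resume_block = {"checkpointResume": {"checkpoint": result.checkpoint}}
        result = agent([resume_block])
    return result


saved: list[dict[str, Any]] = []
final = run_until_done(make_stub_agent(pauses=2), "Summarize the report", saved.append)
print(final.stop_reason, [c["position"] for c in saved])
```

In a durable setup, persist would write to the durability provider's store instead of an in-memory list.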
V0 Known Limitations:
- Metrics reset on each resume call. The caller is responsible for aggregating metrics across a durable run. EventLoopMetrics reflects only the current call.
- OpenAIResponsesModel(stateful=True) is not supported. The server-side response_id (_model_state) is not captured in the snapshot.
- When position is “after_tools”, AgentResult.message is the assistant message that requested the tools; tool results are in the snapshot messages.
- BeforeInvocationEvent and AfterInvocationEvent fire on every resume call, same as interrupts. Hooks counting invocations will see each resume as a separate invocation.
- Per-tool granularity within a cycle requires a custom ToolExecutor (e.g. TemporalToolExecutor).
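Because metrics reset on each resume call, the caller has to sum them across the run. The sketch below shows one way to do that, assuming the caller has already extracted a flat dict of counters from each call's metrics. The key names (inputTokens, outputTokens) are illustrative and may not match the fields the SDK exposes.

```python
from collections import Counter
from typing import Iterable, Mapping


def aggregate_usage(per_call_usage: Iterable[Mapping[str, int]]) -> dict[str, int]:
    """Sum per-call counters into totals for the whole durable run."""
    total: Counter = Counter()
    for usage in per_call_usage:
        # Counter.update adds values for matching keys instead of replacing them.
        total.update(usage)
    return dict(total)


# One entry per agent() call (initial call plus each resume); values are made up.
calls = [
    {"inputTokens": 10, "outputTokens": 5},
    {"inputTokens": 7, "outputTokens": 3},
]
run_totals = aggregate_usage(calls)
print(run_totals)
```

Storing the running totals next to each persisted checkpoint keeps them recoverable after a crash.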
Checkpoint
@dataclass
class Checkpoint()
Defined in: src/strands/experimental/checkpoint/checkpoint.py:46
Pause point in the agent loop. Treat it as opaque: pass it back to resume.
Attributes:
- position: What just completed (after_model or after_tools).
- cycle_index: Which ReAct loop cycle (0-based).
- snapshot: Serialized agent state as a dict, produced by Snapshot.to_dict(). Stored as dict[str, Any] (not a Snapshot object) because checkpoints must be JSON-serializable for cross-process persistence. The consumer reconstructs it via Snapshot.from_dict() on resume.
- app_data: Application-level internal state data. The SDK does not read or modify this. Applications can store arbitrary data needed across checkpoint boundaries (e.g. session context, workflow metadata). Separate from Snapshot.app_data, which captures agent-state-level data managed by the SDK.
- schema_version: Schema version of the serialized checkpoint. from_dict() raises ValueError on resume if it does not match the current version.
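To make the JSON-serializability requirement concrete, the sketch below round-trips a checkpoint-shaped dict through the json module. The keys mirror the attributes listed above, on the assumption that to_dict() emits one key per attribute. All values, including the contents of snapshot and the schema_version string, are invented for illustration.

```python
import json
from typing import Any

# Hypothetical serialized checkpoint; real values come from Checkpoint.to_dict().
checkpoint_data: dict[str, Any] = {
    "position": "after_tools",
    "cycle_index": 2,
    "snapshot": {"messages": []},  # stand-in for the output of Snapshot.to_dict()
    "app_data": {"workflow_id": "wf-123"},  # owned by the application, not the SDK
    "schema_version": "1.0",  # illustrative; the real version format is not shown here
}

# Only plain dicts, lists, strings, and numbers are used, so the payload
# survives encoding to text and decoding in another process.
encoded = json.dumps(checkpoint_data)
restored = json.loads(encoded)
print(restored == checkpoint_data)
```

A Snapshot object embedded directly in the checkpoint would not pass json.dumps without a custom encoder, which is the motivation the attribute description gives for storing a dict.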
to_dict
def to_dict() -> dict[str, Any]
Defined in: src/strands/experimental/checkpoint/checkpoint.py:70
Serialize for persistence.
from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "Checkpoint"
Defined in: src/strands/experimental/checkpoint/checkpoint.py:75
Reconstruct from a dict produced by to_dict().
Arguments:
- data: Serialized checkpoint data.
Raises:
- ValueError: If schema_version doesn’t match the current version.
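The version check described above can be illustrated with a stand-in class. CheckpointSketch and SCHEMA_VERSION are hypothetical names written for this example. The sketch follows the documented contract (to_dict() output is accepted by from_dict(), and a version mismatch raises ValueError) but is not the SDK's source.

```python
from dataclasses import asdict, dataclass, field
from typing import Any

SCHEMA_VERSION = "1.0"  # hypothetical current version


@dataclass
class CheckpointSketch:
    """Stand-in mirroring the documented Checkpoint attributes."""

    position: str
    cycle_index: int
    snapshot: dict[str, Any]
    app_data: dict[str, Any] = field(default_factory=dict)
    schema_version: str = SCHEMA_VERSION

    def to_dict(self) -> dict[str, Any]:
        """Serialize to a plain dict suitable for JSON persistence."""
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "CheckpointSketch":
        """Rebuild from to_dict() output, rejecting other schema versions."""
        version = data.get("schema_version")
        if version != SCHEMA_VERSION:
            raise ValueError(
                f"Unsupported schema_version {version!r}; expected {SCHEMA_VERSION!r}"
            )
        return cls(**data)


original = CheckpointSketch(position="after_model", cycle_index=0, snapshot={})
round_tripped = CheckpointSketch.from_dict(original.to_dict())

# A checkpoint written under a different schema version is rejected on load.
stale = {**original.to_dict(), "schema_version": "0.9"}
try:
    CheckpointSketch.from_dict(stale)
    rejected = False
except ValueError:
    rejected = True
print(round_tripped == original, rejected)
```

Callers that persist checkpoints across SDK upgrades may want to catch the ValueError and restart the run instead of resuming.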