Context Management
As conversations grow, your agent’s context window fills with messages, tool results, and system prompts. Without management, this leads to token limit errors, degraded performance, and loss of relevant information.
The SDK ships context management configurations that work out of the box, so you don’t have to assemble conversation managers, offloaders, and thresholds by hand. Pick a mode and the SDK wires up the pieces with tuned defaults. You can still drop down to explicit configuration when you need it.
Automatic context management
Section titled “Automatic context management”For most agents with multi-turn conversations, you don’t need to configure conversation management and context offloading separately. Pass context_manager="auto" contextManager: "auto"
from strands import Agent
agent = Agent(context_manager="auto")import { Agent } from '@strands-agents/sdk'
const agent = new Agent({ contextManager: 'auto',})What it sets up
Section titled “What it sets up”This composes two components whose defaults scored highest across ContextBench evaluations relative to other Strands Agent configurations:
- SummarizingConversationManager (summary ratio of 0.3, compression threshold of 0.85): proactively summarizes older messages before the context window fills, preserving key information while freeing space for new turns.
- ContextOffloader plugin (max result tokens of 1,500, preview tokens of 750): intercepts large tool results at execution time, stores them externally, and keeps a truncated preview in context. Registers a
retrieve_offloaded_contenttool so the agent can fetch full content on demand.
For full details on each component, see Conversation Management and Context Offloader.
Combining with explicit configuration
Section titled “Combining with explicit configuration”Your own settings take precedence when you need fine-grained control:
- Custom conversation manager: If you also pass
, your manager replaces the auto-composedconversation_managerconversationManagerSummarizingConversationManager. The SDK still adds theContextOffloaderplugin. - Existing ContextOffloader: If your
pluginslist already contains aContextOffloaderinstance, no duplicate is added. Your configuration is preserved.
from strands import Agentfrom strands.agent.conversation_manager import ( SlidingWindowConversationManager,)
# Your conversation manager is used;# ContextOffloader is still added automaticallyagent = Agent( context_manager="auto", conversation_manager=SlidingWindowConversationManager( window_size=30, ),)import { Agent, SlidingWindowConversationManager } from '@strands-agents/sdk'
// Your conversation manager is used;// ContextOffloader is still added automaticallyconst agent = new Agent({ contextManager: 'auto', conversationManager: new SlidingWindowConversationManager({ windowSize: 30, }),})Agentic context management
Section titled “Agentic context management”Auto mode compresses on a fixed threshold the SDK controls. Agentic mode hands that control to the model. Pass context_manager="agentic" contextManager: "agentic"
from strands import Agent
agent = Agent(context_manager="agentic")import { Agent } from '@strands-agents/sdk'
const agent = new Agent({ contextManager: 'agentic',})The model is better positioned than a threshold to know which messages still matter. A coding agent can drop a stale file it already edited while pinning the failing test it is working toward. A threshold cannot tell the difference; it compresses by age. Agentic mode trades tokens for that judgment.
What it sets up
Section titled “What it sets up”Two things give the model the information and the levers to manage context.
Token-usage telemetry. Before each model call, the SDK appends a status block to the latest message reporting how much of the window is in use:
<context-status><used>50,000 / 200,000 tokens (25.0%)</used><remaining>~150,000 tokens</remaining></context-status>This is the signal the model acts on. It decides whether the window is full enough to compress, instead of waiting for a fixed cutoff.
Three tools the model can call, each a different lever with its own choices:
summarize_contextfolds older messages into a model-written summary. The model chooses how many recent messages to keep verbatim, how aggressively to summarize, and whether to target tool results, discussion, or both.truncate_contextdrops older messages outright when they no longer need preserving. The model again chooses how much recent history to keep and what kind of messages to target.pin_contextmarks messages that must survive compression: a user constraint, a key fact, the current task. Pinned messages are never evicted by either tool. The model can pin the current exchange, the last few messages, or specific ones.
Recent messages stay verbatim regardless, and the first user message is always preserved so the conversation stays valid. New tools may be added to agentic mode in future releases.
Behind the tools, agentic mode also composes a SummarizingConversationManager with no proactive compression and a ContextOffloader. The conversation manager is only a safety net: if the model lets the window overflow anyway, it compresses reactively. The offloader uses a higher inline threshold than auto mode (8,000 tokens versus 1,500), since the model is already managing context and benefits from seeing more tool output inline before it is offloaded.
Choosing between auto and agentic
Section titled “Choosing between auto and agentic”Use "auto" for most agents. It manages context in the background, with no model involvement and no extra tool calls. Reach for "agentic" when you want the model itself to decide what stays in context: it judges relevance per message rather than compressing on a fixed threshold. The tradeoff is the tokens the model spends reading telemetry and calling the tools.
Limitations
Section titled “Limitations”Stateful models: Stateful models manage conversation state server-side. Setting context_manager contextManager ValueError Error
Both modes compose an offloader that uses in-memory storage, which does not persist across process restarts. If your agent uses session management and needs durable offloaded content, configure an explicit ContextOffloader with FileStorage or S3Storage. See Storage Backends.