Conversation Management
In the Strands Agents SDK, context refers to the information provided to the agent for understanding and reasoning. This includes:
- User messages
- Agent responses
- Tool usage and results
- System prompts
As conversations grow, managing this context becomes increasingly important for several reasons:
- Token Limits: Language models have fixed context windows (maximum tokens they can process)
- Performance: Larger contexts require more processing time and resources
- Relevance: Older messages may become less relevant to the current conversation
- Coherence: Maintaining logical flow and preserving important information
Built-in Conversation Managers
The SDK provides a flexible system for context management through the ConversationManager interface. This allows you to implement different strategies for managing conversation history. You can either leverage one of Strands’s provided managers:
- NullConversationManager: A simple implementation that does not modify conversation history
- SlidingWindowConversationManager: Maintains a fixed number of recent messages (default manager)
- SummarizingConversationManager: Intelligently summarizes older messages to preserve context
or build your own manager that matches your requirements.
NullConversationManager
The NullConversationManager is a simple implementation that does not modify the conversation history. It’s useful for:
- Short conversations that won’t exceed context limits
- Debugging purposes
- Cases where you want to manage context manually
```python
from strands import Agent
from strands.agent.conversation_manager import NullConversationManager

agent = Agent(
    conversation_manager=NullConversationManager()
)
```

```typescript
import { Agent, NullConversationManager } from '@strands-agents/sdk'

const agent = new Agent({
  conversationManager: new NullConversationManager(),
})
```

SlidingWindowConversationManager
The SlidingWindowConversationManager implements a sliding window strategy that maintains a fixed number of recent messages. This is the default conversation manager used by the Agent class.
```python
from strands import Agent
from strands.agent.conversation_manager import SlidingWindowConversationManager

# Create a conversation manager with a custom window size
conversation_manager = SlidingWindowConversationManager(
    window_size=20,  # Maximum number of messages to keep
    should_truncate_results=True,  # Truncate tool results when a message is too large for the model's context window
)

agent = Agent(
    conversation_manager=conversation_manager
)
```

```typescript
import { Agent, SlidingWindowConversationManager } from '@strands-agents/sdk'

// Create a conversation manager with a custom window size
const conversationManager = new SlidingWindowConversationManager({
  windowSize: 40, // Maximum number of messages to keep
  shouldTruncateResults: true, // Truncate tool results when a message is too large for the model's context window
})

const agent = new Agent({
  conversationManager,
})
```

Key features of the SlidingWindowConversationManager:
- Maintains Window Size: Automatically removes messages from the window if the number of messages exceeds the limit.
- Dangling Message Cleanup: Removes incomplete message sequences to maintain valid conversation state.
- Overflow Trimming: In the case of a context window overflow, it trims the oldest messages from history until the request fits in the model’s context window.
- Configurable Tool Result Truncation: Enable or disable truncation of tool results when a message exceeds context window limits. When `should_truncate_results=True` (the default), large results are truncated with a placeholder message. When `False`, full results are preserved but more historical messages may be removed. For a proactive alternative that preserves full content externally, see the Context Offloader plugin.
- Per-Turn Management: Optionally apply context management proactively during agent loop execution, not just at the end.
- Proactive Compression: Pass `proactiveCompression: true` or `proactiveCompression: { compressionThreshold: 0.7 }` to trigger context reduction before the model call when projected input tokens exceed a configurable threshold. See Proactive Context Compression.
Per-Turn Management:
By default, the SlidingWindowConversationManager applies context management only after the agent loop completes. The per_turn parameter allows you to proactively manage context during execution, which is useful for long-running agent loops with many tool calls.
```python
from strands import Agent
from strands.agent.conversation_manager import SlidingWindowConversationManager

# Apply management before every model call
conversation_manager = SlidingWindowConversationManager(
    per_turn=True,
)

# Or apply management every N model calls
conversation_manager = SlidingWindowConversationManager(
    per_turn=3,  # Apply management every 3 model calls
)

agent = Agent(
    conversation_manager=conversation_manager
)
```

Per-turn management is not supported in TypeScript.

The `per_turn` parameter accepts:
- `False` (default): Only apply management after the agent loop completes
- `True`: Apply management before every model call
- An integer `N` (must be > 0): Apply management every N model calls
SummarizingConversationManager
The SummarizingConversationManager (available in both Python and TypeScript) implements intelligent conversation context management by summarizing older messages instead of simply discarding them. This approach preserves important information while staying within context limits.
Configuration parameters (Python):
- `summary_ratio` (float, default: 0.3): Percentage of messages to summarize when reducing context (clamped between 0.1 and 0.8)
- `preserve_recent_messages` (int, default: 10): Minimum number of recent messages to always keep
- `summarization_agent` (Agent, optional): Custom agent for generating summaries. If not provided, uses the main agent instance. Cannot be used together with `summarization_system_prompt`.
- `summarization_system_prompt` (str, optional): Custom system prompt for summarization. If not provided, uses a default prompt that creates structured bullet-point summaries focusing on key topics, tools used, and technical information in third-person format. Cannot be used together with `summarization_agent`.
Configuration parameters (TypeScript):
- `model` (Model, optional): Override model to use for generating summaries. When not provided, uses the agent’s own model.
- `summaryRatio` (number, default: 0.3): Ratio of messages to summarize when reducing context (clamped between 0.1 and 0.8)
- `preserveRecentMessages` (number, default: 10): Minimum number of recent messages to always keep
- `summarizationSystemPrompt` (string, optional): Custom system prompt for summarization. If not provided, uses a default prompt that creates structured bullet-point summaries focusing on key topics, tools used, and technical information in third-person format.
- `proactiveCompression` (boolean | { compressionThreshold: number }, optional): Enable proactive context compression before the model call. Pass `true` for the default 0.7 threshold, or an object with a custom threshold. See Proactive Context Compression.
Basic Usage:
By default, the SummarizingConversationManager leverages the same model and configuration as your main agent to perform summarization.
```python
from strands import Agent
from strands.agent.conversation_manager import SummarizingConversationManager

agent = Agent(
    conversation_manager=SummarizingConversationManager()
)
```

By default, the SummarizingConversationManager uses the agent’s own model for summarization. You can optionally provide a different model to override this behavior.

```typescript
import { Agent, SummarizingConversationManager } from '@strands-agents/sdk'

const agent = new Agent({
  conversationManager: new SummarizingConversationManager(),
})
```

You can also customize the behavior by adjusting parameters like the summary ratio and the number of preserved messages:
```python
from strands import Agent
from strands.agent.conversation_manager import SummarizingConversationManager

# Create the summarizing conversation manager with custom settings
conversation_manager = SummarizingConversationManager(
    summary_ratio=0.3,  # Summarize 30% of messages when context reduction is needed
    preserve_recent_messages=10,  # Always keep the 10 most recent messages
)

agent = Agent(
    conversation_manager=conversation_manager
)
```

```typescript
import { Agent, SummarizingConversationManager, BedrockModel } from '@strands-agents/sdk'

// Optionally use a different model for summarization
const summarizationModel = new BedrockModel({
  modelId: 'anthropic.claude-sonnet-4-20250514-v1:0',
})

const conversationManager = new SummarizingConversationManager({
  model: summarizationModel, // Override the agent's model for summarization
  summaryRatio: 0.3, // Summarize 30% of messages when context reduction is needed
  preserveRecentMessages: 10, // Always keep the 10 most recent messages
})

const agent = new Agent({
  conversationManager,
})
```

Custom System Prompt for Domain-Specific Summarization:
You can customize the summarization behavior by providing a custom system prompt that tailors the summarization to your domain or use case.
```python
from strands import Agent
from strands.agent.conversation_manager import SummarizingConversationManager

# Custom system prompt for technical conversations
custom_system_prompt = """You are summarizing a technical conversation. Create a concise bullet-point summary that:
- Focuses on code changes, architectural decisions, and technical solutions
- Preserves specific function names, file paths, and configuration details
- Omits conversational elements and focuses on actionable information
- Uses technical terminology appropriate for software development

Format as bullet points without conversational language."""

conversation_manager = SummarizingConversationManager(
    summarization_system_prompt=custom_system_prompt
)

agent = Agent(
    conversation_manager=conversation_manager
)
```

```typescript
import { Agent, SummarizingConversationManager } from '@strands-agents/sdk'

// Custom system prompt for technical conversations
const customSystemPrompt = `You are summarizing a technical conversation. Create a concise bullet-point summary that:
- Focuses on code changes, architectural decisions, and technical solutions
- Preserves specific function names, file paths, and configuration details
- Omits conversational elements and focuses on actionable information
- Uses technical terminology appropriate for software development

Format as bullet points without conversational language.`

const conversationManager = new SummarizingConversationManager({
  summarizationSystemPrompt: customSystemPrompt,
})

const agent = new Agent({
  conversationManager,
})
```

Advanced Configuration with a Custom Summarization Agent:
For advanced use cases, you can provide a custom summarization_agent to handle the summarization process. This enables using a different model (such as a faster or a more cost-effective one), incorporating tools during summarization, or implementing specialized summarization logic tailored to your domain. The custom agent can leverage its own system prompt, tools, and model configuration to generate summaries that best preserve the essential context for your specific use case.
```python
from strands import Agent
from strands.agent.conversation_manager import SummarizingConversationManager
from strands.models import AnthropicModel

# Create a cheaper, faster model for summarization tasks
summarization_model = AnthropicModel(
    model_id="claude-3-5-haiku-20241022",  # More cost-effective for summarization
    max_tokens=1000,
    params={"temperature": 0.1},  # Low temperature for consistent summaries
)
custom_summarization_agent = Agent(model=summarization_model)

conversation_manager = SummarizingConversationManager(
    summary_ratio=0.4,
    preserve_recent_messages=8,
    summarization_agent=custom_summarization_agent,
)

agent = Agent(
    conversation_manager=conversation_manager
)
```

A custom summarization agent is not supported in TypeScript.

Key Features
- Context Window Management: Automatically reduces context when token limits are exceeded
- Intelligent Summarization: Uses structured bullet-point summaries to capture key information
- Tool Pair Preservation: Ensures tool use and result message pairs aren’t broken during summarization
- Flexible Configuration: Customize summarization behavior through various parameters
- Fallback Safety: Handles summarization failures gracefully
Proactive Context Compression
By default, conversation managers are reactive: they only reduce context after the model rejects a request with a context window overflow error. Proactive compression avoids wasted round-trips and output token starvation by triggering context reduction before the model call, when the projected input token count exceeds a configurable threshold of the model’s context window.
Enabling Proactive Compression
Pass `proactive_compression` to any built-in conversation manager. Use `True` for the default 0.7 threshold, or pass a dict with a custom `compression_threshold` ratio between 0 and 1. For example, 0.7 will trigger compression when 70% of the model’s context window is used:
With SlidingWindowConversationManager:
```python
from strands import Agent
from strands.agent.conversation_manager import SlidingWindowConversationManager
from strands.models.bedrock import BedrockModel

agent = Agent(
    model=BedrockModel(model_id="anthropic.claude-sonnet-4-6-v1:0"),
    conversation_manager=SlidingWindowConversationManager(
        window_size=50,
        proactive_compression={"compression_threshold": 0.7},
    ),
)
```

With SummarizingConversationManager:

```python
from strands import Agent
from strands.agent.conversation_manager import SummarizingConversationManager
from strands.models.bedrock import BedrockModel

agent = Agent(
    model=BedrockModel(model_id="anthropic.claude-sonnet-4-6-v1:0"),
    conversation_manager=SummarizingConversationManager(
        proactive_compression=True,
    ),
)
```

Without `proactive_compression`, only reactive overflow recovery is used.
Pass proactiveCompression to any built-in conversation manager. Use true for the default 0.7 threshold, or pass an object with a custom compressionThreshold ratio between 0 and 1. For example, 0.7 will trigger compression when 70% of the model’s context window is used:
With SlidingWindowConversationManager:
```typescript
import {
  Agent,
  BedrockModel,
  SlidingWindowConversationManager,
} from '@strands-agents/sdk'

const agent = new Agent({
  model: new BedrockModel({
    modelId: 'anthropic.claude-sonnet-4-20250514-v1:0',
  }),
  conversationManager: new SlidingWindowConversationManager({
    windowSize: 50,
    proactiveCompression: { compressionThreshold: 0.7 },
  }),
})
```

With SummarizingConversationManager:

```typescript
import { Agent, BedrockModel, SummarizingConversationManager } from '@strands-agents/sdk'

const agent = new Agent({
  model: new BedrockModel({
    modelId: 'anthropic.claude-sonnet-4-20250514-v1:0',
  }),
  conversationManager: new SummarizingConversationManager({
    proactiveCompression: true,
  }),
})
```

Without `proactiveCompression`, only reactive overflow recovery is used.
How It Works
Before each model call, the agent estimates the projected input token count and attaches it to the BeforeModelCallEvent. When proactive compression is configured, the conversation manager compares this estimate against the model’s contextWindowLimit:
```
if projectedInputTokens / contextWindowLimit >= compressionThreshold:
    reduce()  // proactively compress context
```

Each conversation manager uses the same reduction logic for proactive compression as for reactive overflow recovery. Proactive compression is best-effort only: if reduce() throws or returns false, the error is swallowed and the model call proceeds normally.
Because BeforeModelCallEvent triggers before every model call including calls within a tool-use cycle, this provides automatic in-loop compression. If an agent makes five tool calls in a single invocation and context grows past the threshold between calls three and four, compression triggers before call four.
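To make the threshold concrete, here is a small illustrative sketch of the check above. The function name and sample numbers are hypothetical, chosen only for the walkthrough; they are not SDK API:

```python
# Illustrative sketch of the proactive-compression check described above.
# The function name and sample numbers are hypothetical, not SDK API.
def should_compress(projected_input_tokens: int,
                    context_window_limit: int,
                    compression_threshold: float = 0.7) -> bool:
    """Mirror of: projectedInputTokens / contextWindowLimit >= threshold."""
    return projected_input_tokens / context_window_limit >= compression_threshold

# With a 200k-token window and the default 0.7 threshold,
# compression triggers once projected input reaches 140k tokens.
print(should_compress(120_000, 200_000))  # False (60% of the window)
print(should_compress(150_000, 200_000))  # True (75% of the window)
```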
Context Window Limit
The threshold check requires the model’s context window size. The SDK auto-populates contextWindowLimit from built-in lookup tables (Python, TypeScript) for known models. You can override it manually for models not in the lookup table:
```python
model = BedrockModel(
    model_id="my-custom-model",
    context_window_limit=128_000,
)
```

```typescript
const model = new BedrockModel({
  modelId: 'my-custom-model',
  contextWindowLimit: 128_000,
})
```

Token Estimation
The agent estimates input tokens using the following strategy:
- Known baseline: Reads `inputTokens + outputTokens` from the last assistant message’s `metadata.usage`
- Delta estimation: Only estimates tokens for new messages added since that assistant message, using the model’s `countTokens()` method
- Cold start fallback: When no prior usage metadata exists (first call, or after session restore without metadata), estimates all messages via `countTokens()`
The countTokens() method uses a character-based heuristic to estimate token count by default (characters ÷ 4 for text, characters ÷ 2 for JSON). Some model providers support native token counting APIs for exact counts, which can be enabled on the model. See the Token Counting section on each provider’s page for details and instructions.
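As a rough sketch of that heuristic (assumed behavior based on the description above; the SDK’s actual countTokens() implementation may differ in detail):

```python
import json

def estimate_tokens(content) -> int:
    """Character-based token estimate: chars / 4 for plain text,
    chars / 2 for JSON-serialized structured content."""
    if isinstance(content, str):
        return len(content) // 4
    # Structured content (e.g. tool inputs/results) is serialized to
    # JSON and weighted more heavily per character.
    return len(json.dumps(content)) // 2

print(estimate_tokens("hello world, this is a test."))  # 28 chars -> 7
```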
Creating a ConversationManager
To create a custom conversation manager, implement the ConversationManager interface, which is composed of the following key elements:
- `apply_management`: This method is called after each event loop cycle completes to manage the conversation history. It’s responsible for applying your management strategy to the messages array, which may have been modified with tool results and assistant responses. The agent runs this method automatically after processing each user input and generating a response.
- `reduce_context`: This method is called when the model’s context window is exceeded (typically due to token limits). It implements the specific strategy for reducing the window size when necessary. The agent calls this method when it encounters a context window overflow exception, giving your implementation a chance to trim the conversation history before retrying.
- `removed_message_count`: This attribute is tracked by conversation managers and utilized by Session Management to efficiently load messages from session storage. The count represents messages provided by the user or LLM that have been removed from the agent’s messages, but not messages included by the conversation manager through something like summarization.
- `register_hooks` (optional): Override this method to integrate with hooks. This enables proactive context management patterns, such as trimming context before model calls. Always call `super().register_hooks` when overriding.
See the SlidingWindowConversationManager implementation as a reference example.
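As a minimal sketch of this interface, here is a hypothetical manager that keeps only the last 10 messages. The signatures approximate the interface described above; in real code you would subclass the SDK’s ConversationManager base class and check its exact method signatures first:

```python
class Last10MessagesManager:
    """Illustrative only: a real manager should subclass
    strands.agent.conversation_manager.ConversationManager."""

    def __init__(self):
        # Tracked so Session Management can account for removed messages.
        self.removed_message_count = 0

    def apply_management(self, agent, **kwargs):
        # Runs after each event loop cycle completes.
        self._trim(agent.messages)

    def reduce_context(self, agent, e=None, **kwargs):
        # Runs on context window overflow before the request is retried.
        self._trim(agent.messages)

    def _trim(self, messages):
        if len(messages) > 10:
            removed = len(messages) - 10
            del messages[:removed]
            self.removed_message_count += removed
```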
To create a custom conversation manager, extend the abstract ConversationManager base class and implement the reduce method:
- `reduce(options: ReduceOptions): boolean`: Called in two scenarios: reactively when a `ContextWindowOverflowError` occurs (`options.error` is set), and proactively before model calls that exceed the compression threshold (`options.error` is `undefined`). Mutate `agent.messages` in place to reduce history, then return `true` if any reduction was made. When `error` is set, returning `false` lets the error propagate out of the agent loop uncaught. When `error` is `undefined`, returning `false` or throwing is safe; the model call proceeds regardless.
- `initAgent(agent)` (optional): Override to add proactive management (e.g. trimming after each invocation). Always call `super.initAgent(agent)` to preserve the built-in overflow recovery and proactive compression hooks.
```typescript
import {
  Agent,
  ConversationManager,
  type ConversationManagerReduceOptions,
} from '@strands-agents/sdk'

class Last10MessagesManager extends ConversationManager {
  readonly name = 'my:last-10-messages'

  reduce({ agent }: ConversationManagerReduceOptions): boolean {
    if (agent.messages.length <= 10) return false
    agent.messages.splice(0, agent.messages.length - 10)
    return true
  }
}

const agent = new Agent({
  conversationManager: new Last10MessagesManager(),
})
```

For proactive management alongside overflow recovery, override initAgent:
```typescript
import {
  Agent,
  ConversationManager,
  AfterInvocationEvent,
  type LocalAgent,
  type ConversationManagerReduceOptions,
} from '@strands-agents/sdk'

class MyManager extends ConversationManager {
  readonly name = 'my:manager'
  private readonly _maxMessages = 5

  reduce({ agent }: ConversationManagerReduceOptions): boolean {
    return this._trim(agent.messages)
  }

  override initAgent(agent: LocalAgent): void {
    super.initAgent(agent) // preserves overflow recovery
    agent.addHook(AfterInvocationEvent, (event) => {
      this._trim(event.agent.messages)
    })
  }

  private _trim(messages: LocalAgent['messages']): boolean {
    if (messages.length <= this._maxMessages) return false
    messages.splice(0, messages.length - this._maxMessages)
    return true
  }
}
```

See the SlidingWindowConversationManager implementation as a reference example.