Conversation Management
In the Strands Agents SDK, context refers to the information provided to the agent for understanding and reasoning. This includes:
- User messages
- Agent responses
- Tool usage and results
- System prompts
As conversations grow, managing this context becomes increasingly important for several reasons:
- Token Limits: Language models have fixed context windows (maximum tokens they can process)
- Performance: Larger contexts require more processing time and resources
- Relevance: Older messages may become less relevant to the current conversation
- Coherence: Maintaining logical flow and preserving important information
Built-in Conversation Managers
The SDK provides a flexible system for context management through the ConversationManager interface. This allows you to implement different strategies for managing conversation history. You can either leverage one of Strands’s provided managers:
- NullConversationManager: A simple implementation that does not modify conversation history
- SlidingWindowConversationManager: Maintains a fixed number of recent messages (default manager)
- SummarizingConversationManager: Intelligently summarizes older messages to preserve context
or build your own manager that matches your requirements.
NullConversationManager
The NullConversationManager is a simple implementation that does not modify the conversation history. It’s useful for:
- Short conversations that won’t exceed context limits
- Debugging purposes
- Cases where you want to manage context manually
```python
from strands import Agent
from strands.agent.conversation_manager import NullConversationManager

agent = Agent(
    conversation_manager=NullConversationManager()
)
```

```typescript
import { Agent, NullConversationManager } from '@strands-agents/sdk'

const agent = new Agent({
  conversationManager: new NullConversationManager(),
})
```

SlidingWindowConversationManager
The SlidingWindowConversationManager implements a sliding window strategy that maintains a fixed number of recent messages. This is the default conversation manager used by the Agent class.
```python
from strands import Agent
from strands.agent.conversation_manager import SlidingWindowConversationManager

# Create a conversation manager with a custom window size
conversation_manager = SlidingWindowConversationManager(
    window_size=20,  # Maximum number of messages to keep
    should_truncate_results=True,  # Truncate tool results when a message is too large for the model's context window
)

agent = Agent(
    conversation_manager=conversation_manager
)
```

```typescript
import { Agent, SlidingWindowConversationManager } from '@strands-agents/sdk'

// Create a conversation manager with a custom window size
const conversationManager = new SlidingWindowConversationManager({
  windowSize: 40, // Maximum number of messages to keep
  shouldTruncateResults: true, // Truncate tool results when a message is too large for the model's context window
})

const agent = new Agent({
  conversationManager,
})
```

Key features of the SlidingWindowConversationManager:
- Maintains Window Size: Automatically removes messages from the window if the number of messages exceeds the limit.
- Dangling Message Cleanup: Removes incomplete message sequences to maintain valid conversation state.
- Overflow Trimming: In the case of a context window overflow, it trims the oldest messages from history until the request fits in the model’s context window.
- Configurable Tool Result Truncation: Enable or disable truncation of tool results when a message exceeds context window limits. When `should_truncate_results=True` (the default), large results are truncated with a placeholder message. When `False`, full results are preserved but more historical messages may be removed. For a proactive alternative that preserves full content externally, see the Context Offloader plugin.
- Per-Turn Management: Optionally apply context management proactively during agent loop execution, not just at the end.
- Proactive Compression: Pass `proactiveCompression: true` or `proactiveCompression: { compressionThreshold: 0.7 }` to trigger context reduction before the model call when projected input tokens exceed a configurable threshold. See Proactive Context Compression.
Per-Turn Management:
By default, the SlidingWindowConversationManager applies context management only after the agent loop completes. The per_turn parameter allows you to proactively manage context during execution, which is useful for long-running agent loops with many tool calls.
```python
from strands import Agent
from strands.agent.conversation_manager import SlidingWindowConversationManager

# Apply management before every model call
conversation_manager = SlidingWindowConversationManager(
    per_turn=True,
)

# Or apply management every N model calls
conversation_manager = SlidingWindowConversationManager(
    per_turn=3,  # Apply management every 3 model calls
)

agent = Agent(
    conversation_manager=conversation_manager
)
```

Per-turn management is not supported in TypeScript.

The `per_turn` parameter accepts:
- `False` (default): Only apply management after the agent loop completes
- `True`: Apply management before every model call
- An integer `N` (must be > 0): Apply management every N model calls
SummarizingConversationManager
The SummarizingConversationManager (available in both Python and TypeScript) implements intelligent conversation context management by summarizing older messages instead of simply discarding them. This approach preserves important information while staying within context limits.
Configuration parameters (Python):
- `summary_ratio` (float, default: 0.3): Percentage of messages to summarize when reducing context (clamped between 0.1 and 0.8)
- `preserve_recent_messages` (int, default: 10): Minimum number of recent messages to always keep
- `summarization_agent` (Agent, optional): Custom agent for generating summaries. If not provided, uses the main agent instance. Cannot be used together with `summarization_system_prompt`.
- `summarization_system_prompt` (str, optional): Custom system prompt for summarization. If not provided, uses a default prompt that creates structured bullet-point summaries focusing on key topics, tools used, and technical information in third-person format. Cannot be used together with `summarization_agent`.
Configuration parameters (TypeScript):
- `model` (Model, optional): Override model to use for generating summaries. When not provided, uses the agent’s own model.
- `summaryRatio` (number, default: 0.3): Ratio of messages to summarize when reducing context (clamped between 0.1 and 0.8)
- `preserveRecentMessages` (number, default: 10): Minimum number of recent messages to always keep
- `summarizationSystemPrompt` (string, optional): Custom system prompt for summarization. If not provided, uses a default prompt that creates structured bullet-point summaries focusing on key topics, tools used, and technical information in third-person format.
- `proactiveCompression` (boolean | { compressionThreshold: number }, optional): Enable proactive context compression before the model call. Pass `true` for the default 0.7 threshold, or an object with a custom threshold. See Proactive Context Compression.
Basic Usage:
By default, the SummarizingConversationManager leverages the same model and configuration as your main agent to perform summarization.
```python
from strands import Agent
from strands.agent.conversation_manager import SummarizingConversationManager

agent = Agent(
    conversation_manager=SummarizingConversationManager()
)
```

By default, the SummarizingConversationManager uses the agent’s own model for summarization. You can optionally provide a different model to override this behavior.

```typescript
import { Agent, SummarizingConversationManager } from '@strands-agents/sdk'

const agent = new Agent({
  conversationManager: new SummarizingConversationManager(),
})
```

You can also customize the behavior by adjusting parameters like the summary ratio and the number of preserved messages:
```python
from strands import Agent
from strands.agent.conversation_manager import SummarizingConversationManager

# Create the summarizing conversation manager with custom settings
conversation_manager = SummarizingConversationManager(
    summary_ratio=0.3,  # Summarize 30% of messages when context reduction is needed
    preserve_recent_messages=10,  # Always keep the 10 most recent messages
)

agent = Agent(
    conversation_manager=conversation_manager
)
```

```typescript
import { Agent, SummarizingConversationManager, BedrockModel } from '@strands-agents/sdk'

// Optionally use a different model for summarization
const summarizationModel = new BedrockModel({
  modelId: 'anthropic.claude-sonnet-4-20250514-v1:0',
})

const conversationManager = new SummarizingConversationManager({
  model: summarizationModel, // Override the agent's model for summarization
  summaryRatio: 0.3, // Summarize 30% of messages when context reduction is needed
  preserveRecentMessages: 10, // Always keep the 10 most recent messages
})

const agent = new Agent({
  conversationManager,
})
```

Custom System Prompt for Domain-Specific Summarization:
You can customize the summarization behavior by providing a custom system prompt that tailors the summarization to your domain or use case.
```python
from strands import Agent
from strands.agent.conversation_manager import SummarizingConversationManager

# Custom system prompt for technical conversations
custom_system_prompt = """You are summarizing a technical conversation. Create a concise bullet-point summary that:
- Focuses on code changes, architectural decisions, and technical solutions
- Preserves specific function names, file paths, and configuration details
- Omits conversational elements and focuses on actionable information
- Uses technical terminology appropriate for software development

Format as bullet points without conversational language."""

conversation_manager = SummarizingConversationManager(
    summarization_system_prompt=custom_system_prompt
)

agent = Agent(
    conversation_manager=conversation_manager
)
```

```typescript
import { Agent, SummarizingConversationManager } from '@strands-agents/sdk'

// Custom system prompt for technical conversations
const customSystemPrompt = `You are summarizing a technical conversation. Create a concise bullet-point summary that:
- Focuses on code changes, architectural decisions, and technical solutions
- Preserves specific function names, file paths, and configuration details
- Omits conversational elements and focuses on actionable information
- Uses technical terminology appropriate for software development

Format as bullet points without conversational language.`

const conversationManager = new SummarizingConversationManager({
  summarizationSystemPrompt: customSystemPrompt,
})

const agent = new Agent({
  conversationManager,
})
```

Advanced Configuration with a Custom Summarization Agent:
For advanced use cases, you can provide a custom summarization_agent to handle the summarization process. This enables using a different model (such as a faster or a more cost-effective one), incorporating tools during summarization, or implementing specialized summarization logic tailored to your domain. The custom agent can leverage its own system prompt, tools, and model configuration to generate summaries that best preserve the essential context for your specific use case.
```python
from strands import Agent
from strands.agent.conversation_manager import SummarizingConversationManager
from strands.models import AnthropicModel

# Create a cheaper, faster model for summarization tasks
summarization_model = AnthropicModel(
    model_id="claude-3-5-haiku-20241022",  # More cost-effective for summarization
    max_tokens=1000,
    params={"temperature": 0.1},  # Low temperature for consistent summaries
)
custom_summarization_agent = Agent(model=summarization_model)

conversation_manager = SummarizingConversationManager(
    summary_ratio=0.4,
    preserve_recent_messages=8,
    summarization_agent=custom_summarization_agent,
)

agent = Agent(
    conversation_manager=conversation_manager
)
```

A custom summarization agent is not supported in TypeScript.

Key Features
- Context Window Management: Automatically reduces context when token limits are exceeded
- Intelligent Summarization: Uses structured bullet-point summaries to capture key information
- Tool Pair Preservation: Ensures tool use and result message pairs aren’t broken during summarization
- Flexible Configuration: Customize summarization behavior through various parameters
- Fallback Safety: Handles summarization failures gracefully
Proactive Context Compression
By default, conversation managers are reactive: they only reduce context after the model rejects a request with a context window overflow error. Proactive compression avoids wasted round-trips and output token starvation by triggering context reduction before the model call, when the projected input token count exceeds a configurable threshold of the model’s context window.
Enabling Proactive Compression
Pass `proactive_compression` to any built-in conversation manager. Use `True` for the default 0.7 threshold, or pass a dict with a custom `compression_threshold` ratio between 0 and 1. For example, 0.7 will trigger compression when 70% of the model’s context window is used:
With SlidingWindowConversationManager:
```python
from strands import Agent
from strands.agent.conversation_manager import SlidingWindowConversationManager
from strands.models.bedrock import BedrockModel

agent = Agent(
    model=BedrockModel(model_id="anthropic.claude-sonnet-4-6-v1:0"),
    conversation_manager=SlidingWindowConversationManager(
        window_size=50,
        proactive_compression={"compression_threshold": 0.7},
    ),
)
```

With SummarizingConversationManager:

```python
from strands import Agent
from strands.agent.conversation_manager import SummarizingConversationManager
from strands.models.bedrock import BedrockModel

agent = Agent(
    model=BedrockModel(model_id="anthropic.claude-sonnet-4-6-v1:0"),
    conversation_manager=SummarizingConversationManager(
        proactive_compression=True,
    ),
)
```

Without `proactive_compression`, only reactive overflow recovery is used.
Pass proactiveCompression to any built-in conversation manager. Use true for the default 0.7 threshold, or pass an object with a custom compressionThreshold ratio between 0 and 1. For example, 0.7 will trigger compression when 70% of the model’s context window is used:
With SlidingWindowConversationManager:
```typescript
import {
  Agent,
  BedrockModel,
  SlidingWindowConversationManager,
} from '@strands-agents/sdk'

const agent = new Agent({
  model: new BedrockModel({
    modelId: 'anthropic.claude-sonnet-4-20250514-v1:0',
  }),
  conversationManager: new SlidingWindowConversationManager({
    windowSize: 50,
    proactiveCompression: { compressionThreshold: 0.7 },
  }),
})
```

With SummarizingConversationManager:

```typescript
import { Agent, BedrockModel, SummarizingConversationManager } from '@strands-agents/sdk'

const agent = new Agent({
  model: new BedrockModel({
    modelId: 'anthropic.claude-sonnet-4-20250514-v1:0',
  }),
  conversationManager: new SummarizingConversationManager({
    proactiveCompression: true,
  }),
})
```

Without `proactiveCompression`, only reactive overflow recovery is used.
How It Works
Before each model call, the agent estimates the projected input token count and attaches it to the BeforeModelCallEvent. When proactive compression is configured, the conversation manager compares this estimate against the model’s contextWindowLimit:
```
if projectedInputTokens / contextWindowLimit >= compressionThreshold:
    reduce()  // proactively compress context
```

Each conversation manager uses the same reduction logic for proactive compression as for reactive overflow recovery. Proactive compression is best-effort only: if reduce() throws or returns false, the error is swallowed and the model call proceeds normally.
Because BeforeModelCallEvent triggers before every model call including calls within a tool-use cycle, this provides automatic in-loop compression. If an agent makes five tool calls in a single invocation and context grows past the threshold between calls three and four, compression triggers before call four.
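To make the threshold concrete, here is a small illustrative sketch of the check above. The function name and sample numbers are hypothetical, chosen only for the walkthrough; they are not SDK API:

```python
# Illustrative sketch of the proactive-compression check described above.
# The function name and sample numbers are hypothetical, not SDK API.
def should_compress(projected_input_tokens: int,
                    context_window_limit: int,
                    compression_threshold: float = 0.7) -> bool:
    """Mirror of: projectedInputTokens / contextWindowLimit >= threshold."""
    return projected_input_tokens / context_window_limit >= compression_threshold

# With a 200k-token window and the default 0.7 threshold,
# compression triggers once projected input reaches 140k tokens.
print(should_compress(120_000, 200_000))  # False (60% of the window)
print(should_compress(150_000, 200_000))  # True (75% of the window)
```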
Context Window Limit
The threshold check requires the model’s context window size. The SDK auto-populates contextWindowLimit from built-in lookup tables (Python, TypeScript) for known models. You can override it manually for models not in the lookup table:
```python
model = BedrockModel(
    model_id="my-custom-model",
    context_window_limit=128_000,
)
```

```typescript
const model = new BedrockModel({
  modelId: 'my-custom-model',
  contextWindowLimit: 128_000,
})
```

Token Estimation
The agent estimates input tokens using the following strategy:
- Known baseline: Reads `inputTokens + outputTokens` from the last assistant message’s `metadata.usage`
- Delta estimation: Only estimates tokens for new messages added since that assistant message, using the model’s `countTokens()` method
- Cold start fallback: When no prior usage metadata exists (first call, or after session restore without metadata), estimates all messages via `countTokens()`
The countTokens() method uses a character-based heuristic to estimate token count by default (characters ÷ 4 for text, characters ÷ 2 for JSON). Some model providers support native token counting APIs for exact counts, which can be enabled on the model. See the Token Counting section on each provider’s page for details and instructions.
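As a rough sketch of that heuristic (assumed behavior based on the description above; the SDK’s actual countTokens() implementation may differ in detail):

```python
import json

def estimate_tokens(content) -> int:
    """Character-based token estimate: chars / 4 for plain text,
    chars / 2 for JSON-serialized structured content."""
    if isinstance(content, str):
        return len(content) // 4
    # Structured content (e.g. tool inputs/results) is serialized to
    # JSON and weighted more heavily per character.
    return len(json.dumps(content)) // 2

print(estimate_tokens("hello world, this is a test."))  # 28 chars -> 7
```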
Creating a ConversationManager
To create a custom conversation manager, implement the ConversationManager interface, which is composed of the following key elements:
- `apply_management`: This method is called after each event loop cycle completes to manage the conversation history. It’s responsible for applying your management strategy to the messages array, which may have been modified with tool results and assistant responses. The agent runs this method automatically after processing each user input and generating a response.
- `reduce_context`: This method is called when the model’s context window is exceeded (typically due to token limits). It implements the specific strategy for reducing the window size when necessary. The agent calls this method when it encounters a context window overflow exception, giving your implementation a chance to trim the conversation history before retrying.
- `removed_message_count`: This attribute is tracked by conversation managers and utilized by Session Management to efficiently load messages from session storage. The count represents messages provided by the user or LLM that have been removed from the agent’s messages, but not messages included by the conversation manager through something like summarization.
- `register_hooks` (optional): Override this method to integrate with hooks. This enables proactive context management patterns, such as trimming context before model calls. Always call `super().register_hooks` when overriding.
See the SlidingWindowConversationManager implementation as a reference example.
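As a minimal sketch of this interface, here is a hypothetical manager that keeps only the last 10 messages. The signatures approximate the interface described above; in real code you would subclass the SDK’s ConversationManager base class and check its exact method signatures first:

```python
class Last10MessagesManager:
    """Illustrative only: a real manager should subclass
    strands.agent.conversation_manager.ConversationManager."""

    def __init__(self):
        # Tracked so Session Management can account for removed messages.
        self.removed_message_count = 0

    def apply_management(self, agent, **kwargs):
        # Runs after each event loop cycle completes.
        self._trim(agent.messages)

    def reduce_context(self, agent, e=None, **kwargs):
        # Runs on context window overflow before the request is retried.
        self._trim(agent.messages)

    def _trim(self, messages):
        if len(messages) > 10:
            removed = len(messages) - 10
            del messages[:removed]
            self.removed_message_count += removed
```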
To create a custom conversation manager, extend the abstract ConversationManager base class and implement the reduce method:
- `reduce(options: ReduceOptions): boolean`: Called in two scenarios: reactively when a `ContextWindowOverflowError` occurs (`options.error` is set), and proactively before model calls that exceed the compression threshold (`options.error` is `undefined`). Mutate `agent.messages` in place to reduce history, then return `true` if any reduction was made. When `error` is set, returning `false` lets the error propagate out of the agent loop uncaught. When `error` is `undefined`, returning `false` or throwing is safe; the model call proceeds regardless.
- `initAgent(agent)` (optional): Override to add proactive management (e.g. trimming after each invocation). Always call `super.initAgent(agent)` to preserve the built-in overflow recovery and proactive compression hooks.
```typescript
import {
  Agent,
  ConversationManager,
  type ConversationManagerReduceOptions,
} from '@strands-agents/sdk'

class Last10MessagesManager extends ConversationManager {
  readonly name = 'my:last-10-messages'

  reduce({ agent }: ConversationManagerReduceOptions): boolean {
    if (agent.messages.length <= 10) return false
    agent.messages.splice(0, agent.messages.length - 10)
    return true
  }
}

const agent = new Agent({
  conversationManager: new Last10MessagesManager(),
})
```

For proactive management alongside overflow recovery, override initAgent:
```typescript
import {
  Agent,
  ConversationManager,
  AfterInvocationEvent,
  type LocalAgent,
  type ConversationManagerReduceOptions,
} from '@strands-agents/sdk'

class MyManager extends ConversationManager {
  readonly name = 'my:manager'
  private readonly _maxMessages = 5

  reduce({ agent }: ConversationManagerReduceOptions): boolean {
    return this._trim(agent.messages)
  }

  override initAgent(agent: LocalAgent): void {
    super.initAgent(agent) // preserves overflow recovery
    agent.addHook(AfterInvocationEvent, (event) => {
      this._trim(event.agent.messages)
    })
  }

  private _trim(messages: LocalAgent['messages']): boolean {
    if (messages.length <= this._maxMessages) return false
    messages.splice(0, messages.length - this._maxMessages)
    return true
  }
}
```

See the SlidingWindowConversationManager implementation as a reference example.