Context Offloader
The ContextOffloader plugin prevents large tool results from consuming your agent’s context window. When a tool returns a result that exceeds a configurable token threshold, the plugin stores each content block individually in an external storage backend and replaces it in the conversation with a truncated preview plus per-block references. Each offloaded result includes inline guidance telling the agent to use its available tools to selectively access the data it needs.
The Problem
Tools like file readers, API clients, and database queries can return results that are tens or hundreds of thousands of characters long. When these large results enter the conversation, they crowd out other context and can exceed the model’s token limits.
The default SlidingWindowConversationManager handles this reactively — after the context overflows, it truncates tool results to the first and last 200 characters. This works as a safety net, but the truncation is lossy (the middle content is gone permanently) and happens after a failed API call has already been wasted.
ContextOffloader takes a proactive approach: it intercepts results at tool execution time, before they enter the conversation, so the overflow never happens in the first place.
How It Works
After each tool call, the plugin estimates the result’s token count and compares it against the max_result_tokens threshold (default: 2,500 tokens). If the result exceeds it, the plugin:
- Stores each content block individually in the configured storage backend, preserving its content type
- Replaces the in-context result with the first preview_tokens tokens (default: 1,000) plus per-block storage references
Token estimation uses model.count_tokens(), which delegates to the model provider’s native counting API if available, otherwise falling back to a character-based heuristic (chars/4 for text, chars/2 for JSON).
Results under the threshold pass through unchanged.
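The fallback heuristic and the threshold check can be sketched in a few lines. This is a minimal illustration rather than the plugin's implementation: the names estimate_tokens and should_offload are hypothetical, and treating a string as JSON whenever json.loads accepts it is an assumption made for this example. The ratios and the 2,500-token default come from the text above.

```python
import json

MAX_RESULT_TOKENS = 2_500  # documented default for max_result_tokens


def estimate_tokens(text: str) -> int:
    """Character-based fallback: chars/2 for JSON, chars/4 for other text."""
    try:
        json.loads(text)  # assumption: "is JSON" means "parses as JSON"
        divisor = 2
    except ValueError:
        divisor = 4
    return len(text) // divisor


def should_offload(text: str, max_result_tokens: int = MAX_RESULT_TOKENS) -> bool:
    """Offload only when the estimate exceeds the threshold."""
    return estimate_tokens(text) > max_result_tokens
```

Under this heuristic, a JSON payload reaches the threshold at about half the character count of plain text, so structured API responses are offloaded sooner.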
What the agent sees
For a tool that returns 150KB of JSON, the agent would see something like:

    {"users": [{"id": 1, "name": "Alice", ...}, {"id": 2, "name": "Bob", ...},
    ... (first ~1,000 tokens of the result) ...

    [Full content offloaded to storage - reference: a1b2c3d4]

For non-text content, the plugin replaces the result with a descriptive placeholder plus a reference:
| Content Type | What the agent sees |
|---|---|
| Text / JSON | First preview_tokens tokens + storage reference |
| Image | [image: format, N bytes] placeholder + storage reference |
| Document | [document: format, name, N bytes] placeholder + storage reference |
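The table can be read as one rendering rule per block. The sketch below is illustrative only: the Block dataclass, its field names, and render_offloaded are hypothetical, and the preview is cut at roughly four characters per token as a stand-in for real token counting. The placeholder and reference formats are copied from the examples above.

```python
from dataclasses import dataclass

CHARS_PER_TOKEN = 4  # rough stand-in for real token counting (assumption)


@dataclass
class Block:
    """Hypothetical content block; not the SDK's actual type."""

    kind: str  # "text", "json", "image", or "document"
    data: bytes
    fmt: str = ""  # e.g. "png" or "pdf"
    name: str = ""  # document name, if any


def render_offloaded(block: Block, reference: str, preview_tokens: int = 1_000) -> str:
    """Build the in-context replacement for one offloaded block."""
    size = len(block.data)
    if block.kind == "image":
        head = f"[image: {block.fmt}, {size} bytes]"
    elif block.kind == "document":
        head = f"[document: {block.fmt}, {block.name}, {size} bytes]"
    else:
        # Text and JSON keep a truncated preview of the original content.
        text = block.data.decode("utf-8", errors="replace")
        head = text[: preview_tokens * CHARS_PER_TOKEN]
    return f"{head}\n[Full content offloaded to storage - reference: {reference}]"
```

Binary blocks never contribute a preview in this sketch, only a short description, which is why their in-context cost stays constant regardless of size.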
Getting Started
Pass a ContextOffloader instance to your agent’s plugins list with your choice of storage backend:

Python:

    from strands import Agent
    from strands.vended_plugins.context_offloader import (
        ContextOffloader,
        InMemoryStorage,
    )

    agent = Agent(plugins=[
        ContextOffloader(storage=InMemoryStorage())
    ])

TypeScript:

    import { Agent } from '@strands-agents/sdk'
    import {
      ContextOffloader,
      InMemoryStorage,
    } from '@strands-agents/sdk/vended-plugins/context-offloader'

    const agent = new Agent({
      plugins: [new ContextOffloader({ storage: new InMemoryStorage() })],
    })

To customize the token thresholds:

Python:

    from strands import Agent
    from strands.vended_plugins.context_offloader import (
        ContextOffloader,
        InMemoryStorage,
    )

    agent = Agent(plugins=[
        ContextOffloader(
            storage=InMemoryStorage(),
            max_result_tokens=5_000,  # offload results estimated above 5,000 tokens
            preview_tokens=2_000,     # keep the first 2,000 tokens in context
        )
    ])

TypeScript:

    import { Agent } from '@strands-agents/sdk'
    import {
      ContextOffloader,
      InMemoryStorage,
    } from '@strands-agents/sdk/vended-plugins/context-offloader'

    const agent = new Agent({
      plugins: [
        new ContextOffloader({
          storage: new InMemoryStorage(),
          maxResultTokens: 5_000,
          previewTokens: 2_000,
        }),
      ],
    })

Storage Backends
Choose a storage backend based on your needs:
| Backend | Persistence | Best for |
|---|---|---|
| InMemoryStorage | Process lifetime only (call clear() to free manually) | Development, testing, reducing context without side effects |
| FileStorage | Disk | Local development, debugging, inspecting stored artifacts |
| S3Storage | Amazon S3 | Production workloads, shared or durable artifact retention |
All backends implement the Storage protocol and preserve content type metadata, so you can also build your own.
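As a rough picture of what a custom backend involves, the sketch below keeps each payload together with its content type and hands back a reference. Everything about its shape is an assumption: the StorageLike protocol, the store and retrieve method names, and their signatures are hypothetical and written only for this example, so check the SDK's actual Storage protocol before implementing a real backend.

```python
from typing import Dict, Protocol, Tuple


class StorageLike(Protocol):
    """Assumed shape for illustration only; not the SDK's Storage protocol."""

    def store(self, key: str, content: bytes, content_type: str) -> str: ...

    def retrieve(self, reference: str) -> Tuple[bytes, str]: ...


class DictStorage:
    """Toy backend that keeps content and its content type in a dict."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[bytes, str]] = {}
        self._counter = 0

    def store(self, key: str, content: bytes, content_type: str) -> str:
        # A counter keeps references unique even when keys repeat.
        self._counter += 1
        reference = f"dict_{self._counter}_{key}"
        self._items[reference] = (content, content_type)
        return reference

    def retrieve(self, reference: str) -> Tuple[bytes, str]:
        try:
            return self._items[reference]
        except KeyError:
            raise KeyError(f"Unknown reference: {reference}") from None
```

The point of the sketch is the pairing of bytes with a content type: whatever the real method names are, a backend has to return both so that retrieved content can be rebuilt in its native format.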
In-memory storage: stores content in process memory, useful for development and testing.

Python:

    from strands import Agent
    from strands.vended_plugins.context_offloader import (
        ContextOffloader,
        InMemoryStorage,
    )

    agent = Agent(plugins=[
        ContextOffloader(
            storage=InMemoryStorage(),
        )
    ])

TypeScript:

    import { Agent } from '@strands-agents/sdk'
    import {
      ContextOffloader,
      InMemoryStorage,
    } from '@strands-agents/sdk/vended-plugins/context-offloader'

    const agent = new Agent({
      plugins: [
        new ContextOffloader({
          storage: new InMemoryStorage(),
        }),
      ],
    })

File storage: persists to a local directory, with .metadata.json sidecars for content type tracking.

Python:

    from strands import Agent
    from strands.vended_plugins.context_offloader import (
        ContextOffloader,
        FileStorage,
    )

    agent = Agent(plugins=[
        ContextOffloader(
            storage=FileStorage("./artifacts"),
        )
    ])

TypeScript:

    import { Agent } from '@strands-agents/sdk'
    import {
      ContextOffloader,
      FileStorage,
    } from '@strands-agents/sdk/vended-plugins/context-offloader'

    const agent = new Agent({
      plugins: [
        new ContextOffloader({
          storage: new FileStorage('./artifacts'),
        }),
      ],
    })

S3 storage: persists to an Amazon S3 bucket, with content type preserved via S3 object metadata.

Python:

    from strands import Agent
    from strands.vended_plugins.context_offloader import (
        ContextOffloader,
        S3Storage,
    )

    agent = Agent(plugins=[
        ContextOffloader(
            storage=S3Storage(
                bucket="my-agent-artifacts",
                prefix="tool-results/",
            ),
        )
    ])

TypeScript:

    import { Agent } from '@strands-agents/sdk'
    import {
      ContextOffloader,
      S3Storage,
    } from '@strands-agents/sdk/vended-plugins/context-offloader'

    const agent = new Agent({
      plugins: [
        new ContextOffloader({
          storage: new S3Storage('my-agent-artifacts', {
            prefix: 'tool-results/',
          }),
        }),
      ],
    })

Configuration
Python:

| Parameter | Default | Description |
|---|---|---|
| storage | (required) | Storage backend instance |
| max_result_tokens | 2_500 | Results whose estimated token count exceeds this are offloaded |
| preview_tokens | 1_000 | Number of tokens to keep as an in-context preview |
| include_retrieval_tool | True | Registers a retrieve_offloaded_content tool the agent can use to fetch full content by reference. Enabled by default; set to False to disable |

TypeScript:

| Parameter | Default | Description |
|---|---|---|
| storage | (required) | Storage backend instance |
| maxResultTokens | 2_500 | Results whose estimated token count exceeds this are offloaded |
| previewTokens | 1_000 | Number of tokens to keep as an in-context preview |
| includeRetrievalTool | true | Registers a retrieve_offloaded_content tool the agent can use to fetch full content by reference. Enabled by default; set to false to disable |
Retrieval Tool
The plugin includes a retrieve_offloaded_content tool that lets the agent fetch offloaded content by reference, returning it in its native format: text as a string, JSON as a JSON block, images as image blocks, and documents as document blocks. This tool is registered by default.
The inline guidance in offloaded results tells the agent to use its available tools to selectively access the data it needs, and mentions retrieve_offloaded_content as a fallback.
In the TypeScript SDK, the retrieval tool supports targeted retrieval through optional parameters, so the agent can search and filter offloaded content without loading it entirely back into context.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| reference | string | (required) Storage reference from the offloaded result |
| pattern | string | Regex or keyword to grep for |
| line_range | { start: number; end: number } | 1-indexed inclusive line span to retrieve |
| context_lines | number | Lines of context around pattern matches (default: 5) |
Retrieval modes:
- Pattern search: Provide pattern to grep for regex/keyword matches with configurable context_lines
- Line range: Provide line_range for random access to specific line numbers
- Combined: Provide both pattern and line_range to search within a specific range
- Head: Provide only context_lines without a pattern or line_range to return the first N lines of the content
- Full retrieval: Omit all optional parameters to retrieve everything (discouraged for large content)
Results include line numbers to enable follow-up queries. Large result sets are truncated with guidance to narrow the search. Binary content cannot be searched — pattern and line range parameters return an error for binary references.
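The retrieval modes amount to one line-selection routine with three optional inputs. The following Python sketch mirrors the documented semantics (1-indexed inclusive ranges, a default of 5 context lines, head mode when only context_lines is given). It is not the SDK's code: select_lines is a hypothetical name, line_range is passed as a (start, end) tuple for brevity, clipping context to the searched range is an assumption, and the match markers, separators, and truncation of large result sets are left out.

```python
import re
from typing import List, Optional, Tuple


def select_lines(
    text: str,
    pattern: Optional[str] = None,
    line_range: Optional[Tuple[int, int]] = None,
    context_lines: Optional[int] = None,
) -> List[str]:
    """Return numbered lines chosen by the documented retrieval modes."""
    lines = text.splitlines()
    start, end = line_range if line_range else (1, len(lines))
    start, end = max(start, 1), min(end, len(lines))

    if pattern is None:
        if line_range is None and context_lines is not None:
            end = min(context_lines, len(lines))  # head mode: first N lines
        keep = list(range(start, end + 1))  # line range, head, or full retrieval
    else:
        regex = re.compile(pattern)
        ctx = 5 if context_lines is None else context_lines  # documented default
        wanted = set()
        for n in range(start, end + 1):
            if regex.search(lines[n - 1]):
                # Keep the match plus its context, clipped to the searched range.
                wanted.update(range(max(start, n - ctx), min(end, n + ctx) + 1))
        keep = sorted(wanted)

    return [f"{n}| {lines[n - 1]}" for n in keep]
```

Returning the original line numbers with each line is what makes follow-up queries cheap: the agent can take a number from a pattern search and request a narrow line_range around it instead of retrieving everything.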
Retrieval examples
Python:

1. Tool result gets offloaded (replaces original result inline)

       [Offloaded: 1 blocks, ~10,000 tokens]
       Tool result was offloaded to external storage due to size.
       Use the preview below to answer if possible.
       Use retrieve_offloaded_content to fetch the full content by reference.

       {"users":[{"id":1,"name":"Alice","role":"admin"},{"id":2,"name":"Bob","role":"user"},{"id":3,"name":"Charlie","rol

       [Stored references:]
         mem_1_tool-123_0 (json, 42,000 bytes)

2. Agent retrieves full content

   Input: { reference: "mem_1_tool-123_0" }

   The tool returns the full offloaded content in its native format.

       {"users":[{"id":1,"name":"Alice","role":"admin"},{"id":2,"name":"Bob","role":"user"},{"id":3,"name":"Charlie","role":"user"}, ...]}

TypeScript:

1. Tool result gets offloaded (replaces original result inline)

       [Offloaded: 1 blocks, ~10,000 tokens]
       Tool result was offloaded to external storage due to size.
       Use the preview below to answer if possible.
       Use retrieve_offloaded_content with a reference and either:
         - pattern: regex or keyword to find matching lines with context
         - line_range: { start, end } to read a specific span of lines
       Only retrieve the full content (omit pattern/line_range) as a last resort.

       {"users":[{"id":1,"name":"Alice","role":"admin"},{"id":2,"name":"Bob","role":"user"},{"id":3,"name":"Charlie","rol

       [Stored references:]
         mem_1_tool-123_0 (json, 42,000 bytes)

2. Agent searches with a pattern

   Input: { reference: "mem_1_tool-123_0", pattern: "admin", context_lines: 2 }

       [2 matches for /admin/]

         1| {
         2| "users": [
       > 3| { "id": 1, "name": "Alice", "role": "admin" },
         4| { "id": 2, "name": "Bob", "role": "user" },
         5| { "id": 3, "name": "Charlie", "role": "user" },
       ---
         48| { "id": 15, "name": "Dana", "role": "user" },
       > 49| { "id": 16, "name": "Eve", "role": "admin" },
         50| { "id": 17, "name": "Frank", "role": "user" }
         51| ]

Using other tools for retrieval
When using FileStorage, the agent can use its existing tools (shell, grep, cat, etc.) to access offloaded content directly from the file system. The offloaded guidance includes the full storage path, so the agent knows where to look:

    grep -n "admin" ./artifacts/mem_1_tool-123_0
    cat ./artifacts/mem_1_tool-123_0 | head -50
    sed -n '45,55p' ./artifacts/mem_1_tool-123_0

With S3Storage, the agent can use the AWS CLI to access offloaded content:

    aws s3 cp s3://my-agent-artifacts/tool-results/mem_1_tool-123_0 - | grep -n "admin"
    aws s3 cp s3://my-agent-artifacts/tool-results/mem_1_tool-123_0 - | head -50

With InMemoryStorage, there is no external access path. The built-in retrieval tool is the only way to access offloaded content, so keep it enabled.
For FileStorage and S3Storage, relying on the agent’s existing tools is often preferable because the agent already knows these tools well and can chain them together for complex queries. To disable the built-in retrieval tool and rely on the agent’s own tools:
Python:

    from strands import Agent
    from strands.vended_plugins.context_offloader import (
        ContextOffloader,
        FileStorage,
    )
    from strands_tools import shell

    agent = Agent(
        tools=[shell],
        plugins=[
            ContextOffloader(
                storage=FileStorage("./artifacts"),
                include_retrieval_tool=False,  # the agent reads ./artifacts via shell instead
            )
        ],
    )

TypeScript:

    import { Agent } from '@strands-agents/sdk'
    import { ContextOffloader, FileStorage } from '@strands-agents/sdk/vended-plugins/context-offloader'
    import { bash } from '@strands-agents/sdk/vended-tools/bash'
    import { fileEditor } from '@strands-agents/sdk/vended-tools/file-editor'

    const agent = new Agent({
      tools: [bash, fileEditor],
      plugins: [
        new ContextOffloader({
          storage: new FileStorage('./artifacts'),
          includeRetrievalTool: false,
        }),
      ],
    })

Tradeoffs
- Preview vs. full content: The agent reasons over the preview, not the full result. If the answer is buried deep in a large result, the agent may miss it. Tune preview_tokens to balance context usage against information loss for your use case. The retrieve_offloaded_content tool is enabled by default so the agent can fetch full offloaded content as a fallback. If the agent already has tools that can access the storage backend directly (file readers, shell, etc.), you can disable it with include_retrieval_tool=False.
- Storage costs: S3Storage incurs S3 PUT/GET and storage charges. FileStorage writes to disk on every large result.
- Not a replacement for conversation management: This plugin handles individual large results. You still need a conversation manager like SlidingWindowConversationManager to handle overall context growth across many turns.