# Context Offloader
The ContextOffloader plugin prevents large tool results from consuming your agent’s context window. When a tool returns a result that exceeds a configurable token threshold, the plugin stores each content block individually in an external storage backend and replaces it in the conversation with a truncated preview plus per-block references. Each offloaded result includes inline guidance telling the agent to use its available tools to selectively access the data it needs.
## The Problem

Tools like file readers, API clients, and database queries can return results that are tens or hundreds of thousands of characters long. When these large results enter the conversation, they crowd out other context and can exceed the model's token limits.
The default `SlidingWindowConversationManager` handles this reactively: after the context overflows, it truncates tool results to the first and last 200 characters. This works as a safety net, but the truncation is lossy (the middle content is gone permanently) and kicks in only after an API call has already failed and been wasted.
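A minimal sketch of that reactive truncation, keeping only the first and last 200 characters (the helper name and separator text are hypothetical; the actual `SlidingWindowConversationManager` implementation may differ):

```python
def truncate_tool_result(text: str, keep: int = 200) -> str:
    """Lossy fallback: keep the first and last `keep` characters; the middle is dropped."""
    if len(text) <= 2 * keep:
        return text  # small results pass through untouched
    return text[:keep] + " ... [truncated] ... " + text[-keep:]
```

Anything between the two 200-character windows is unrecoverable, which is why ContextOffloader stores the full result externally instead of discarding it.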
ContextOffloader takes a proactive approach: it intercepts results at tool execution time, before they enter the conversation, so the overflow never happens in the first place.
## How It Works

After each tool call, the plugin estimates the result's token count using the agent's `model.count_tokens()` method and compares it against the `max_result_tokens` threshold (default: 2,500 tokens). If the result exceeds the threshold, the plugin:
- Stores each content block individually in the configured storage backend, preserving its content type
- Replaces the in-context result with the first `preview_tokens` tokens (default: 1,000) plus per-block storage references
Token estimation uses `tiktoken` when available for accurate counts, falling back to a chars/4 heuristic. Preview slicing also uses `tiktoken` for exact token-level cuts when available.
Results under the threshold pass through unchanged.
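The estimate-and-compare step can be sketched as follows. This is a simplified illustration using the chars/4 fallback described above; the `cl100k_base` encoding name is an assumption, and the real plugin prefers the agent's `model.count_tokens()` method:

```python
def estimate_tokens(text: str) -> int:
    """Estimate token count: exact via tiktoken when installed, else chars / 4."""
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding
        return len(enc.encode(text))
    except ImportError:
        return max(1, len(text) // 4)  # chars/4 heuristic


def should_offload(result_text: str, max_result_tokens: int = 2_500) -> bool:
    """Offload only when the estimated size exceeds the threshold."""
    return estimate_tokens(result_text) > max_result_tokens
```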
## What the agent sees

For a tool that returns 150 KB of JSON, the agent would see something like:

```
{"users": [{"id": 1, "name": "Alice", ...}, {"id": 2, "name": "Bob", ...}, ... (first ~1,000 tokens of the result) ...

[Full content offloaded to storage - reference: a1b2c3d4]
```

For non-text content, the plugin replaces the result with a descriptive placeholder plus a reference:
| Content Type | What the agent sees |
|---|---|
| Text / JSON | First `preview_tokens` tokens + storage reference |
| Image | `[image: format, N bytes]` placeholder + storage reference |
| Document | `[document: format, name, N bytes]` placeholder + storage reference |
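The text-preview construction can be sketched like this. The function name and exact reference wording are illustrative (the reference line follows the example above), and the `cl100k_base` encoding is an assumption:

```python
def build_preview(text: str, preview_tokens: int = 1_000, ref: str = "a1b2c3d4") -> str:
    """Keep roughly the first `preview_tokens` tokens, then point at the stored full copy."""
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding
        head = enc.decode(enc.encode(text)[:preview_tokens])  # exact token-level cut
    except ImportError:
        head = text[: preview_tokens * 4]  # chars/4 heuristic applied in reverse
    return f"{head}\n[Full content offloaded to storage - reference: {ref}]"
```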
## Retrieval tool (opt-in)

The plugin includes an optional `retrieve_offloaded_content` tool that lets the agent fetch offloaded content by reference, returning it in its native format: text as a string, JSON as a JSON block, images as image blocks, and documents as document blocks. Enable it with `include_retrieval_tool=True`:

```python
agent = Agent(plugins=[
    ContextOffloader(
        storage=InMemoryStorage(),
        include_retrieval_tool=True,
    )
])
```

The retrieval tool is disabled by default. The inline guidance in offloaded results always tells the agent to use its available tools to selectively access the data it needs. When the retrieval tool is enabled, the guidance additionally mentions `retrieve_offloaded_content`.
## Getting Started

Pass a `ContextOffloader` instance to your agent's `plugins` list with your choice of storage backend:

```python
from strands import Agent
from strands.vended_plugins.context_offloader import (
    ContextOffloader,
    InMemoryStorage,
)

agent = Agent(plugins=[
    ContextOffloader(storage=InMemoryStorage())
])
```

To customize the token thresholds:

```python
agent = Agent(plugins=[
    ContextOffloader(
        storage=InMemoryStorage(),
        max_result_tokens=5_000,
        preview_tokens=2_000,
    )
])
```

## Storage Backends
Choose a storage backend based on your needs:

| Backend | Persistence | Best for |
|---|---|---|
| `InMemoryStorage` | Process lifetime only (call `clear()` to free manually) | Development, testing, reducing context without side effects |
| `FileStorage` | Disk | Local development, debugging, inspecting stored artifacts |
| `S3Storage` | Amazon S3 | Production workloads, shared or durable artifact retention |
All backends implement the `OffloadStorage` protocol and preserve content type metadata, so you can also build your own.
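As a rough illustration, a minimal custom backend might look like the following. The method names (`store`, `retrieve`, `clear`) and signatures here are assumptions for the sketch; match them to the actual `OffloadStorage` protocol when building your own:

```python
import uuid


class DictStorage:
    """Toy in-process backend; illustrates the shape of a storage implementation."""

    def __init__(self) -> None:
        # reference -> (content bytes, content type)
        self._items: dict[str, tuple[bytes, str]] = {}

    def store(self, content: bytes, content_type: str) -> str:
        """Persist one content block and return a short reference like 'a1b2c3d4'."""
        ref = uuid.uuid4().hex[:8]
        self._items[ref] = (content, content_type)
        return ref

    def retrieve(self, ref: str) -> tuple[bytes, str]:
        """Fetch a stored block and its content type by reference."""
        return self._items[ref]

    def clear(self) -> None:
        """Free everything (mirrors InMemoryStorage's manual clear())."""
        self._items.clear()
```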
**File storage** persists to a local directory with `.metadata.json` sidecars for content type tracking:

```python
from strands.vended_plugins.context_offloader import (
    ContextOffloader,
    FileStorage,
)

agent = Agent(plugins=[
    ContextOffloader(
        storage=FileStorage("./artifacts"),
    )
])
```

**S3 storage** persists to an Amazon S3 bucket with the content type preserved via S3 object metadata:

```python
from strands.vended_plugins.context_offloader import (
    ContextOffloader,
    S3Storage,
)

agent = Agent(plugins=[
    ContextOffloader(
        storage=S3Storage(
            bucket="my-agent-artifacts",
            prefix="tool-results/",
        ),
        include_retrieval_tool=True,
    )
])
```

## Configuration
| Parameter | Default | Description |
|---|---|---|
| `storage` | (required) | Storage backend instance |
| `max_result_tokens` | `2_500` | Results whose estimated token count exceeds this are offloaded |
| `preview_tokens` | `1_000` | Number of tokens to keep as an in-context preview |
| `include_retrieval_tool` | `False` | When `True`, registers a `retrieve_offloaded_content` tool the agent can use to fetch full content by reference |
## Tradeoffs

- **Preview vs. full content:** The agent reasons over the preview, not the full result. If the answer is buried deep in a large result, the agent may miss it. Tune `preview_tokens` to balance context usage against information loss for your use case. Enable `include_retrieval_tool=True` if the agent needs to fetch full offloaded content and doesn't have other tools (file readers, shell, etc.) that can access the storage backend directly.
- **Storage costs:** `S3Storage` incurs S3 PUT/GET and storage charges. `FileStorage` writes to disk on every large result.
- **Not a replacement for conversation management:** This plugin handles individual large results. You still need a conversation manager like `SlidingWindowConversationManager` to handle overall context growth across many turns.