Context Offloader

The ContextOffloader plugin prevents large tool results from consuming your agent’s context window. When a tool returns a result that exceeds a configurable token threshold, the plugin stores each content block individually in an external storage backend and replaces it in the conversation with a truncated preview plus per-block references. Each offloaded result includes inline guidance telling the agent to use its available tools to selectively access the data it needs.

Tools like file readers, API clients, and database queries can return results that are tens or hundreds of thousands of characters long. When these large results enter the conversation, they crowd out other context and can exceed the model’s token limits.

The default SlidingWindowConversationManager handles this reactively: after the context overflows, it truncates tool results to the first and last 200 characters. This works as a safety net, but the truncation is lossy (the middle content is permanently gone) and it only happens after an API call has already failed.

ContextOffloader takes a proactive approach: it intercepts results at tool execution time, before they enter the conversation, so the overflow never happens in the first place.

After each tool call, the plugin estimates the result’s token count and compares it against the max_result_tokens threshold (default: 2,500 tokens). If the result exceeds it, the plugin:

  1. Stores each content block individually in the configured storage backend, preserving its content type
  2. Replaces the in-context result with the first preview_tokens tokens (default: 1,000) plus per-block storage references
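The two steps above can be sketched in plain Python. This is only an illustration of the flow, not the plugin's implementation: the dict standing in for a storage backend, the `offload_text_blocks` helper, the `ref_N` reference format, and the four-characters-per-token preview cut are all assumptions made for the example.

```python
def offload_text_blocks(blocks, storage, preview_tokens=1_000):
    """Store each text block and return a preview plus per-block references.

    Hypothetical helper: `storage` is a plain dict standing in for a backend.
    """
    references = []
    for index, text in enumerate(blocks):
        reference = f"ref_{index}"  # made-up reference format
        storage[reference] = text   # step 1: store each block individually
        references.append(reference)

    # Step 2: keep roughly the first `preview_tokens` tokens in context,
    # approximating one token as four characters (an assumption).
    preview = "".join(blocks)[: preview_tokens * 4]
    return "\n".join([preview, "[Stored references:]", *references])


storage = {}
replacement = offload_text_blocks(["x" * 50_000, "y" * 10], storage)
print(replacement[-40:])
```

The full 50,000-character block stays in `storage`, while the conversation only carries the preview and the reference lines.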

Token estimation uses model.count_tokens(), which delegates to the model provider’s native counting API if available, otherwise falling back to a character-based heuristic (chars/4 for text, chars/2 for JSON).

Results under the threshold pass through unchanged.
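As a rough illustration of the fallback heuristic and the threshold check, here is a minimal sketch. The function names `estimate_tokens` and `should_offload` are hypothetical, and the rounding (integer division) is an assumption; the plugin itself calls model.count_tokens(), which may use the provider's native counter instead.

```python
def estimate_tokens(text: str, is_json: bool = False) -> int:
    """Character-based estimate: chars/4 for text, chars/2 for JSON."""
    chars_per_token = 2 if is_json else 4
    return len(text) // chars_per_token  # floor rounding is an assumption


def should_offload(text: str, is_json: bool = False,
                   max_result_tokens: int = 2_500) -> bool:
    """Offload only when the estimate exceeds the threshold."""
    return estimate_tokens(text, is_json) > max_result_tokens


small = "a" * 10_000   # estimated at exactly 2,500 tokens as text
large = "a" * 10_004   # estimated at 2,501 tokens as text
print(should_offload(small), should_offload(large))
```

Under this heuristic the same string counts for twice as many tokens when treated as JSON, so JSON results reach the threshold at half the character length.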

For a tool that returns 150KB of JSON, the agent would see something like:

{"users": [{"id": 1, "name": "Alice", ...}, {"id": 2, "name": "Bob", ...},
... (first ~1,000 tokens of the result) ...
[Full content offloaded to storage - reference: a1b2c3d4]

For non-text content, the plugin replaces the result with a descriptive placeholder plus a reference:

| Content Type | What the agent sees |
| --- | --- |
| Text / JSON | First preview_tokens tokens + storage reference |
| Image | [image: format, N bytes] placeholder + storage reference |
| Document | [document: format, name, N bytes] placeholder + storage reference |
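The placeholder formats above could be produced along these lines. This is a sketch under assumptions: the `placeholder_for` helper is hypothetical, and the block layout (a dict with an `image` or `document` key holding `format`, `name`, and `source.bytes`) is assumed for illustration rather than taken from the plugin's source.

```python
def placeholder_for(block: dict) -> str:
    """Build a descriptive placeholder for a non-text content block."""
    if "image" in block:
        image = block["image"]
        size = len(image["source"]["bytes"])
        return f"[image: {image['format']}, {size} bytes]"
    if "document" in block:
        document = block["document"]
        size = len(document["source"]["bytes"])
        return f"[document: {document['format']}, {document['name']}, {size} bytes]"
    # Text and JSON blocks get a truncated preview instead of a placeholder.
    raise ValueError("no placeholder defined for this block type")


image_block = {"image": {"format": "png", "source": {"bytes": b"\x00" * 2048}}}
print(placeholder_for(image_block))
```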

Pass a ContextOffloader instance to your agent’s plugins list with your choice of storage backend:

from strands import Agent
from strands.vended_plugins.context_offloader import (
    ContextOffloader,
    InMemoryStorage,
)

agent = Agent(plugins=[
    ContextOffloader(storage=InMemoryStorage())
])

To customize the token thresholds:

agent = Agent(plugins=[
    ContextOffloader(
        storage=InMemoryStorage(),
        max_result_tokens=5_000,
        preview_tokens=2_000,
    )
])

Choose a storage backend based on your needs:

| Backend | Persistence | Best for |
| --- | --- | --- |
| InMemoryStorage | Process lifetime only (call clear() to free manually) | Development, testing, reducing context without side effects |
| FileStorage | Disk | Local development, debugging, inspecting stored artifacts |
| S3Storage | Amazon S3 | Production workloads, shared or durable artifact retention |

All backends implement the Storage protocol and preserve content type metadata, so you can also build your own.
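To give a feel for what a custom backend involves, here is a self-contained sketch of a store that keeps content together with its content type. It does not import or implement the library's actual Storage protocol: the class name `DictStorage`, the method names `store` and `retrieve`, their signatures, and the reference format are all hypothetical, so check the protocol definition in the library before adapting it.

```python
from dataclasses import dataclass, field


@dataclass
class DictStorage:
    """Hypothetical backend; method names and signatures are assumptions."""

    _items: dict = field(default_factory=dict)
    _counter: int = 0

    def store(self, key: str, content: bytes, content_type: str) -> str:
        """Save content with its content type and return a reference."""
        self._counter += 1
        reference = f"dict_{self._counter}_{key}"  # made-up reference format
        self._items[reference] = (content, content_type)
        return reference

    def retrieve(self, reference: str) -> tuple[bytes, str]:
        """Return (content, content_type) for a previously stored reference."""
        if reference not in self._items:
            raise KeyError(f"unknown reference: {reference}")
        return self._items[reference]

    def clear(self) -> None:
        """Free everything held in memory."""
        self._items.clear()


backend = DictStorage()
ref = backend.store("tool-123_0", b'{"users": []}', "json")
content, content_type = backend.retrieve(ref)
print(ref, content_type)
```

Keeping the content type next to the payload is what lets a retrieval step hand content back in its native format rather than as an opaque blob.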

In-memory storage — stores content in process memory, useful for development and testing:

from strands import Agent
from strands.vended_plugins.context_offloader import (
    ContextOffloader,
    InMemoryStorage,
)

agent = Agent(plugins=[
    ContextOffloader(
        storage=InMemoryStorage(),
    )
])

File storage — persists to a local directory with .metadata.json sidecars for content type tracking:

from strands import Agent
from strands.vended_plugins.context_offloader import (
    ContextOffloader,
    FileStorage,
)

agent = Agent(plugins=[
    ContextOffloader(
        storage=FileStorage("./artifacts"),
    )
])

S3 storage — persists to an Amazon S3 bucket with content type preserved via S3 object metadata:

from strands import Agent
from strands.vended_plugins.context_offloader import (
    ContextOffloader,
    S3Storage,
)

agent = Agent(plugins=[
    ContextOffloader(
        storage=S3Storage(
            bucket="my-agent-artifacts",
            prefix="tool-results/",
        ),
    )
])

| Parameter | Default | Description |
| --- | --- | --- |
| storage | (required) | Storage backend instance |
| max_result_tokens | 2_500 | Results whose estimated token count exceeds this are offloaded |
| preview_tokens | 1_000 | Number of tokens to keep as an in-context preview |
| include_retrieval_tool | True | Registers a retrieve_offloaded_content tool the agent can use to fetch full content by reference. Enabled by default; set to False to disable |

The plugin includes a retrieve_offloaded_content tool that lets the agent fetch offloaded content by reference, returning it in its native format — text as a string, JSON as a JSON block, images as image blocks, and documents as document blocks. This tool is registered by default.

The inline guidance in offloaded results tells the agent to use its available tools to selectively access the data it needs, and mentions retrieve_offloaded_content as a fallback.

1. Tool result gets offloaded (replaces original result inline)

[Offloaded: 1 blocks, ~10,000 tokens]
Tool result was offloaded to external storage due to size.
Use the preview below to answer if possible.
Use retrieve_offloaded_content to fetch the full content by reference.
{"users":[{"id":1,"name":"Alice","role":"admin"},{"id":2,"name":"Bob","role":"user"},{"id":3,"name":"Charlie","rol
[Stored references:]
mem_1_tool-123_0 (json, 42,000 bytes)

2. Agent retrieves full content

Input: { reference: "mem_1_tool-123_0" }

The tool returns the full offloaded content in its native format.

{"users":[{"id":1,"name":"Alice","role":"admin"},{"id":2,"name":"Bob","role":"user"},{"id":3,"name":"Charlie","role":"user"}, ...]}

When using FileStorage, the agent can use its existing tools (shell, grep, cat, etc.) to access offloaded content directly from the file system. The offloaded guidance includes the full storage path, so the agent knows where to look:

grep -n "admin" ./artifacts/mem_1_tool-123_0
head -n 50 ./artifacts/mem_1_tool-123_0
sed -n '45,55p' ./artifacts/mem_1_tool-123_0

With S3Storage, the agent can use the AWS CLI to access offloaded content:

aws s3 cp s3://my-agent-artifacts/tool-results/mem_1_tool-123_0 - | grep -n "admin"
aws s3 cp s3://my-agent-artifacts/tool-results/mem_1_tool-123_0 - | head -50

With InMemoryStorage, there is no external access path — the built-in retrieval tool is the only way to access offloaded content, so keep it enabled.

Where the backend allows it, direct access through the agent’s own tools is often preferable, because the agent already knows these tools well and can chain them together for complex queries. To disable the built-in retrieval tool and rely on the agent’s own tools:

from strands import Agent
from strands.vended_plugins.context_offloader import (
    ContextOffloader,
    FileStorage,
)
from strands_tools import shell

agent = Agent(
    tools=[shell],
    plugins=[
        ContextOffloader(
            storage=FileStorage("./artifacts"),
            include_retrieval_tool=False,
        )
    ],
)

  • Preview vs. full content: The agent reasons over the preview, not the full result. If the answer is buried deep in a large result, the agent may miss it. Tune preview_tokens to balance context usage against information loss for your use case. The retrieve_offloaded_content tool is enabled by default so the agent can fetch full offloaded content as a fallback. If the agent already has tools that can access the storage backend directly (file readers, shell, etc.), you can disable the retrieval tool with include_retrieval_tool=False.
  • Storage costs: S3Storage incurs S3 PUT/GET and storage charges. FileStorage writes to disk on every large result.
  • Not a replacement for conversation management: This plugin handles individual large results. You still need a conversation manager like SlidingWindowConversationManager to handle overall context growth across many turns.