Context Offloader

The ContextOffloader plugin prevents large tool results from consuming your agent’s context window. When a tool returns a result that exceeds a configurable token threshold, the plugin stores each content block individually in an external storage backend and replaces it in the conversation with a truncated preview plus per-block references. Each offloaded result includes inline guidance telling the agent to use its available tools to selectively access the data it needs.

Tools like file readers, API clients, and database queries can return results that are tens or hundreds of thousands of characters long. When these large results enter the conversation, they crowd out other context and can exceed the model’s token limits.

The default SlidingWindowConversationManager handles this reactively: after the context overflows, it truncates tool results to the first and last 200 characters. This works as a safety net, but the truncation is lossy (the middle content is gone permanently) and happens only after an API call has already failed and been wasted.
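The reactive truncation described above can be sketched roughly like this (a simplified stand-in for illustration, not the actual SlidingWindowConversationManager implementation):

```python
def lossy_truncate(text: str, keep: int = 200) -> str:
    """Keep only the first and last `keep` characters of a tool result.

    Everything in the middle is discarded permanently, which is why
    this style of truncation is lossy.
    """
    if len(text) <= 2 * keep:
        return text  # already small enough; nothing to cut
    return text[:keep] + "..." + text[-keep:]
```

Anything the model needed from the middle of the result is unrecoverable after this runs, which motivates the proactive approach below.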

ContextOffloader takes a proactive approach: it intercepts results at tool execution time, before they enter the conversation, so the overflow never happens in the first place.

After each tool call, the plugin estimates the result’s token count using the agent’s model.count_tokens() method and compares it against the max_result_tokens threshold (default: 2,500 tokens). If the result exceeds it, the plugin:

  1. Stores each content block individually in the configured storage backend, preserving its content type
  2. Replaces the in-context result with the first preview_tokens tokens (default: 1,000) plus per-block storage references
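The two steps above can be sketched in plain Python. This is a simplified stand-in, not the plugin's API: it uses a dict for storage, the chars/4 heuristic in place of model.count_tokens(), and an illustrative function name (maybe_offload):

```python
import uuid


def maybe_offload(result_text: str, storage: dict,
                  max_result_tokens: int = 2_500,
                  preview_tokens: int = 1_000) -> str:
    """Offload `result_text` if its estimated size exceeds the threshold."""
    estimated = len(result_text) // 4  # chars/4 heuristic stand-in
    if estimated <= max_result_tokens:
        return result_text  # under threshold: pass through unchanged

    ref = uuid.uuid4().hex[:8]   # short storage reference
    storage[ref] = result_text   # step 1: store the full content externally
    preview = result_text[: preview_tokens * 4]  # step 2: keep a preview
    return (f"{preview}\n"
            f"[Full content offloaded to storage - reference: {ref}]")
```

Results under the threshold are returned untouched; large results are replaced by the preview plus a reference the agent can use later.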

Token estimation uses tiktoken when available for accurate counts, falling back to a chars/4 heuristic. Preview slicing also uses tiktoken for exact token-level cuts when available.
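A minimal version of this estimation strategy might look like the following sketch, assuming tiktoken's cl100k_base encoding as the tokenizer (the actual encoding the plugin selects is not specified here):

```python
def estimate_tokens(text: str) -> int:
    """Estimate token count, preferring tiktoken when it is usable."""
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))
    except Exception:
        # Any failure (package missing, encoding unavailable) falls back
        # to the rough heuristic of ~4 characters per token.
        return max(1, len(text) // 4)
```

The heuristic is coarse but cheap; tiktoken gives exact counts for OpenAI-style encodings, which is also what makes precise token-level preview slicing possible.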

Results under the threshold pass through unchanged.

For a tool that returns 150KB of JSON, the agent would see something like:

{"users": [{"id": 1, "name": "Alice", ...}, {"id": 2, "name": "Bob", ...},
... (first ~1,000 tokens of the result) ...
[Full content offloaded to storage - reference: a1b2c3d4]

For non-text content, the plugin replaces the result with a descriptive placeholder plus a reference:

| Content Type | What the agent sees |
| --- | --- |
| Text / JSON | First preview_tokens tokens + storage reference |
| Image | [image: format, N bytes] placeholder + storage reference |
| Document | [document: format, name, N bytes] placeholder + storage reference |
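A sketch of how such placeholders might be built. The content-block dict shape assumed here (Bedrock-style "image" and "document" blocks) is an assumption for illustration, not the plugin's internal representation:

```python
def placeholder(block: dict) -> str:
    """Build a descriptive placeholder for a non-text content block."""
    if "image" in block:
        img = block["image"]
        # e.g. "[image: png, 48213 bytes]"
        return f"[image: {img['format']}, {len(img['bytes'])} bytes]"
    if "document" in block:
        doc = block["document"]
        # e.g. "[document: pdf, quarterly-report, 102400 bytes]"
        return (f"[document: {doc['format']}, {doc['name']}, "
                f"{len(doc['bytes'])} bytes]")
    raise ValueError("unsupported content block")
```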

The plugin includes an optional retrieve_offloaded_content tool that lets the agent fetch offloaded content by reference, returning it in its native format — text as a string, JSON as a JSON block, images as image blocks, and documents as document blocks. Enable it with include_retrieval_tool=True:

agent = Agent(plugins=[
    ContextOffloader(
        storage=InMemoryStorage(),
        include_retrieval_tool=True,
    )
])

The retrieval tool is disabled by default. The inline guidance in offloaded results always tells the agent to use its available tools to selectively access the data it needs. When the retrieval tool is enabled, the guidance additionally mentions retrieve_offloaded_content.

Pass a ContextOffloader instance to your agent’s plugins list with your choice of storage backend:

from strands import Agent
from strands.vended_plugins.context_offloader import (
    ContextOffloader,
    InMemoryStorage,
)

agent = Agent(plugins=[
    ContextOffloader(storage=InMemoryStorage())
])

To customize the token thresholds:

agent = Agent(plugins=[
    ContextOffloader(
        storage=InMemoryStorage(),
        max_result_tokens=5_000,
        preview_tokens=2_000,
    )
])

Choose a storage backend based on your needs:

| Backend | Persistence | Best for |
| --- | --- | --- |
| InMemoryStorage | Process lifetime only (call clear() to free manually) | Development, testing, reducing context without side effects |
| FileStorage | Disk | Local development, debugging, inspecting stored artifacts |
| S3Storage | Amazon S3 | Production workloads, shared or durable artifact retention |

All backends implement the OffloadStorage protocol and preserve content type metadata, so you can also build your own.
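A custom backend might look like the following sketch. The method names and signatures here (store/load returning bytes plus a content type) are assumptions for illustration; check the actual OffloadStorage protocol before implementing against it:

```python
from typing import Protocol


class StorageLike(Protocol):
    """Hypothetical shape of a storage protocol; not the real OffloadStorage."""

    def store(self, key: str, data: bytes, content_type: str) -> None: ...
    def load(self, key: str) -> tuple:  # (data, content_type)
        ...


class DictStorage:
    """Minimal in-memory backend that preserves content-type metadata."""

    def __init__(self) -> None:
        self._items: dict = {}

    def store(self, key: str, data: bytes, content_type: str) -> None:
        # Keep the content type alongside the payload so retrieval can
        # return the block in its native format later.
        self._items[key] = (data, content_type)

    def load(self, key: str) -> tuple:
        return self._items[key]
```

The key design point, whatever the exact protocol, is that metadata travels with the payload so offloaded images and documents can be restored as their original block types.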

File storage — persists to a local directory with .metadata.json sidecars for content type tracking:

from strands.vended_plugins.context_offloader import (
    ContextOffloader,
    FileStorage,
)

agent = Agent(plugins=[
    ContextOffloader(
        storage=FileStorage("./artifacts"),
    )
])

S3 storage — persists to an Amazon S3 bucket with content type preserved via S3 object metadata:

from strands.vended_plugins.context_offloader import (
    ContextOffloader,
    S3Storage,
)

agent = Agent(plugins=[
    ContextOffloader(
        storage=S3Storage(
            bucket="my-agent-artifacts",
            prefix="tool-results/",
        ),
        include_retrieval_tool=True,
    )
])

| Parameter | Default | Description |
| --- | --- | --- |
| storage | (required) | Storage backend instance |
| max_result_tokens | 2_500 | Results whose estimated token count exceeds this are offloaded |
| preview_tokens | 1_000 | Number of tokens to keep as an in-context preview |
| include_retrieval_tool | False | When True, registers a retrieve_offloaded_content tool the agent can use to fetch full content by reference |
  • Preview vs. full content: The agent reasons over the preview, not the full result. If the answer is buried deep in a large result, the agent may miss it. Tune preview_tokens to balance context usage against information loss for your use case. Enable include_retrieval_tool=True if the agent needs to fetch full offloaded content and doesn’t have other tools (file readers, shell, etc.) that can access the storage backend directly.
  • Storage costs: S3Storage incurs S3 PUT/GET and storage charges. FileStorage writes to disk on every large result.
  • Not a replacement for conversation management: This plugin handles individual large results. You still need a conversation manager like SlidingWindowConversationManager to handle overall context growth across many turns.