# Ollama
Ollama is a framework for running open-source large language models locally. Strands provides native support for Ollama, allowing you to use locally-hosted models in your agents.
The `OllamaModel` class in Strands enables seamless integration with Ollama's API, supporting:
- Text generation
- Image understanding
- Tool/function calling
- Streaming responses
- Configuration management
## Getting Started

### Prerequisites

First, install the Python client into your Python environment:

```bash
pip install 'strands-agents[ollama]' strands-agents-tools
```

Next, you'll need to install and set up Ollama itself.
### Option 1: Native Installation

1. Install Ollama by following the instructions at ollama.ai.

2. Pull your desired model:

   ```bash
   ollama pull llama3.1
   ```

3. Start the Ollama server:

   ```bash
   ollama serve
   ```
### Option 2: Docker Installation

1. Pull the Ollama Docker image:

   ```bash
   docker pull ollama/ollama
   ```

2. Run the Ollama container:

   ```bash
   docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
   ```

   Note: Add `--gpus=all` if you have a GPU and Docker GPU support is configured.

3. Pull a model using the Docker container:

   ```bash
   docker exec -it ollama ollama pull llama3.1
   ```

4. Verify the Ollama server is running:

   ```bash
   curl http://localhost:11434/api/tags
   ```
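The `/api/tags` endpoint returns JSON describing the models the server has pulled. You can script the same check in Python; the `model_names` and `has_model` helpers below are illustrative, not part of Strands or Ollama:

```python
import json
from urllib.request import urlopen

def model_names(tags: dict) -> list:
    """Extract model names from an Ollama /api/tags response."""
    return [m["name"] for m in tags.get("models", [])]

def has_model(tags: dict, name: str) -> bool:
    """True if `name` (with or without a ':tag' suffix) has been pulled."""
    return any(n == name or n.split(":")[0] == name for n in model_names(tags))

# Against a running server:
# tags = json.load(urlopen("http://localhost:11434/api/tags"))
# print("llama3.1 pulled:", has_model(tags, "llama3.1"))
```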
## Basic Usage

Here's how to create an agent using an Ollama model:

```python
from strands import Agent
from strands.models.ollama import OllamaModel

# Create an Ollama model instance
ollama_model = OllamaModel(
    host="http://localhost:11434",  # Ollama server address
    model_id="llama3.1"             # Specify which model to use
)

# Create an agent using the Ollama model
agent = Agent(model=ollama_model)

# Use the agent
agent("Tell me about Strands agents.")  # Prints model output to stdout by default
```

## Configuration Options
The `OllamaModel` supports various configuration parameters:

| Parameter | Description | Default |
|-----------|-------------|---------|
| `host` | The address of the Ollama server | Required |
| `model_id` | The Ollama model identifier | Required |
| `keep_alive` | How long the model stays loaded in memory | `"5m"` |
| `max_tokens` | Maximum number of tokens to generate | None |
| `temperature` | Controls randomness (higher = more random) | None |
| `top_p` | Controls diversity via nucleus sampling | None |
| `stop_sequences` | List of sequences that stop generation | None |
| `options` | Additional model parameters (e.g., `top_k`) | None |
| `additional_args` | Any additional arguments for the request | None |
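For orientation, several of these parameters correspond to fields in the `options` object of Ollama's native API (`temperature`, `top_p`, `top_k`, `stop`, and `num_predict` for the token cap). The sketch below shows that mapping for illustration only; Strands performs the real translation internally, and `build_ollama_options` is a hypothetical helper:

```python
def build_ollama_options(temperature=None, top_p=None, max_tokens=None,
                         stop_sequences=None, options=None) -> dict:
    """Collect non-None parameters into an Ollama-style `options` dict."""
    opts = dict(options or {})
    mapping = {
        "temperature": temperature,
        "top_p": top_p,
        "num_predict": max_tokens,  # Ollama's name for the generation cap
        "stop": stop_sequences,
    }
    # Only include parameters the caller actually set
    opts.update({k: v for k, v in mapping.items() if v is not None})
    return opts
```

For example, `build_ollama_options(temperature=0.7, stop_sequences=["END"], options={"top_k": 40})` yields `{"top_k": 40, "temperature": 0.7, "stop": ["END"]}`.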
### Example with Configuration

```python
from strands import Agent
from strands.models.ollama import OllamaModel

# Create a configured Ollama model
ollama_model = OllamaModel(
    host="http://localhost:11434",
    model_id="llama3.1",
    temperature=0.7,
    keep_alive="10m",
    stop_sequences=["###", "END"],
    options={"top_k": 40}
)

# Create an agent with the configured model
agent = Agent(model=ollama_model)

# Use the agent
response = agent("Write a short story about an AI assistant.")
```

## Advanced Features

### Updating Configuration at Runtime

You can update the model configuration during runtime:
```python
# Create the model with initial configuration
ollama_model = OllamaModel(
    host="http://localhost:11434",
    model_id="llama3.1",
    temperature=0.7
)

# Update configuration later
ollama_model.update_config(
    temperature=0.9,
    top_p=0.8
)
```

This is especially useful if you want a tool to update the model's config for you:
```python
@tool
def update_model_id(model_id: str, agent: Agent) -> str:
    """
    Update the model id of the agent.

    Args:
        model_id: Ollama model id to use.
    """
    print(f"Updating model_id to {model_id}")
    agent.model.update_config(model_id=model_id)
    return f"Model updated to {model_id}"

@tool
def update_temperature(temperature: float, agent: Agent) -> str:
    """
    Update the temperature of the agent.

    Args:
        temperature: Temperature value for the model to use.
    """
    print(f"Updating temperature to {temperature}")
    agent.model.update_config(temperature=temperature)
    return f"Temperature updated to {temperature}"
```

### Using Different Models

Ollama supports many different models, and you can switch between them (make sure they are pulled first). See the list of available models at https://ollama.com/search.
```python
# Create models for different use cases
creative_model = OllamaModel(
    host="http://localhost:11434",
    model_id="llama3.1",
    temperature=0.8
)

factual_model = OllamaModel(
    host="http://localhost:11434",
    model_id="mistral",
    temperature=0.2
)

# Create agents with different models
creative_agent = Agent(model=creative_model)
factual_agent = Agent(model=factual_model)
```

### Structured Output

Ollama supports structured output for models with tool calling capabilities. When you pass the `structured_output_model` parameter to an agent invocation, the Strands SDK converts your Pydantic model to a tool specification that compatible Ollama models can understand.
```python
from pydantic import BaseModel, Field
from strands import Agent
from strands.models.ollama import OllamaModel

# 1) Define the Pydantic model
class BookAnalysis(BaseModel):
    """Analyze a book's key information."""
    title: str = Field(description="The book's title")
    author: str = Field(description="The book's author")
    genre: str = Field(description="Primary genre or category")
    summary: str = Field(description="Brief summary of the book")
    rating: int = Field(description="Rating from 1-10", ge=1, le=10)

# 2) Define the Ollama model and prompt
ollama_model = OllamaModel(
    host="http://localhost:11434",
    model_id="llama3.1",
)

prompt = """
Analyze this book: "The Hitchhiker's Guide to the Galaxy" by Douglas Adams.
It's a science fiction comedy about Arthur Dent's adventures through space
after Earth is destroyed. It's widely considered a classic of humorous
sci-fi with good ratings.
"""

# 3) Pass the model to the agent
agent = Agent(model=ollama_model)
result = agent(prompt, structured_output_model=BookAnalysis)

# 4) Access the `structured_output` from the result
result_info: BookAnalysis = result.structured_output
print(f"Title: {result_info.title}")
print(f"Author: {result_info.author}")
print(f"Genre: {result_info.genre}")
print(f"Rating: {result_info.rating}")
```

## Tool Support

Ollama models that support tool use can use tools through Strands' tool system:
```python
from strands import Agent
from strands.models.ollama import OllamaModel
from strands_tools import calculator, current_time

# Create an Ollama model
ollama_model = OllamaModel(
    host="http://localhost:11434",
    model_id="llama3.1"
)

# Create an agent with tools
agent = Agent(
    model=ollama_model,
    tools=[calculator, current_time]
)

# Use the agent with tools
response = agent("What's the square root of 144 plus the current time?")
```

## Troubleshooting

### Common Issues
1. **Connection Refused**:
   - Ensure the Ollama server is running (`ollama serve`, or check the Docker container status)
   - Verify the host URL is correct
   - For Docker: check that port 11434 is properly exposed

2. **Model Not Found**:
   - Pull the model first: `ollama pull model_name` or `docker exec -it ollama ollama pull model_name`
   - Check for typos in the `model_id`

3. **Module Not Found**:
   - If you encounter `ModuleNotFoundError: No module named 'ollama'`, the `ollama` dependency is not installed in your Python environment
   - To fix it, run `pip install 'strands-agents[ollama]'`
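The first check above can be scripted. A minimal reachability diagnostic, where `diagnose` is an illustrative helper rather than part of Strands:

```python
from urllib.error import URLError
from urllib.request import urlopen

def diagnose(host: str = "http://localhost:11434") -> str:
    """Return a short hint for the most common Ollama connectivity problem."""
    try:
        urlopen(f"{host}/api/tags", timeout=5)
    except URLError:
        return "Connection refused: start the server with `ollama serve` or check the Docker container."
    return "Ollama server is reachable."
```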