Local Models
Local models are useful for privacy, offline-ish workflows, and cost control. They also make it easier to iterate on agent workflows without worrying about API outages or rate limits.
Recommended approach (pragmatic)
- Start with one local runtime that is easy to operate.
- Keep one remote provider configured as a fallback for hard tasks.
- Verify end-to-end (model list → first reply → tool calls) before adding complexity.
Pick a local runtime
Ollama (recommended first)
Ollama is the easiest way to run tool-capable open models locally.
- Setup guide: Ollama
- Good default for “local-first with streaming + tools”.
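Before wiring Ollama into anything else, it’s worth confirming the CLI is installed and the local API (default port 11434) is actually responding:

```shell
ollama --version                         # CLI installed?
curl -s http://localhost:11434/api/tags  # Ollama's local API: lists pulled models as JSON
```

If the `curl` call hangs or errors, the Ollama server isn’t running yet; start it before continuing.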
vLLM (self-hosted serving)
If you already run GPU inference servers, vLLM is a common choice:
- Setup guide: vLLM
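As a rough sketch, a typical vLLM deployment exposes an OpenAI-compatible HTTP server; the model name and port below are illustrative examples, not requirements:

```shell
# Serve an example model via vLLM's OpenAI-compatible API (pick a model that fits your hardware).
vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000

# In another terminal: confirm the server is up and the model is listed.
curl -s http://localhost:8000/v1/models
```

Point OpenClaw (or any OpenAI-compatible client) at that base URL once the model shows up in `/v1/models`.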
Quick start (Ollama)
- Install Ollama and pull a model:
```shell
ollama pull llama3.3
```

- Enable Ollama in OpenClaw (any value works; Ollama doesn’t require a real key):

```shell
export OLLAMA_API_KEY="ollama-local"
```

- Set a local default model:
```json5
{
  agents: {
    defaults: {
      model: { primary: "ollama/llama3.3" }
    }
  }
}
```

- Verify:

```shell
ollama list
openclaw models list
```

Pattern: local primary + remote fallback
Use local models for most turns, and fall back to a remote provider when the task is too hard or the local model is slow:
```json5
{
  agents: {
    defaults: {
      model: {
        primary: "ollama/llama3.3",
        fallbacks: ["anthropic/claude-opus-4-6"]
      }
    }
  }
}
```

See Failover for more patterns.
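Conceptually, the config above is just “try local first, retry remote on failure”. A minimal sketch of that control flow, using hypothetical `run_local`/`run_remote` stand-ins (these are not real OpenClaw commands):

```shell
# run_local / run_remote are hypothetical stand-ins for "send this turn to
# ollama/llama3.3" and "send it to the remote fallback model".
ask() {
  run_local "$1" || run_remote "$1"  # remote runs only if the local call fails
}

# Mock implementations to show the control flow: local fails, remote answers.
run_local()  { return 1; }
run_remote() { echo "remote: $1"; }

ask "hello"  # prints "remote: hello"
```

The real fallback logic also considers timeouts and rate limits, but the shape is the same: the remote provider is only consulted when the local primary can’t complete the turn.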
Notes
- Local models still need a good sandbox/tool policy posture; “local” is not the same as “safe”.
- For the full provider catalog and provider-specific auth flows, see Model providers.