Local Models

Local models are useful for privacy, offline-ish workflows, and cost control. They also make it easier to iterate on agent workflows without worrying about API outages or rate limits.

Recommended approach (pragmatic)

  1. Start with one local runtime that is easy to operate.
  2. Keep one remote provider configured as a fallback for hard tasks.
  3. Verify end-to-end (model list → first reply → tool calls) before adding complexity.

Pick a local runtime

Ollama (recommended first)

Ollama is the easiest way to run tool-capable open models locally.

  • Setup guide: Ollama
  • Good default for “local-first with streaming + tools”.

vLLM (self-hosted serving)

If you already run GPU inference servers, vLLM is a common choice. It serves models behind an OpenAI-compatible HTTP API, so clients that speak that protocol can point at it directly.
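As a sketch (the model name and port here are illustrative, not a recommendation), a vLLM server exposing its OpenAI-compatible API can be started and sanity-checked like this:

```shell
# Start a vLLM OpenAI-compatible server.
# Requires a GPU host with vLLM installed (pip install vllm).
vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000

# From another shell, confirm the endpoint is up and lists the model:
curl http://localhost:8000/v1/models
```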

Quick start (Ollama)

  1. Install Ollama and pull a model:
ollama pull llama3.3
  2. Enable Ollama in OpenClaw (any value works; Ollama doesn’t require a real key):
export OLLAMA_API_KEY="ollama-local"
  3. Set a local default model:
{
  agents: {
    defaults: {
      model: { primary: "ollama/llama3.3" }
    }
  }
}
  4. Verify:
ollama list
openclaw models list

Pattern: local primary + remote fallback

Use local models for most turns, and fall back to a remote provider when the task is too hard or the local model is slow:

{
  agents: {
    defaults: {
      model: {
        primary: "ollama/llama3.3",
        fallbacks: ["anthropic/claude-opus-4-6"]
      }
    }
  }
}
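The routing behavior this config describes can be sketched in shell. `run_model` below is a hypothetical stand-in stub (not an OpenClaw command) so the example is self-contained; the local call is made to fail so the fallback path is exercised:

```shell
# Sketch of primary-then-fallback routing, not OpenClaw internals.
# run_model is a stand-in stub: the "local" call always fails here.
run_model() {
  case "$1" in
    ollama/*) echo "local model unavailable" >&2; return 1 ;;
    *) printf 'reply from %s\n' "$1" ;;
  esac
}

# Try the local primary first; on a non-zero exit, retry the remote fallback.
ask() {
  run_model "ollama/llama3.3" "$1" || run_model "anthropic/claude-opus-4-6" "$1"
}

ask "hello"   # prints "reply from anthropic/claude-opus-4-6"
```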

See Failover for more patterns.

Notes

  • Local models still need a good sandbox/tool policy posture; “local” is not the same as “safe”.
  • For the full provider catalog and provider-specific auth flows, see Model providers.