The most private way to run OpenClaw is to keep the brain on your own machine. With Ollama serving a local model, OpenClaw makes no API calls for inference, costs nothing per use, and keeps your prompts and files off anyone else’s servers. For a personal assistant that touches your files and messages, that combination of free and private is hard to argue with.
The honest trade is performance: a model small enough to run on your hardware won’t match the best hosted models, and responses are slower. This guide covers the setup, realistic model and hardware choices, and exactly where local falls short so you can decide if it fits. OpenClaw should already be installed — see the install guide if not.
What “local” gets you, and what it doesn’t
Two things are true at once. The model runs on your machine, so inference is free and private. But skills that reach out — web search, sending a message through a channel — still use the network, because that’s their job. “Local” means the thinking is local, not that the assistant is sealed off from the internet entirely.
Step 1: install Ollama and pull a model
Install Ollama from ollama.com, then pull a model. Inside WSL or your Linux/macOS shell:
ollama pull qwen2.5-coder
ollama run qwen2.5-coder "hello"
The second command is a quick sanity check that the model loads and responds. Pick a model size your hardware can handle — more on that below. Ollama serves an OpenAI-compatible endpoint locally, usually at http://localhost:11434, which is what OpenClaw will connect to.
Step 2: point OpenClaw at the local endpoint
Configure OpenClaw’s model provider to use Ollama’s local endpoint. No API key is needed for local Ollama (a placeholder value is fine where one is required). Confirm the current config keys in the OpenClaw repo; the values look like:
{
"model": {
"provider": "openai-compatible",
"base_url": "http://localhost:11434/v1",
"api_key": "ollama",
"model": "qwen2.5-coder"
}
}
The base URL points at your own machine, the model name matches what you pulled with Ollama, and the key is a throwaway placeholder since nothing’s being authenticated. That’s the whole connection.
Step 3: the hardware reality
This is where expectations need calibrating. Local model speed depends on your hardware:
Rough hardware expectations
| Setup | What to expect |
|---|---|
| Modern laptop CPU, small model | Works, but slow — fine for occasional tasks |
| GPU with modest VRAM, small/mid model | Comfortable for everyday assistant use |
| GPU with ample VRAM, larger model | Best local quality, still below top cloud models |
There’s no shame in running a small model. An assistant that answers in two seconds on a 7-billion-parameter coder model beats one that takes half a minute on something larger. Test on your actual machine and let speed guide the size.
Step 4: test it on a real task
Start OpenClaw and give it something that uses the model and a local capability:
Read the files in this folder and write a one-paragraph summary of the project.
If it reads the files and produces a reasonable summary at a speed you can live with, your local setup works. If responses crawl, drop to a smaller model — that’s almost always the fix.
OpenClaw + Ollama checklist
- Ollama installed and a model pulled
- Quick ollama run test passes
- OpenClaw provider set to http://localhost:11434/v1
- Model name in config matches the pulled model
- Model size matched to your hardware for responsive replies
- Tested on a real file task
Where local falls short
Be clear-eyed about the limits so you’re not disappointed:
- Hard reasoning and long context. Consumer-sized local models lag the best hosted ones here. Complex, multi-step tasks may stumble where a cloud model wouldn’t.
- Speed. Even on a good GPU, local inference is usually slower than a hosted API.
- The biggest models are out of reach. The frontier-quality models don’t fit on typical consumer hardware.
The practical move many people make: run local as the default for privacy and zero cost, and keep a cheap cloud option like DeepSeek configured for the occasional task that needs more horsepower. Because OpenClaw is model-agnostic, switching is just a config change.
Wrapping up
Running OpenClaw on Ollama gives you a free, private assistant whose thinking never leaves your machine — install Ollama, pull a model that fits your hardware, point OpenClaw at the local endpoint, and you’re running with no API bill. The trade is performance: pick a model your GPU can keep fast, and accept that the hardest tasks still favor a cloud model.
If you hit the limits of local, the lowest-cost cloud option is the DeepSeek setup, while GLM is the flat-plan route for heavier use.