Most cost-cutting guides optimize for price. This one optimizes for speed. Cerebras serves models on specialized hardware that delivers very high tokens-per-second throughput, and running the large open Qwen3-Coder-480B there makes an agent feel near-instant: the iterative edit-run-fix loop stops waiting on the model. Pair it with OpenCode, which takes any OpenAI-compatible provider, and you get a fast, capable coding agent.
This is the OpenCode + Cerebras setup. For OpenCode’s general provider model, see OpenCode custom providers.
Why Cerebras for an agent
Agentic coding is a tight loop: the model reads, proposes an edit, you run it, it reads the result, repeats. The slower the model, the more you wait at each step. Cerebras’s high throughput collapses that wait, which matters more for agent ergonomics than for a one-shot chat. Qwen3-Coder-480B is a strong open model for the job, so the combination is fast and capable.
Step 1: Install OpenCode and get a Cerebras key
npm install -g opencode-ai
Create a Cerebras account and generate an API key. On Windows, WSL gives the cleaner shell — see install OpenCode on Windows with WSL.
Step 2: Add the Cerebras provider
In opencode.json, define Cerebras as an OpenAI-compatible provider:
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "cerebras": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Cerebras",
      "options": {
        "baseURL": "https://api.cerebras.ai/v1",
        "apiKey": "{env:CEREBRAS_API_KEY}"
      },
      "models": {
        "qwen-3-coder-480b": { "name": "Qwen3 Coder 480B" }
      }
    }
  },
  "model": "cerebras/qwen-3-coder-480b"
}
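The model value is a provider ID and a model ID joined by a slash, which is why the key under provider and the key under models both have to line up with it. A minimal shell sketch of that split, assuming OpenCode cuts the string at the first slash (the variable names are illustrative, not part of OpenCode):

```shell
# Illustrative only: split a "provider/model" string the way the config above implies.
# Assumption: the part before the first slash is the provider key, the rest is the model ID.
model="cerebras/qwen-3-coder-480b"

provider_id="${model%%/*}"   # drop everything from the first slash onward
model_id="${model#*/}"       # drop everything up to and including the first slash

echo "provider key: $provider_id"
echo "model ID:     $model_id"
```

If either half differs from what the provider block declares, the default model will not resolve.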
Export the key:
export CEREBRAS_API_KEY="csk-your-key"
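A plain assignment without export is invisible to programs started from that shell, OpenCode included. A small pre-flight check, sketched here with a hypothetical helper name (check_cerebras_key), asks a child process whether it can see the variable:

```shell
# Hypothetical helper, not part of OpenCode or any Cerebras tooling.
check_cerebras_key() {
  # A child shell only inherits exported variables, so this fails for
  # an unset key, an empty key, or one assigned without `export`.
  if sh -c '[ -n "${CEREBRAS_API_KEY:-}" ]'; then
    echo "CEREBRAS_API_KEY is exported"
  else
    echo "CEREBRAS_API_KEY is missing or not exported" >&2
    return 1
  fi
}
```

Running check_cerebras_key in the same terminal you launch OpenCode from should catch the most common cause of auth errors before the first request.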
Step 3: Run it
opencode
OpenCode starts on cerebras/qwen-3-coder-480b, the default set by the model field in opencode.json. Start a task and the responses should arrive noticeably fast, which is the point of this setup.
Speed vs price vs context
Where to run Qwen3-Coder
| Provider | Strengths and limits |
|---|---|
| Cerebras | Fastest tokens/sec; best for snappy agent loops |
| Alibaba DashScope | Cache discounts + coding plan; big context |
| Local (Ollama/LM Studio) | Private, free to run; limited by your hardware |
If raw speed is your priority, Cerebras wins. If cost and context size matter more, DashScope (with its coding plan) is the better Qwen home. For privacy, run it locally.
Troubleshooting
- “Model not found”: use the exact Cerebras model ID, not the DashScope name.
- Auth errors: CEREBRAS_API_KEY is not set in the launching shell.
- Provider mismatch: the provider key (cerebras) must match the model prefix.
- Slower than expected: check you’re actually on the Cerebras endpoint, not a fallback.
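For the “Model not found” and auth cases, it can help to ask the endpoint directly which model IDs your key can see. The sketch below assumes Cerebras serves the OpenAI-style GET /models route under the same base URL and returns a JSON list with an id field per model. Both function names are hypothetical, and the grep/sed parsing is a rough stand-in for a real JSON parser.

```shell
# Hypothetical helpers for checking the key and the exact model ID.
# Assumption: GET /models is served under the base URL from opencode.json
# and responds with OpenAI-style JSON: {"data":[{"id":"..."}, ...]}.

# Fetch the raw model list; exits non-zero on HTTP errors such as 401.
list_cerebras_models() {
  curl -sS --fail \
    -H "Authorization: Bearer ${CEREBRAS_API_KEY}" \
    "https://api.cerebras.ai/v1/models"
}

# Read JSON on stdin and print one model ID per line.
extract_model_ids() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# Usage (needs network access and an exported key):
#   list_cerebras_models | extract_model_ids
```

A failure from the first function points at the key or the endpoint. A list that lacks the ID used in opencode.json points at the model name.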
Cerebras + OpenCode checklist
- OpenCode installed
- Cerebras API key created and exported
- Provider block with https://api.cerebras.ai/v1
- Exact Cerebras Qwen3-Coder model ID listed
- Default model set; launched with opencode
Wrapping up
Running Qwen3-Coder-480B on Cerebras through OpenCode is the setup to choose when you want an agent that responds fast: add a Cerebras provider with https://api.cerebras.ai/v1, the exact 480B coder model ID, and your key. The high throughput makes the agent loop snappy in a way cheaper, slower endpoints can’t match.
For the cost-first alternatives, see run Qwen3-Coder with Claude Code on DashScope or run Qwen3-Coder locally with LM Studio.