Claude Code is a great agent and a fast way to burn through an API bill. The interface — reading your repo, editing files, running commands, working through multi-step tasks — is the valuable part, and it turns out you don’t have to pay Anthropic’s per-token rate to keep it. Claude Code can talk to cheaper models like DeepSeek, GLM, Kimi, and Qwen, often at a fraction of the cost, with the exact same workflow.
This guide is the map for the whole topic: the three ways to connect a different model, how pricing and coding plans compare, and the trade-offs nobody mentions. The per-model and per-tool walkthroughs link out from here.
If you’re setting this up on Windows, a WSL environment makes the proxy and shell steps painless — see the WSL install guide first.
The three ways to connect a cheaper model
Every method below ends with Claude Code running normally; only the backend changes.
1. Point Claude Code straight at an Anthropic-compatible endpoint
This is the cleanest option and needs no extra software. Claude Code respects two environment variables:
export ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic"
export ANTHROPIC_AUTH_TOKEN="sk-your-key"
claude
Providers that ship an Anthropic-compatible endpoint — DeepSeek, Z.AI’s GLM, and Moonshot’s Kimi among them — work this way with no proxy. See run DeepSeek V4 with Claude Code, run GLM-5 with Claude Code, and run Kimi K2 with Claude Code.
2. Use Claude Code Router for OpenAI-style providers
Many models only speak the OpenAI format. Claude Code Router (CCR) is a small open-source proxy that sits between Claude Code and any provider, translating requests and even routing different task types to different models.
npm install -g @anthropic-ai/claude-code
npm install -g @musistudio/claude-code-router
ccr code
3. Route through OpenRouter
OpenRouter is a paid aggregator that exposes hundreds of models behind one key and one endpoint. Slightly more expensive than going direct, but you get failover and one bill.
Pricing vs coding plans
Two billing models dominate, and several vendors offer both.
How the cheap providers bill (check official pages for current rates)
| DeepSeek | Pay-per-token only; off-peak discount window |
|---|---|
| Z.AI (GLM) | Coding plan from ~$10/mo + pay-per-token |
| Alibaba (Qwen) | Coding plan ~$50/mo (90k req) + pay-per-token |
| Moonshot (Kimi) | Pay-per-token; cache-hit discounts |
| MiniMax | Pay-per-token; agent-oriented pricing |
| Xiaomi MiMo | Flat pay-per-token, no long-context surcharge |
Pay-per-token suits light or unpredictable use — you only pay for what you run, and the cheapest models cost cents per million tokens. Coding plans are flat monthly subscriptions that work out cheaper for heavy daily coding, but they cap usage.
Pricing on these models changes often — promotional rates, new versions, and discount windows shift monthly. Treat any number you read (here or elsewhere) as a snapshot and confirm on the provider’s official pricing page before you commit.
Which model should you start with
A rough guide for someone moving off Claude’s native pricing:
- Cheapest serious option: DeepSeek. Pure pay-per-token, very low rates, Anthropic endpoint. See DeepSeek V4 pricing explained.
- Best flat-rate value: Z.AI’s GLM Coding Plan if you code daily. See GLM Coding Plan setup.
- Strong agentic coding: Kimi K2 Thinking or MiniMax M2.5. See MiniMax vs DeepSeek vs GLM.
- Big context on a budget: Qwen3-Coder or MiMo (1M context). See run Qwen3-Coder with Claude Code.
Not just Claude Code
The same models plug into other agents too. If you’d rather not use Claude Code at all, Codex CLI, OpenCode, and Aider all accept custom providers. There’s a full comparison in Claude Code vs Codex vs OpenCode vs Aider.
Before you switch
- Pick a provider and create an API key
- Decide pay-per-token vs coding plan based on your volume
- Choose method 1 (Anthropic endpoint) if available, else CCR
- Test on a small task before trusting it with a big repo
- Note the provider's rate limits and discount windows
Wrapping up
Running Claude Code on a cheaper model is mostly a config change: set two environment variables for Anthropic-compatible providers, or drop Claude Code Router in front for OpenAI-style ones. Pricing splits into pay-per-token (best for light use) and flat coding plans (best for heavy use, but capped per window), and several vendors offer both.
Start with the model that matches your budget and workload, test it on something small, and keep the official pricing page bookmarked since these rates move fast. From here, follow the per-model guides to get the exact endpoint and config.