Claude Code is a great agent and a fast way to burn through an API bill. The interface — reading your repo, editing files, running commands, working through multi-step tasks — is the valuable part, and it turns out you don’t have to pay Anthropic’s per-token rate to keep it. Claude Code can talk to cheaper models like DeepSeek, GLM, Kimi, and Qwen, often at a fraction of the cost, with the exact same workflow.
This guide is the map for the whole topic: the three ways to connect a different model, how pricing and coding plans compare, and the trade-offs nobody mentions. The per-model and per-tool walkthroughs link out from here.
If you’re setting this up on Windows, a WSL environment makes the proxy and shell steps painless — see the WSL install guide first.
The three ways to connect a cheaper model
Every method below ends with Claude Code running normally; only the backend changes.
1. Point Claude Code straight at an Anthropic-compatible endpoint
This is the cleanest option and needs no extra software. Claude Code respects two environment variables:
export ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic"
export ANTHROPIC_AUTH_TOKEN="sk-your-key"
claude
Providers that ship an Anthropic-compatible endpoint — DeepSeek, Z.AI’s GLM, and Moonshot’s Kimi among them — work this way with no proxy. See run DeepSeek V4 with Claude Code, run GLM-5 with Claude Code, and run Kimi K2 with Claude Code.
2. Use Claude Code Router for OpenAI-style providers
Many models only speak the OpenAI format. Claude Code Router (CCR) is a small open-source proxy that sits between Claude Code and any provider, translating requests and even routing different task types to different models.
npm install -g @anthropic-ai/claude-code
npm install -g @musistudio/claude-code-router
ccr code
3. Route through OpenRouter
OpenRouter is a paid aggregator that exposes hundreds of models behind one key and one endpoint. Slightly more expensive than going direct, but you get failover and one bill.
Pricing vs coding plans
Two billing models dominate, and several vendors offer both.
The snapshot below was checked against official provider pages on June 9, 2026. Prices move, but the article should still show the numbers instead of making you hunt them down.
Pricing snapshot
Official cheap-model rates for Claude Code backends
6 providers
DeepSeek
deepseek-v4-flash: cache-hit input $0.0028, cache-miss input $0.14, output $0.28 per 1M tokens. deepseek-v4-pro: cache-hit input $0.003625, cache-miss input $0.435, output $0.87 per 1M tokens. Both list 1M context and 384K max output.
Official pricing
Why it is best for this use: It is the cheapest serious direct Anthropic-compatible route I found, so light and bursty Claude Code users can avoid a monthly plan.
Use the Anthropic endpoint directly: https://api.deepseek.com/anthropic.
Z.AI GLM
GLM-5.1 is $1.40 input, $0.26 cached input, $4.40 output per 1M tokens. GLM-5 is $1 / $0.20 / $3.20. GLM-4.7-FlashX is much cheaper at $0.07 / $0.01 / $0.40, and GLM-4.7-Flash / GLM-4.5-Flash are listed as free.
Official pricing
Why it is best for this use: The Coding Plan starts at $18/month in Z.AI's developer docs, while PAYG gives you very cheap Flash models for simpler agent turns.
Z.AI's Claude Code docs map Claude models to GLM models through an Anthropic-compatible endpoint.
Alibaba Qwen
qwen3-coder-next international PAYG: $0.30 input / $1.50 output up to 32K, $0.50 / $2.50 from 32K-128K, and $0.80 / $4.00 from 128K-256K. Alibaba Coding Plan Pro is $50/month with 6,000 requests per 5 hours, 45,000 per week, and 90,000 per month.
Official pricing
Why it is best for this use: Best when you want a large request allowance and Qwen-Coder in coding tools rather than paying every token separately.
The Coding Plan uses its own API key and base URL, separate from normal DashScope PAYG keys.
Moonshot Kimi
kimi-k2.6: cache-hit input $0.16, cache-miss input $0.95, output $4.00 per 1M tokens, with a 262,144-token context window. kimi-k2.5 is cheaper at $0.10 / $0.60 / $3.00 per 1M tokens.
Official pricing
Why it is best for this use: It costs more than DeepSeek or MiMo, but it is a strong pay-per-token reasoning option for harder agent tasks.
Kimi is OpenAI-compatible, so use Claude Code Router or another OpenAI-style proxy when Claude Code needs Anthropic format.
MiniMax
MiniMax-M3 standard PAYG is $0.30 input, $1.20 output, and $0.06 prompt-cache read per 1M tokens up to 512K input; above 512K it is $0.60 / $2.40 / $0.12. MiniMax-M2.7 is $0.30 input, $1.20 output, $0.06 cache read, and $0.375 cache write. Token Plan tiers are $20, $50, and $120 per month.
Official pricing
Why it is best for this use: Good if you specifically want MiniMax's agent models and want the choice between normal PAYG and quota-style subscription plans.
The MiniMax Token Plan uses 5-hour rolling and weekly windows, so compare burst limits before subscribing.
Xiaomi MiMo
MiMo-V2.5: cache-hit input $0.0028, cache-miss input $0.14, output $0.28 per 1M tokens. MiMo-V2.5-Pro: cache-hit input $0.0036, cache-miss input $0.435, output $0.87 per 1M tokens. Xiaomi's May 27, 2026 update says old MiMo-V2-Pro / Omni names auto-route to V2.5 pricing from June 1.
Official pricing
Why it is best for this use: It now matches the ultra-low DeepSeek-style price band while keeping Xiaomi's big-context coding angle.
MiMo's token plan also advertises 20% off off-peak usage and credit-reset/subscription discounts, so check whether PAYG or token-plan billing fits your volume.
Pay-per-token suits light or unpredictable use: DeepSeek V4 Flash and MiMo V2.5 are both in the $0.14 input / $0.28 output per 1M cache-miss/output range, with tiny cache-hit rates. Coding plans are better when you code heavily every day and would otherwise exceed the monthly token cost, but they cap usage in windows.
Pricing on these models changes often: promotional rates, new versions, off-peak discounts, and plan quotas shift. Treat the numbers above as a checked snapshot, then confirm the linked official page before you buy credits or subscribe.
Which model should you start with
A rough guide for someone moving off Claude’s native pricing:
- Cheapest serious option: DeepSeek. Pure pay-per-token, very low rates, Anthropic endpoint. See DeepSeek V4 pricing explained.
- Best flat-rate value: Z.AI’s GLM Coding Plan if you code daily. See GLM Coding Plan setup.
- Strong agentic coding: Kimi K2.6 or MiniMax M3. See MiniMax vs DeepSeek vs GLM.
- Big context on a budget: MiMo V2.5 or DeepSeek V4 Flash for 1M context; Qwen3-Coder when code specialization matters. See run Qwen3-Coder with Claude Code.
Not just Claude Code
The same models plug into other agents too. If you’d rather not use Claude Code at all, Codex CLI, OpenCode, and Aider all accept custom providers. There’s a full comparison in Claude Code vs Codex vs OpenCode vs Aider.
Before you switch
- Pick a provider and create an API key
- Decide pay-per-token vs coding plan based on your volume
- Choose method 1 (Anthropic endpoint) if available, else CCR
- Test on a small task before trusting it with a big repo
- Note the provider's rate limits and discount windows
Wrapping up
Running Claude Code on a cheaper model is mostly a config change: set two environment variables for Anthropic-compatible providers, or drop Claude Code Router in front for OpenAI-style ones. Pricing splits into pay-per-token (best for light use) and flat coding plans (best for heavy use, but capped per window), and several vendors offer both.
Start with the model that matches your budget and workload, test it on something small, and keep the official pricing page bookmarked since these rates move fast. From here, follow the per-model guides to get the exact endpoint and config.