Skip to content

How to Use Cheap AI Models With Claude Code (2026 Guide)

Run Claude Code on cheaper models like DeepSeek, GLM, Kimi, and Qwen. The three connection methods, pricing and coding plans compared, and how to set it up on Windows.

MGMCSA Guru Team June 8, 2026 8 min read
Claude Code terminal on Windows routed to cheaper DeepSeek, GLM and Kimi models

Claude Code is a great agent and a fast way to burn through an API bill. The interface — reading your repo, editing files, running commands, working through multi-step tasks — is the valuable part, and it turns out you don’t have to pay Anthropic’s per-token rate to keep it. Claude Code can talk to cheaper models like DeepSeek, GLM, Kimi, and Qwen, often at a fraction of the cost, with the exact same workflow.

This guide is the map for the whole topic: the three ways to connect a different model, how pricing and coding plans compare, and the trade-offs nobody mentions. The per-model and per-tool walkthroughs link out from here.

If you’re setting this up on Windows, a WSL environment makes the proxy and shell steps painless — see the WSL install guide first.

The three ways to connect a cheaper model

Every method below ends with Claude Code running normally; only the backend changes.

1. Point Claude Code straight at an Anthropic-compatible endpoint

This is the cleanest option and needs no extra software. Claude Code respects two environment variables:

export ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic"
export ANTHROPIC_AUTH_TOKEN="sk-your-key"
claude

Providers that ship an Anthropic-compatible endpoint — DeepSeek, Z.AI’s GLM, and Moonshot’s Kimi among them — work this way with no proxy. See run DeepSeek V4 with Claude Code, run GLM-5 with Claude Code, and run Kimi K2 with Claude Code.

2. Use Claude Code Router for OpenAI-style providers

Many models only speak the OpenAI format. Claude Code Router (CCR) is a small open-source proxy that sits between Claude Code and any provider, translating requests and even routing different task types to different models.

npm install -g @anthropic-ai/claude-code
npm install -g @musistudio/claude-code-router
ccr code

3. Route through OpenRouter

OpenRouter is a paid aggregator that exposes hundreds of models behind one key and one endpoint. Slightly more expensive than going direct, but you get failover and one bill.

Pricing vs coding plans

Two billing models dominate, and several vendors offer both.

The snapshot below was checked against official provider pages on June 9, 2026. Prices move, but the article should still show the numbers instead of making you hunt them down.

Pricing snapshot

Official cheap-model rates for Claude Code backends

6 providers

#1 Lowest PAYG

DeepSeek

deepseek-v4-flash: cache-hit input $0.0028, cache-miss input $0.14, output $0.28 per 1M tokens. deepseek-v4-pro: cache-hit input $0.003625, cache-miss input $0.435, output $0.87 per 1M tokens. Both list 1M context and 384K max output.

Official pricing

Why it is best for this use: It is the cheapest serious direct Anthropic-compatible route I found, so light and bursty Claude Code users can avoid a monthly plan.

Use the Anthropic endpoint directly: https://api.deepseek.com/anthropic.

#2 Cheap plan + PAYG

Z.AI GLM

GLM-5.1 is $1.40 input, $0.26 cached input, $4.40 output per 1M tokens. GLM-5 is $1 / $0.20 / $3.20. GLM-4.7-FlashX is much cheaper at $0.07 / $0.01 / $0.40, and GLM-4.7-Flash / GLM-4.5-Flash are listed as free.

Why it is best for this use: The Coding Plan starts at $18/month in Z.AI's developer docs, while PAYG gives you very cheap Flash models for simpler agent turns.

Z.AI's Claude Code docs map Claude models to GLM models through an Anthropic-compatible endpoint.

#3 Big coding plan

Alibaba Qwen

qwen3-coder-next international PAYG: $0.30 input / $1.50 output up to 32K, $0.50 / $2.50 from 32K-128K, and $0.80 / $4.00 from 128K-256K. Alibaba Coding Plan Pro is $50/month with 6,000 requests per 5 hours, 45,000 per week, and 90,000 per month.

Why it is best for this use: Best when you want a large request allowance and Qwen-Coder in coding tools rather than paying every token separately.

The Coding Plan uses its own API key and base URL, separate from normal DashScope PAYG keys.

#4 Reasoning PAYG

Moonshot Kimi

kimi-k2.6: cache-hit input $0.16, cache-miss input $0.95, output $4.00 per 1M tokens, with a 262,144-token context window. kimi-k2.5 is cheaper at $0.10 / $0.60 / $3.00 per 1M tokens.

Why it is best for this use: It costs more than DeepSeek or MiMo, but it is a strong pay-per-token reasoning option for harder agent tasks.

Kimi is OpenAI-compatible, so use Claude Code Router or another OpenAI-style proxy when Claude Code needs Anthropic format.

#5 Agent PAYG + plan

MiniMax

MiniMax-M3 standard PAYG is $0.30 input, $1.20 output, and $0.06 prompt-cache read per 1M tokens up to 512K input; above 512K it is $0.60 / $2.40 / $0.12. MiniMax-M2.7 is $0.30 input, $1.20 output, $0.06 cache read, and $0.375 cache write. Token Plan tiers are $20, $50, and $120 per month.

Why it is best for this use: Good if you specifically want MiniMax's agent models and want the choice between normal PAYG and quota-style subscription plans.

The MiniMax Token Plan uses 5-hour rolling and weekly windows, so compare burst limits before subscribing.

#6 1M context value

Xiaomi MiMo

MiMo-V2.5: cache-hit input $0.0028, cache-miss input $0.14, output $0.28 per 1M tokens. MiMo-V2.5-Pro: cache-hit input $0.0036, cache-miss input $0.435, output $0.87 per 1M tokens. Xiaomi's May 27, 2026 update says old MiMo-V2-Pro / Omni names auto-route to V2.5 pricing from June 1.

Why it is best for this use: It now matches the ultra-low DeepSeek-style price band while keeping Xiaomi's big-context coding angle.

MiMo's token plan also advertises 20% off off-peak usage and credit-reset/subscription discounts, so check whether PAYG or token-plan billing fits your volume.

Pay-per-token suits light or unpredictable use: DeepSeek V4 Flash and MiMo V2.5 are both in the $0.14 input / $0.28 output per 1M cache-miss/output range, with tiny cache-hit rates. Coding plans are better when you code heavily every day and would otherwise exceed the monthly token cost, but they cap usage in windows.

Pricing on these models changes often: promotional rates, new versions, off-peak discounts, and plan quotas shift. Treat the numbers above as a checked snapshot, then confirm the linked official page before you buy credits or subscribe.

Which model should you start with

A rough guide for someone moving off Claude’s native pricing:

Not just Claude Code

The same models plug into other agents too. If you’d rather not use Claude Code at all, Codex CLI, OpenCode, and Aider all accept custom providers. There’s a full comparison in Claude Code vs Codex vs OpenCode vs Aider.

Before you switch

  • Pick a provider and create an API key
  • Decide pay-per-token vs coding plan based on your volume
  • Choose method 1 (Anthropic endpoint) if available, else CCR
  • Test on a small task before trusting it with a big repo
  • Note the provider's rate limits and discount windows

Wrapping up

Running Claude Code on a cheaper model is mostly a config change: set two environment variables for Anthropic-compatible providers, or drop Claude Code Router in front for OpenAI-style ones. Pricing splits into pay-per-token (best for light use) and flat coding plans (best for heavy use, but capped per window), and several vendors offer both.

Start with the model that matches your budget and workload, test it on something small, and keep the official pricing page bookmarked since these rates move fast. From here, follow the per-model guides to get the exact endpoint and config.

Frequently asked questions

Can Claude Code run on models other than Claude?

Yes. Claude Code reads two environment variables — ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN — so any provider that exposes an Anthropic-compatible endpoint (DeepSeek, GLM, Kimi and others) can be pointed to directly. For OpenAI-style providers, a small proxy like Claude Code Router translates the requests.

Is it cheaper to use a coding plan or pay per token?

It depends on volume. DeepSeek V4 Flash and Xiaomi MiMo V2.5 publish very low pay-per-token rates, while GLM's Coding Plan starts at $18/month and Alibaba's Coding Plan is $50/month with 6,000 requests per 5 hours and 90,000 per month. Pay per token is better for light use; plans are better for heavy daily coding.

Do cheap models match Claude's quality in Claude Code?

For most everyday coding, the top Chinese models get close, and the agent workflow (file edits, running commands, multi-step tasks) is identical because it comes from Claude Code itself. Expect some gap on the hardest reasoning and very long context tasks.

What's the catch with coding plans?

Most flat plans cap usage in rolling windows — often a number of prompts every 5 hours, similar to Anthropic's own limits. Heavy bursts can hit those caps. Check each plan's per-window quota before committing.

Does this work on Windows or do I need WSL?

Both work. Native Windows uses PowerShell environment variables; WSL behaves like a normal Linux setup. WSL tends to be smoother for proxies and shell scripts, which is why many guides use it.

Sources & further reading

Official vendor documentation referenced while writing this guide.

MG

MCSA Guru Team

IT & Systems Administration

We are working IT pros and system administrators who spend our days in Windows Server, Microsoft 365, and the wider Microsoft stack. MCSA Guru is where we write down the fixes and walkthroughs we wish we had found the first time.

MCSA Guru provides independent, educational IT guidance. Microsoft, Windows, Windows Server, Microsoft 365, Exchange, and Microsoft Teams are trademarks of Microsoft Corporation; Docker is a trademark of Docker, Inc. MCSA Guru is not affiliated with or endorsed by Microsoft or Docker. Always test changes in a safe environment before applying them in production.

Related guides

Fixing something right now?

Jump straight into the guide library or search for the exact error or task you are dealing with.