Can Claude Code run on models other than Claude?

Yes. Claude Code reads two environment variables — ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN — so any provider that exposes an Anthropic-compatible endpoint (DeepSeek, GLM, Kimi and others) can be pointed to directly. For OpenAI-style providers, a small proxy like Claude Code Router translates the requests.

Is it cheaper to use a coding plan or pay per token?

It depends on volume. DeepSeek V4 Flash and Xiaomi MiMo V2.5 publish very low pay-per-token rates (about $0.14 input and $0.28 output per 1M tokens), while GLM's Coding Plan starts at $18/month and Alibaba's Coding Plan Pro is $50/month with caps of 6,000 requests per 5 hours, 45,000 per week, and 90,000 per month. Pay-per-token is better for light use; plans are better for heavy daily coding.

Do cheap models match Claude's quality in Claude Code?

For most everyday coding, the top Chinese models get close, and the agent workflow (file edits, running commands, multi-step tasks) is identical because it comes from Claude Code itself. Expect some gap on the hardest reasoning and very long context tasks.

What's the catch with coding plans?

Most flat plans cap usage in rolling windows — often a number of prompts every 5 hours, similar to Anthropic's own limits. Heavy bursts can hit those caps. Check each plan's per-window quota before committing.

Does this work on Windows or do I need WSL?

Both work. Native Windows uses PowerShell environment variables; WSL behaves like a normal Linux setup. WSL tends to be smoother for proxies and shell scripts, which is why many guides use it.

Use Cheap AI Models With Claude Code (2026)

Claude Code is a great agent and a fast way to burn through an API bill. The interface — reading your repo, editing files, running commands, working through multi-step tasks — is the valuable part, and it turns out you don’t have to pay Anthropic’s per-token rate to keep it. Claude Code can talk to cheaper models like DeepSeek, GLM, Kimi, and Qwen, often at a fraction of the cost, with the exact same workflow.

This guide is the map for the whole topic: the three ways to connect a different model, how pricing and coding plans compare, and the trade-offs nobody mentions. The per-model and per-tool walkthroughs link out from here.

If you’re setting this up on Windows, a WSL environment makes the proxy and shell steps painless — see the WSL install guide first.

The three ways to connect a cheaper model

Every method below ends with Claude Code running normally; only the backend changes.

1. Point Claude Code straight at an Anthropic-compatible endpoint

This is the cleanest option and needs no extra software. Claude Code respects two environment variables:

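# Send Claude Code's requests to DeepSeek's Anthropic-compatible endpoint
export ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic"
# Your DeepSeek API key (placeholder shown)
export ANTHROPIC_AUTH_TOKEN="sk-your-key"
# Start Claude Code as usual; it picks up both variables
claude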

Providers that ship an Anthropic-compatible endpoint — DeepSeek, Z.AI’s GLM, and Moonshot’s Kimi among them — work this way with no proxy. See run DeepSeek V4 with Claude Code, run GLM-5 with Claude Code, and run Kimi K2 with Claude Code.
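Exporting the two variables changes the backend for every later `claude` run in that shell. If you would rather switch per session, one option is a small wrapper. This is a minimal sketch, not something Claude Code ships: the function name `claude_with` is made up for illustration, and the key is a placeholder.

```shell
# Hypothetical helper (not part of Claude Code): run one session against
# a chosen Anthropic-compatible backend without exporting anything globally.
# Usage: claude_with <base-url> <api-key> [claude arguments...]
claude_with() {
  if [ "$#" -lt 2 ]; then
    echo "usage: claude_with <base-url> <api-key> [claude args...]" >&2
    return 2
  fi
  cw_url="$1"
  cw_key="$2"
  shift 2
  # The subshell keeps both variables scoped to this single run
  (
    export ANTHROPIC_BASE_URL="$cw_url"
    export ANTHROPIC_AUTH_TOKEN="$cw_key"
    claude "$@"
  )
}
```

Called as `claude_with "https://api.deepseek.com/anthropic" "sk-your-key"`, it starts a normal session on that backend and leaves the parent shell's environment untouched.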

2. Use Claude Code Router for OpenAI-style providers

Many models only speak the OpenAI format. Claude Code Router (CCR) is a small open-source proxy that sits between Claude Code and any provider, translating requests and even routing different task types to different models.

# Install Claude Code itself, then the router
npm install -g @anthropic-ai/claude-code
npm install -g @musistudio/claude-code-router
# Launch Claude Code through the router
ccr code
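The router also needs to be told which provider and model to forward to. The sketch below writes an example config file so you can see the general shape. Treat all of it as an assumption: the key names (`Providers`, `Router`, `api_base_url`, and so on) follow the layout I recall from the project's README, and the provider name, URL, and model are placeholders, so check the current README before relying on it.

```shell
# Sketch only: writes an EXAMPLE config into the current directory, not into
# CCR's real config location. Key names are assumptions based on the
# project's README; the provider, URL, and model are placeholders.
example_file="./ccr-config.example.json"

cat > "$example_file" <<'EOF'
{
  "Providers": [
    {
      "name": "example-provider",
      "api_base_url": "https://api.example.com/v1/chat/completions",
      "api_key": "sk-your-key",
      "models": ["example-model"]
    }
  ],
  "Router": {
    "default": "example-provider,example-model"
  }
}
EOF

echo "Wrote $example_file"
```

If the layout matches your installed version, the file goes wherever the README says CCR reads its config from (I believe `~/.claude-code-router/config.json`), after which `ccr code` should pick it up.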

3. Route through OpenRouter

OpenRouter is a paid aggregator that exposes hundreds of models behind one key and one endpoint. It is slightly more expensive than going direct, but you get failover and one bill.

Pricing vs coding plans

Two billing models dominate, and several vendors offer both.

The snapshot below was checked against official provider pages on June 9, 2026. Prices move, so the numbers are listed here as a dated reference rather than a guarantee.

Pricing snapshot

Official cheap-model rates for Claude Code backends

#1 Lowest PAYG

DeepSeek

deepseek-v4-flash: cache-hit input $0.0028, cache-miss input $0.14, output $0.28 per 1M tokens. deepseek-v4-pro: cache-hit input $0.003625, cache-miss input $0.435, output $0.87 per 1M tokens. Both list 1M context and 384K max output.

Official pricing: DeepSeek pricing

Why it is best for this use: It is the cheapest serious direct Anthropic-compatible route I found, so light and bursty Claude Code users can avoid a monthly plan.

Use the Anthropic endpoint directly: https://api.deepseek.com/anthropic.
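The gap between cache-hit and cache-miss input pricing matters because agent sessions tend to resend much of the same context on every turn. As a worked example, the snippet below blends the two deepseek-v4-flash input rates from the snapshot. The 80% cache-hit share is a hypothetical figure chosen for illustration, not a measured one.

```shell
# deepseek-v4-flash input rates from the snapshot (USD per 1M tokens)
hit_rate_usd=0.0028
miss_rate_usd=0.14
# Hypothetical share of input tokens served from cache
cache_hit_share=0.80

# Blended rate = share * hit price + (1 - share) * miss price
effective_input_usd=$(awk -v h="$hit_rate_usd" -v m="$miss_rate_usd" -v s="$cache_hit_share" \
  'BEGIN { printf "%.5f", s * h + (1 - s) * m }')

echo "Effective input price: \$${effective_input_usd} per 1M tokens"
```

With those inputs the blended rate comes out at $0.03024 per 1M tokens, roughly a fifth of the cache-miss price. Change `cache_hit_share` to match what your own usage dashboard reports.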

#2 Cheap plan + PAYG

Z.AI GLM

GLM-5.1 is $1.40 input, $0.26 cached input, $4.40 output per 1M tokens. GLM-5 is $1 / $0.20 / $3.20. GLM-4.7-FlashX is much cheaper at $0.07 / $0.01 / $0.40, and GLM-4.7-Flash / GLM-4.5-Flash are listed as free.

Official pricing: Z.AI pricing, Claude Code setup

Why it is best for this use: The Coding Plan starts at $18/month in Z.AI's developer docs, while PAYG gives you very cheap Flash models for simpler agent turns.

Z.AI's Claude Code docs map Claude models to GLM models through an Anthropic-compatible endpoint.
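That mapping is typically done with environment variables. As far as I know, Claude Code reads per-tier overrides named `ANTHROPIC_DEFAULT_OPUS_MODEL`, `ANTHROPIC_DEFAULT_SONNET_MODEL`, and `ANTHROPIC_DEFAULT_HAIKU_MODEL`. The sketch below shows the idea only: the base URL is a deliberate placeholder, and the model IDs are guesses derived from the model names in the snapshot, so copy the real values from Z.AI's Claude Code docs.

```shell
# Assumptions: the base URL and key below are placeholders, and the model
# IDs are guesses at the naming. Take the real values from Z.AI's docs.
export ANTHROPIC_BASE_URL="https://your-zai-endpoint.example/anthropic"
export ANTHROPIC_AUTH_TOKEN="your-zai-key"

# Map Claude Code's model tiers onto GLM models
export ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5.1"
export ANTHROPIC_DEFAULT_SONNET_MODEL="glm-5"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="glm-4.7-flashx"

# After this, start Claude Code as usual with: claude
```

Putting a cheap Flash model on the smallest tier is one way to keep simple background turns inexpensive while the larger tiers handle the harder work.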

#3 Big coding plan

Alibaba Qwen

qwen3-coder-next international PAYG: $0.30 input / $1.50 output up to 32K, $0.50 / $2.50 from 32K-128K, and $0.80 / $4.00 from 128K-256K. Alibaba Coding Plan Pro is $50/month with 6,000 requests per 5 hours, 45,000 per week, and 90,000 per month.

Official pricing: Model pricing, Coding Plan

Why it is best for this use: Best when you want a large request allowance and Qwen-Coder in coding tools rather than paying every token separately.

The Coding Plan uses its own API key and base URL, separate from normal DashScope PAYG keys.
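It helps to work out which of the three caps actually binds. The arithmetic below uses only the Coding Plan Pro figures from the snapshot; the "cost floor" assumes you consume the entire monthly quota, which is a best case rather than a typical one.

```shell
# Alibaba Coding Plan Pro figures from the snapshot
monthly_fee_usd=50
per_window=6000      # requests per 5-hour window
per_week=45000
per_month=90000

# How many maxed-out 5-hour windows fit inside the weekly and monthly caps
windows_per_week=$(awk -v w="$per_week" -v p="$per_window" 'BEGIN { printf "%.1f", w / p }')
windows_per_month=$((per_month / per_window))

# Lowest possible cost per 1,000 requests, if the whole monthly quota is used
usd_per_1k_requests=$(awk -v f="$monthly_fee_usd" -v m="$per_month" \
  'BEGIN { printf "%.2f", f / m * 1000 }')

echo "Full windows per week:  $windows_per_week"
echo "Full windows per month: $windows_per_month"
echo "Cost floor: \$${usd_per_1k_requests} per 1,000 requests"
```

The result is 7.5 full windows per week and 15 per month, at about $0.56 per 1,000 requests. Since a week contains more than 30 five-hour windows, a sustained heavy user is limited by the weekly and monthly caps well before the 5-hour one.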

#4 Reasoning PAYG

Moonshot Kimi

kimi-k2.6: cache-hit input $0.16, cache-miss input $0.95, output $4.00 per 1M tokens, with a 262,144-token context window. kimi-k2.5 is cheaper at $0.10 / $0.60 / $3.00 per 1M tokens.

Official pricing: Kimi K2.6 pricing, Kimi K2.5 pricing

Why it is best for this use: It costs more than DeepSeek or MiMo, but it is a strong pay-per-token reasoning option for harder agent tasks.

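Kimi's main API is OpenAI-compatible. Moonshot is also listed above among the providers with an Anthropic-compatible endpoint, so try method 1 first and fall back to Claude Code Router or another OpenAI-style proxy if your account or model only exposes the OpenAI format.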

#5 Agent PAYG + plan

MiniMax

MiniMax-M3 standard PAYG is $0.30 input, $1.20 output, and $0.06 prompt-cache read per 1M tokens up to 512K input; above 512K it is $0.60 / $2.40 / $0.12. MiniMax-M2.7 is $0.30 input, $1.20 output, $0.06 cache read, and $0.375 cache write. Token Plan tiers are $20, $50, and $120 per month.

Official pricing: PAYG pricing, Token Plan

Why it is best for this use: Good if you specifically want MiniMax's agent models and want the choice between normal PAYG and quota-style subscription plans.

The MiniMax Token Plan uses 5-hour rolling and weekly windows, so compare burst limits before subscribing.

#6 1M context value

Xiaomi MiMo

MiMo-V2.5: cache-hit input $0.0028, cache-miss input $0.14, output $0.28 per 1M tokens. MiMo-V2.5-Pro: cache-hit input $0.0036, cache-miss input $0.435, output $0.87 per 1M tokens. Xiaomi's May 27, 2026 update says old MiMo-V2-Pro / Omni names auto-route to V2.5 pricing from June 1.

Official pricing: PAYG pricing, Price update

Why it is best for this use: It now matches the ultra-low DeepSeek-style price band while keeping Xiaomi's big-context coding angle.

MiMo's token plan also advertises 20% off off-peak usage and credit-reset/subscription discounts, so check whether PAYG or token-plan billing fits your volume.

Pay-per-token suits light or unpredictable use: DeepSeek V4 Flash and MiMo V2.5 both charge about $0.14 per 1M cache-miss input tokens and $0.28 per 1M output tokens, with cache hits costing a small fraction of that. Coding plans are better when you code heavily every day and your pay-per-token bill would otherwise exceed the plan's monthly fee, but they cap usage in rolling windows.
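To make the comparison concrete, here is a rough break-even check. The rates and the $18 plan price come from the snapshot; the monthly token volumes are hypothetical, and the sketch ignores cache hits, so it overstates the pay-per-token bill somewhat. It also compares cost only, since the pay-per-token model and the plan's model are different.

```shell
# Rates from the snapshot (USD per 1M tokens, deepseek-v4-flash, cache-miss input)
input_rate=0.14
output_rate=0.28
plan_fee=18          # entry price of the GLM Coding Plan, USD per month

# Hypothetical monthly usage, in millions of tokens
input_mtok=50
output_mtok=5

# Monthly pay-per-token bill for that usage
payg_cost=$(awk -v i="$input_mtok" -v o="$output_mtok" -v ri="$input_rate" -v ro="$output_rate" \
  'BEGIN { printf "%.2f", i * ri + o * ro }')

# Pick whichever option costs less this month
cheaper=$(awk -v c="$payg_cost" -v p="$plan_fee" \
  'BEGIN { print ((c + 0 < p + 0) ? "pay-per-token" : "plan") }')

echo "Pay-per-token: \$${payg_cost}/month vs plan: \$${plan_fee}/month -> ${cheaper}"
```

At 50M input and 5M output tokens the pay-per-token bill is $8.40, so the plan only wins at roughly double that volume. Swap in your own monthly totals to see which side of the line you fall on.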

Pricing on these models changes often: promotional rates, new versions, off-peak discounts, and plan quotas shift. Treat the numbers above as a checked snapshot, then confirm the linked official page before you buy credits or subscribe.

Which model should you start with

A rough guide for someone moving off Claude’s native pricing:

Cheapest serious option: DeepSeek. Pure pay-per-token, very low rates, Anthropic endpoint. See DeepSeek V4 pricing explained.
Best flat-rate value: Z.AI’s GLM Coding Plan if you code daily. See GLM Coding Plan setup.
Strong agentic coding: Kimi K2.6 or MiniMax M3. See MiniMax vs DeepSeek vs GLM.
Big context on a budget: MiMo V2.5 or DeepSeek V4 Flash for 1M context; Qwen3-Coder when code specialization matters. See run Qwen3-Coder with Claude Code.

Not just Claude Code

The same models plug into other agents too. If you’d rather not use Claude Code at all, Codex CLI, OpenCode, and Aider all accept custom providers. There’s a full comparison in Claude Code vs Codex vs OpenCode vs Aider.

Before you switch

Pick a provider and create an API key
Decide pay-per-token vs coding plan based on your volume
Choose method 1 (Anthropic endpoint) if available, else CCR
Test on a small task before trusting it with a big repo
Note the provider's rate limits and discount windows
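For the "test on a small task" step, a quick pre-flight check can catch a missing variable before you start a session. This is a minimal sketch: the function name `check_backend_env` is made up for illustration, and it only checks that the two variables are set, not that the key is valid.

```shell
# Hypothetical pre-flight check (function name is made up for this sketch):
# confirms the two variables Claude Code reads are set before a test run.
check_backend_env() {
  if [ -z "${ANTHROPIC_BASE_URL:-}" ]; then
    echo "ANTHROPIC_BASE_URL is not set" >&2
    return 1
  fi
  if [ -z "${ANTHROPIC_AUTH_TOKEN:-}" ]; then
    echo "ANTHROPIC_AUTH_TOKEN is not set" >&2
    return 1
  fi
  # Warn, but do not fail, on a non-https endpoint (e.g. a local proxy)
  case "$ANTHROPIC_BASE_URL" in
    https://*) ;;
    *) echo "warning: base URL is not https" >&2 ;;
  esac
  echo "backend: $ANTHROPIC_BASE_URL"
}
```

A one-shot smoke test could then be `check_backend_env && claude -p "Reply with the single word OK"`, which uses Claude Code's print mode to send a single prompt and exit.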

Wrapping up

Running Claude Code on a cheaper model is mostly a config change: set two environment variables for Anthropic-compatible providers, or drop Claude Code Router in front for OpenAI-style ones. Pricing splits into pay-per-token (best for light use) and flat coding plans (best for heavy use, but capped per window), and several vendors offer both.

Start with the model that matches your budget and workload, test it on something small, and keep the official pricing page bookmarked since these rates move fast. From here, follow the per-model guides to get the exact endpoint and config.
