
How to Use Cheap AI Models With Claude Code (2026 Guide)

Run Claude Code on cheaper models like DeepSeek, GLM, Kimi, and Qwen. The three connection methods, pricing and coding plans compared, and how to set it up on Windows.

MCSA Guru Team · June 8, 2026 · 4 min read
[Image: Claude Code terminal on Windows routed to cheaper DeepSeek, GLM and Kimi models]

Claude Code is a great agent and a fast way to burn through an API bill. The interface — reading your repo, editing files, running commands, working through multi-step tasks — is the valuable part, and it turns out you don’t have to pay Anthropic’s per-token rate to keep it. Claude Code can talk to cheaper models like DeepSeek, GLM, Kimi, and Qwen, often at a fraction of the cost, with the exact same workflow.

This guide is the map for the whole topic: the three ways to connect a different model, how pricing and coding plans compare, and the trade-offs nobody mentions. The per-model and per-tool walkthroughs link out from here.

If you’re setting this up on Windows, a WSL environment makes the proxy and shell steps painless — see the WSL install guide first.

The three ways to connect a cheaper model

Every method below ends with Claude Code running normally; only the backend changes.

1. Point Claude Code straight at an Anthropic-compatible endpoint

This is the cleanest option and needs no extra software. Claude Code respects two environment variables:

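# Point Claude Code at the provider's Anthropic-compatible endpoint (DeepSeek shown)
export ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic"
# Your provider API key (placeholder value)
export ANTHROPIC_AUTH_TOKEN="sk-your-key"
# Launch Claude Code as usual
claude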

Providers that ship an Anthropic-compatible endpoint, including DeepSeek, Z.AI’s GLM, and Moonshot’s Kimi, work this way with no proxy. See the per-model guides: “Run DeepSeek V4 with Claude Code”, “Run GLM-5 with Claude Code”, and “Run Kimi K2 with Claude Code”.
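Exporting the two variables changes every later `claude` launch in that shell. If you want the cheaper backend for a single session only, a small wrapper function can scope the variables to one invocation. This is a minimal sketch: the function name `claude_deepseek` and the variables `DEEPSEEK_API_KEY` and `CLAUDE_BIN` are assumptions made for this example, not names Claude Code itself knows about.

```shell
# Hypothetical helper: run Claude Code against DeepSeek for one session
# without exporting anything into the parent shell.
# DEEPSEEK_API_KEY holds your key; CLAUDE_BIN optionally overrides the binary.
claude_deepseek() {
  if [ -z "${DEEPSEEK_API_KEY:-}" ]; then
    echo "DEEPSEEK_API_KEY is not set" >&2
    return 1
  fi
  # The assignments below apply only to this one command.
  ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic" \
  ANTHROPIC_AUTH_TOKEN="$DEEPSEEK_API_KEY" \
  "${CLAUDE_BIN:-claude}" "$@"
}
```

Put the function in your shell profile and run `claude_deepseek` where you would normally run `claude`; a plain `claude` still uses your default setup.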

2. Use Claude Code Router for OpenAI-style providers

Many models only speak the OpenAI format. Claude Code Router (CCR) is a small open-source proxy that sits between Claude Code and any provider, translating requests and even routing different task types to different models.

# Install Claude Code itself, then the router
npm install -g @anthropic-ai/claude-code
npm install -g @musistudio/claude-code-router
# Start Claude Code through the router
ccr code
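The router needs to know which provider and model to forward to, and that normally lives in a JSON config file. The sketch below writes an example file you can review before using it. Treat everything in it as an assumption to verify: the field names (`Providers`, `api_base_url`, `models`, `Router.default`) and the usual config location (`~/.claude-code-router/config.json`) are taken from the project's README as best understood at the time of writing, and the model name and key are placeholders.

```shell
# Sketch only: confirm field names against the current Claude Code Router README.
# Writes to an example path so an existing config is not overwritten.
CCR_CONFIG="${CCR_CONFIG:-./ccr-config.example.json}"

cat > "$CCR_CONFIG" <<'EOF'
{
  "Providers": [
    {
      "name": "deepseek",
      "api_base_url": "https://api.deepseek.com/chat/completions",
      "api_key": "sk-your-key",
      "models": ["deepseek-chat"]
    }
  ],
  "Router": {
    "default": "deepseek,deepseek-chat"
  }
}
EOF

echo "Wrote example config to $CCR_CONFIG"
```

Once the file matches what the README currently documents, copy it into place and start `ccr code`.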

3. Route through OpenRouter

OpenRouter is a paid aggregator that exposes hundreds of models behind one key and one endpoint. Slightly more expensive than going direct, but you get failover and one bill.

Pricing vs coding plans

Two billing models dominate, and several vendors offer both.

How the cheap providers bill (check official pages for current rates)

Provider          Billing
DeepSeek          Pay-per-token only; off-peak discount window
Z.AI (GLM)        Coding plan from ~$10/mo, plus pay-per-token
Alibaba (Qwen)    Coding plan ~$50/mo (90k requests), plus pay-per-token
Moonshot (Kimi)   Pay-per-token; cache-hit discounts
MiniMax           Pay-per-token; agent-oriented pricing
Xiaomi MiMo       Flat pay-per-token, no long-context surcharge

Pay-per-token suits light or unpredictable use — you only pay for what you run, and the cheapest models cost cents per million tokens. Coding plans are flat monthly subscriptions that work out cheaper for heavy daily coding, but they cap usage.
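To make the comparison concrete, you can compute the break-even volume: the monthly token count at which a flat plan and pay-per-token cost the same. The helper below is a rough sketch. The function name `break_even_mtok` is made up for this example, and the $0.50 blended rate per million tokens is an illustrative placeholder, not a quoted price.

```shell
# break_even_mtok PLAN_USD_PER_MONTH BLENDED_USD_PER_MILLION_TOKENS
# Prints the monthly volume, in millions of tokens, at which the flat plan
# and pay-per-token cost the same. Below it, pay-per-token is cheaper.
break_even_mtok() {
  awk -v plan="$1" -v rate="$2" 'BEGIN {
    if (rate <= 0) { print "rate must be greater than 0" > "/dev/stderr"; exit 1 }
    printf "%.1f\n", plan / rate
  }'
}

# Illustrative inputs only: a $10/month plan against a $0.50 blended rate.
break_even_mtok 10 0.50   # prints 20.0
```

With those placeholder numbers, the plan pays for itself once you pass roughly 20 million tokens a month, provided the plan's usage cap lets you get that far.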

Pricing on these models changes often — promotional rates, new versions, and discount windows shift monthly. Treat any number you read (here or elsewhere) as a snapshot and confirm on the provider’s official pricing page before you commit.

Which model should you start with

A rough guide for someone moving off Claude’s native pricing:

Not just Claude Code

The same models plug into other agents too. If you’d rather not use Claude Code at all, Codex CLI, OpenCode, and Aider all accept custom providers. There’s a full comparison in “Claude Code vs Codex vs OpenCode vs Aider”.

Before you switch

  • Pick a provider and create an API key
  • Decide pay-per-token vs coding plan based on your volume
  • Choose method 1 (Anthropic endpoint) if available, else CCR
  • Test on a small task before trusting it with a big repo
  • Note the provider's rate limits and discount windows
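Before running that first small test, it helps to confirm the shell is actually configured the way you think it is. The following is a minimal preflight sketch for method 1; the function name `preflight` is hypothetical, and it only checks that the two variables are present and plausible, not that the key is valid.

```shell
# Hypothetical preflight: verify the two variables Claude Code reads
# are set before launching. Returns non-zero if anything looks wrong.
preflight() {
  _pf_status=0
  case "${ANTHROPIC_BASE_URL:-}" in
    http://*|https://*) ;;
    "") echo "ANTHROPIC_BASE_URL is not set" >&2; _pf_status=1 ;;
    *)  echo "ANTHROPIC_BASE_URL does not look like a URL" >&2; _pf_status=1 ;;
  esac
  if [ -z "${ANTHROPIC_AUTH_TOKEN:-}" ]; then
    echo "ANTHROPIC_AUTH_TOKEN is not set" >&2
    _pf_status=1
  fi
  return "$_pf_status"
}

# Example use (commented out so the sketch has no side effects):
# preflight && claude
```

If the check passes, start with a low-stakes prompt in a scratch directory before pointing the agent at a real repository.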

Wrapping up

Running Claude Code on a cheaper model is mostly a config change: set two environment variables for Anthropic-compatible providers, or drop Claude Code Router in front for OpenAI-style ones. Pricing splits into pay-per-token (best for light use) and flat coding plans (best for heavy use, but capped per window), and several vendors offer both.

Start with the model that matches your budget and workload, test it on something small, and keep the official pricing page bookmarked since these rates move fast. From here, follow the per-model guides to get the exact endpoint and config.

Frequently asked questions

Can Claude Code run on models other than Claude?

Yes. Claude Code reads two environment variables, ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN, so any provider that exposes an Anthropic-compatible endpoint (DeepSeek, GLM, Kimi and others) can be used directly. For OpenAI-style providers, a small proxy like Claude Code Router translates the requests.

Is it cheaper to use a coding plan or pay per token?

It depends on volume. Flat coding plans (GLM from about $10/month, Alibaba's plan around $50/month) are predictable and usually cheaper for heavy daily use. Pay-per-token is better for light or bursty use. Several providers offer both, so estimate your monthly tokens before choosing.

Do cheap models match Claude's quality in Claude Code?

For most everyday coding, the top Chinese models get close, and the agent workflow (file edits, running commands, multi-step tasks) is identical because it comes from Claude Code itself. Expect some gap on the hardest reasoning and very long context tasks.

What's the catch with coding plans?

Most flat plans cap usage in rolling windows — often a number of prompts every 5 hours, similar to Anthropic's own limits. Heavy bursts can hit those caps. Check each plan's per-window quota before committing.

Does this work on Windows or do I need WSL?

Both work. Native Windows uses PowerShell environment variables (set with $env:NAME = "value" instead of export); WSL behaves like a normal Linux setup. WSL tends to be smoother for proxies and shell scripts, which is why many guides use it.


