DeepSeek is the model people reach for when they want to cut AI coding costs hard, and its pricing is the reason. But “DeepSeek pricing” isn’t one number — V4 Flash and V4 Pro have different rates, and cache-hit input is billed very differently from cache-miss input. This explains how it all fits together so you can estimate what you’ll actually pay.
One thing up front: DeepSeek is pay-per-token only. There’s no flat coding plan like GLM’s or Alibaba’s. That’s a feature for light or bursty use — you pay for exactly what you run.
The model tiers
DeepSeek V4 model tiers (official pricing snapshot, June 2026)
| deepseek-v4-flash | $0.0028/M cache-hit input, $0.14/M cache-miss input, $0.28/M output |
|---|---|
| deepseek-v4-pro | $0.003625/M cache-hit input, $0.435/M cache-miss input, $0.87/M output |
The official page also lists two base URLs: https://api.deepseek.com for OpenAI-format tools and https://api.deepseek.com/anthropic for Anthropic-format tools like Claude Code.
In older guides and tools, you will still see deepseek-chat and deepseek-reasoner. DeepSeek says those aliases map to non-thinking and thinking modes of deepseek-v4-flash, and are scheduled for deprecation on July 24, 2026 at 15:59 UTC. New configs should prefer the current V4 model names where the tool supports them.
Cache-hit discounts
DeepSeek bills cached input tokens far cheaper than fresh ones. When your prompts reuse context — the same system prompt, the same files loaded across an agent session — those repeated tokens hit the cache and cost a fraction of the standard input rate.
This matters a lot for coding agents like Claude Code and OpenCode, which resend context constantly. On V4 Flash, cache-hit input is $0.0028 per million tokens versus $0.14 for cache-miss input, so repeated context is where DeepSeek becomes absurdly cheap.
Base URLs and model names
Use the base URL that matches your tool:
# OpenAI-compatible tools such as Cline, OpenCode, Aider, Codex custom providers
https://api.deepseek.com
# Anthropic-compatible tools such as Claude Code
https://api.deepseek.com/anthropic
For new setups, use deepseek-v4-flash as the default model where supported. Use deepseek-v4-pro only for hard tasks where the extra capability is worth the higher output price.
How to estimate your cost
A simple method:
- Estimate input and output tokens per task (an agent session might be hundreds of thousands of tokens with context).
- Multiply by the per-million rates for your chosen model from the pricing page.
- Discount the repeated input for cache hits.
- Multiply by tasks per day and days per month.
For most individual developers, real DeepSeek spend is low because agent sessions reuse context. The number to watch is output tokens; output is far more expensive than cache-hit input.
Pro vs Flash: which to use
- Everyday coding →
deepseek-v4-flash. Cheapest, fast, fine for the bulk of work. - Hard reasoning, tricky bugs, architecture →
deepseek-v4-pro. - Old tool configs → replace
deepseek-chat/deepseek-reasonerwhen your tool supports the V4 names.
A good pattern is to default to the cheap tier and escalate only when a task genuinely needs more.
Before you budget
- Check current per-million rates on the official pricing page
- Use deepseek-v4-flash by default and deepseek-v4-pro for hard tasks
- Account for cache-hit discounts on repeated context
- Use the OpenAI base URL for OpenAI-compatible tools and /anthropic for Claude Code
- Update old deepseek-chat / deepseek-reasoner aliases before the deprecation date
Wrapping up
DeepSeek’s pricing is pay-per-token across V4 Flash and V4 Pro, with cache-hit discounts that make repeated agent context extremely cheap. There’s no coding plan, which suits light and bursty use perfectly. Default to V4 Flash, escalate to V4 Pro when needed, use the right base URL for your tool, and confirm the live numbers on the official page since they move.
To put it to work, see run DeepSeek with Claude Code. To compare against flat-rate options, see coding plans vs pay-per-token.