
DeepSeek V4 Pricing Explained: V4 Flash vs V4 Pro

DeepSeek V4 pricing explained with current V4 Flash and V4 Pro API rates, cache-hit discounts, Anthropic/OpenAI base URLs, and how to estimate coding cost.

MCSA Guru Team · June 22, 2026 · 4 min read
A breakdown of DeepSeek V4 Pro and V4 Flash API pricing

DeepSeek is the model people reach for when they want to cut AI coding costs hard, and its pricing is the reason. But “DeepSeek pricing” isn’t one number: V4 Flash and V4 Pro have different rates, and cache-hit input is billed very differently from cache-miss input. This guide explains how the pieces fit together so you can estimate what you’ll actually pay.

One thing up front: DeepSeek is pay-per-token only. There’s no flat coding plan like GLM’s or Alibaba’s. That’s a feature for light or bursty use — you pay for exactly what you run.

The model tiers

DeepSeek V4 model tiers (official pricing snapshot, June 2026)

Model               Cache-hit input   Cache-miss input   Output
deepseek-v4-flash   $0.0028/M         $0.14/M            $0.28/M
deepseek-v4-pro     $0.003625/M       $0.435/M           $0.87/M

The official page also lists two base URLs: https://api.deepseek.com for OpenAI-format tools and https://api.deepseek.com/anthropic for Anthropic-format tools like Claude Code.

In older guides and tools, you will still see deepseek-chat and deepseek-reasoner. DeepSeek says those aliases map to non-thinking and thinking modes of deepseek-v4-flash, and are scheduled for deprecation on July 24, 2026 at 15:59 UTC. New configs should prefer the current V4 model names where the tool supports them.
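If you maintain several tool configs, a small lookup can flag the old aliases before the cutoff. The sketch below only encodes what this article states about the aliases and the deprecation time; the function names and the boolean "thinking" flag are hypothetical and are not DeepSeek API parameters.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple

# Alias mapping as described in this article. The boolean is an illustrative
# "thinking mode" marker, not a confirmed API field.
LEGACY_ALIASES = {
    "deepseek-chat": ("deepseek-v4-flash", False),     # non-thinking mode
    "deepseek-reasoner": ("deepseek-v4-flash", True),  # thinking mode
}

# Deprecation time quoted in this article: July 24, 2026 at 15:59 UTC.
ALIAS_DEPRECATION = datetime(2026, 7, 24, 15, 59, tzinfo=timezone.utc)


def resolve_model(name: str) -> Tuple[str, Optional[bool]]:
    """Return (current model name, thinking flag); the flag is None for non-aliases."""
    if name in LEGACY_ALIASES:
        return LEGACY_ALIASES[name]
    return name, None


def alias_is_expired(name: str, now: datetime) -> bool:
    """True if `name` is a legacy alias and `now` is at or past the cutoff."""
    return name in LEGACY_ALIASES and now >= ALIAS_DEPRECATION
```

Running `resolve_model` over the model names in your configs gives a quick list of entries that still need updating.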

Cache-hit discounts

DeepSeek bills cached input tokens far cheaper than fresh ones. When your prompts reuse context — the same system prompt, the same files loaded across an agent session — those repeated tokens hit the cache and cost a fraction of the standard input rate.

This matters a lot for coding agents like Claude Code and OpenCode, which resend context constantly. On V4 Flash, cache-hit input is $0.0028 per million tokens versus $0.14 for cache-miss input, a 50x difference, so repeated context is where DeepSeek gets cheapest.
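To see what that gap does to a bill, here is a minimal sketch of the arithmetic. The rates are the V4 Flash figures from the June 2026 snapshot; the 2-million-token session and the 90% cache-hit ratio are made-up illustration values, not measurements.

```python
# USD per million input tokens on V4 Flash, from the June 2026 snapshot.
FLASH_CACHE_HIT = 0.0028
FLASH_CACHE_MISS = 0.14


def input_cost(tokens: int, hit_ratio: float) -> float:
    """USD cost of `tokens` input tokens when `hit_ratio` of them hit the cache."""
    hit = tokens * hit_ratio
    miss = tokens - hit
    return (hit * FLASH_CACHE_HIT + miss * FLASH_CACHE_MISS) / 1_000_000


# Hypothetical session with 2M input tokens.
cold = input_cost(2_000_000, 0.0)  # nothing served from cache
warm = input_cost(2_000_000, 0.9)  # 90% of input served from cache
print(f"cold: ${cold:.4f}, warm: ${warm:.4f}")
```

With these assumed numbers the input side drops from roughly $0.28 to roughly $0.033. Your own hit ratio depends on how your tool structures its prompts.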

Base URLs and model names

Use the base URL that matches your tool:

# OpenAI-compatible tools such as Cline, OpenCode, Aider, Codex custom providers
https://api.deepseek.com

# Anthropic-compatible tools such as Claude Code
https://api.deepseek.com/anthropic

For new setups, use deepseek-v4-flash as the default model where supported. Use deepseek-v4-pro only for hard tasks where the extra capability is worth the higher rates: Pro costs roughly 3x Flash on both cache-miss input and output.
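If you script your setup, the URL choice can live in one place. This is a sketch only: the two URLs and the model names come from this article, while `deepseek_endpoint` and the returned dictionary keys are hypothetical and do not match any particular tool's config schema.

```python
# Base URLs as listed in this article.
BASE_URLS = {
    "openai": "https://api.deepseek.com",               # OpenAI-format tools
    "anthropic": "https://api.deepseek.com/anthropic",  # Anthropic-format tools
}


def deepseek_endpoint(api_format: str, model: str = "deepseek-v4-flash") -> dict:
    """Return the base URL and model for a tool speaking `api_format`."""
    if api_format not in BASE_URLS:
        raise ValueError(f"unknown API format: {api_format!r}")
    return {"base_url": BASE_URLS[api_format], "model": model}


# Example: settings for an Anthropic-format tool such as Claude Code.
claude_code_settings = deepseek_endpoint("anthropic")
```

How you hand these values to a given tool (environment variables, a settings file, CLI flags) varies by tool, so check that tool's documentation.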

How to estimate your cost

A simple method:

  1. Estimate input and output tokens per task (an agent session might be hundreds of thousands of tokens with context).
  2. Multiply by the per-million rates for your chosen model from the pricing page.
  3. Discount the repeated input for cache hits.
  4. Multiply by tasks per day and days per month.
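The four steps above can be sketched as a small calculator. The rates are copied from the June 2026 snapshot and will go stale, so treat them as placeholders. The workload in the example (300k input tokens, 10k output tokens, 80% cache hits, 10 tasks a day, 22 days a month) is an invented illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    cache_hit: float   # USD per million cache-hit input tokens
    cache_miss: float  # USD per million cache-miss input tokens
    output: float      # USD per million output tokens


# Snapshot values from this article; confirm against the official pricing page.
RATES = {
    "deepseek-v4-flash": Rates(0.0028, 0.14, 0.28),
    "deepseek-v4-pro": Rates(0.003625, 0.435, 0.87),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int,
                 hit_ratio: float, tasks_per_day: int, days_per_month: int) -> float:
    """Estimate monthly spend in USD for a repeated task."""
    r = RATES[model]
    hit = input_tokens * hit_ratio   # step 3: repeated input billed at the cache-hit rate
    miss = input_tokens - hit
    per_task = (hit * r.cache_hit + miss * r.cache_miss
                + output_tokens * r.output) / 1_000_000  # steps 1 and 2
    return per_task * tasks_per_day * days_per_month     # step 4


flash = monthly_cost("deepseek-v4-flash", 300_000, 10_000, 0.8, 10, 22)
pro = monthly_cost("deepseek-v4-pro", 300_000, 10_000, 0.8, 10, 22)
print(f"Flash: ${flash:.2f}/month, Pro: ${pro:.2f}/month")
```

Under those assumptions the estimate comes to about $2.61 a month on Flash and about $7.85 on Pro. The cache-hit ratio is the input you are least likely to know in advance, so it is worth trying a few values.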

For most individual developers, real DeepSeek spend is low because agent sessions reuse context. The number to watch is output tokens: on V4 Flash, output costs $0.28 per million, 100 times the cache-hit input rate.

Pro vs Flash: which to use

  • Everyday coding → deepseek-v4-flash. Cheapest, fast, fine for the bulk of work.
  • Hard reasoning, tricky bugs, architecture → deepseek-v4-pro.
  • Old tool configs → replace deepseek-chat / deepseek-reasoner when your tool supports the V4 names.

A good pattern is to default to the cheap tier and escalate only when a task genuinely needs more.
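One way to express that pattern in code is a wrapper that tries Flash first and only falls back to Pro when a check fails. This is a sketch under assumptions: `run` and `is_good_enough` are hypothetical callables you would supply (for example, a function that calls the API and a function that runs your tests), not part of any DeepSeek SDK.

```python
from typing import Callable, Tuple

CHEAP_MODEL = "deepseek-v4-flash"
STRONG_MODEL = "deepseek-v4-pro"


def run_with_escalation(
    task: str,
    run: Callable[[str, str], str],         # hypothetical: (model, task) -> result
    is_good_enough: Callable[[str], bool],  # hypothetical: result -> pass/fail
) -> Tuple[str, str]:
    """Try the cheap tier first; rerun on the strong tier only if the check fails.

    Returns (model that produced the final result, result).
    """
    result = run(CHEAP_MODEL, task)
    if is_good_enough(result):
        return CHEAP_MODEL, result
    return STRONG_MODEL, run(STRONG_MODEL, task)
```

Note that an escalated task pays for both attempts, so this only saves money when most tasks pass on the first try.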

Before you budget

  • Check current per-million rates on the official pricing page
  • Use deepseek-v4-flash by default and deepseek-v4-pro for hard tasks
  • Account for cache-hit discounts on repeated context
  • Use the OpenAI base URL for OpenAI-compatible tools and /anthropic for Claude Code
  • Update old deepseek-chat / deepseek-reasoner aliases before the deprecation date

Wrapping up

DeepSeek’s pricing is pay-per-token across V4 Flash and V4 Pro, with cache-hit discounts that make repeated agent context extremely cheap. There’s no coding plan, which suits light and bursty use. Default to V4 Flash, escalate to V4 Pro when needed, use the right base URL for your tool, and confirm the live numbers on the official page, since rates change.

To put it to work, see “Run DeepSeek with Claude Code.” To compare against flat-rate options, see “Coding plans vs pay-per-token.”

Frequently asked questions

Does DeepSeek have a coding plan or subscription?

No. DeepSeek is pay-per-token only; there's no flat monthly coding plan like the ones GLM and Alibaba offer. You pay for the input and output tokens you use, which keeps light usage extremely cheap.

What's the difference between V4 Pro and V4 Flash?

V4 Flash is the cheaper everyday model and V4 Pro is the stronger, pricier model for harder reasoning and complex coding. Pick Flash for routine agent work and Pro only when a task needs the extra capability.

How do cache-hit discounts work?

When parts of your prompt repeat (the same system prompt or context across calls), DeepSeek bills those cached input tokens at a steep discount. On V4 Flash, cached input costs $0.0028 per million tokens instead of $0.14. Agentic coding reuses a lot of context, so cache hits can cut the input side of the bill substantially.

Which DeepSeek model should I use for coding?

Use deepseek-v4-flash as the default for everyday coding and deepseek-v4-pro for harder reasoning or complex debugging. Flash is much cheaper; Pro is the escalation path.

How do I estimate my monthly cost?

Multiply your expected input and output tokens by the per-million rates for V4 Flash or V4 Pro, then account for cache-hit input. Coding agents resend context constantly, so cache-hit pricing can be the difference between a tiny bill and a merely cheap one.

