DeepSeek

New

DeepSeek V4 Flash pricing

DeepSeek's April 2026 high-volume model. At $0.14/$0.28 per 1M tokens with near-free cached input, it is one of the cheapest capable models on the market.

Input

$0.14/ 1M tok

Output

$0.28/ 1M tok

Context window
1M
Max output
384K
Cached input
$0.0028 / 1M
Verified
2026-07-01

DeepSeek V4 Flash is the cost-optimised member of the DeepSeek V4 family, released on 24 April 2026 as an open-weight (MIT) model. It is a 284B-parameter mixture-of-experts model with roughly 13B parameters active per token – smaller than V4 Pro's 1.6T, but sharing the same 1M-token context window and 384K max output.

At $0.14 input / $0.28 output per 1M tokens it is one of the cheapest capable models available, and cached input is almost free at $0.0028 per 1M tokens – a 98% discount on cache hits. For workloads with large, repeated prefixes (RAG system prompts, few-shot templates, long tool definitions) that cache rate can dominate the effective cost.

DeepSeek is not yet wired into the Calcis estimator or billing; these rates are informational and dated. New integrations should target deepseek-v4-flash directly – the legacy deepseek-chat and deepseek-reasoner aliases retire on 24 July 2026.

Estimate LLM costs before you send

Paste your prompt into the Calcis estimator to see token counts and per-request cost across every tracked model, then compare DeepSeek V4 Flash against them side by side.

Frequently asked

How much does DeepSeek V4 Flash cost per 1M tokens?
$0.14 for input and $0.28 for output per 1M tokens. Cached input is $0.0028 per 1M tokens, a 98% discount on cache hits.
What is the DeepSeek V4 Flash context window?
1M tokens, with up to 384K output tokens - the same envelope as the larger V4 Pro.
How much cheaper is V4 Flash than V4 Pro?
Roughly 12x on input ($0.14 vs $1.74) and 12x on output ($0.28 vs $3.48) at standard rates. V4 Flash is built for high-volume, latency-sensitive work; V4 Pro is the flagship for the hardest tasks.
How does the cached-input discount work?
On a cache hit, input tokens bill at $0.0028 per 1M instead of $0.14 - about 98% off. Workloads with large repeated prefixes benefit most.
Is DeepSeek V4 Flash open weight?
Yes, released under the MIT licence, so you can self-host as well as call the hosted API.

Pricing verified 2026-07-01 from the provider's rate card. These figures are informational and not yet wired into the Calcis estimator or billing.