DeepSeek
DeepSeek V4 Flash pricing
DeepSeek's April 2026 high-volume model. At $0.14 input / $0.28 output per 1M tokens, with near-free cached input, it is one of the cheapest capable models on the market.
Input: $0.14 / 1M tokens
Output: $0.28 / 1M tokens
- Context window: 1M tokens
- Max output: 384K tokens
- Cached input: $0.0028 / 1M tokens
- Verified: 2026-07-01
DeepSeek V4 Flash is the cost-optimised member of the DeepSeek V4 family, released on 24 April 2026 as an open-weight (MIT) model. It is a 284B-parameter mixture-of-experts model with roughly 13B parameters active per token – smaller than V4 Pro's 1.6T, but sharing the same 1M-token context window and 384K max output.
At $0.14 input / $0.28 output per 1M tokens it is one of the cheapest capable models available, and cached input is almost free at $0.0028 per 1M tokens – a 98% discount on cache hits. For workloads with large, repeated prefixes (RAG system prompts, few-shot templates, long tool definitions) that cache rate can dominate the effective cost.
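To see how much the cache rate changes the bill, the arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a billing implementation: the three rates are the ones quoted on this page, while the function name `request_cost` and the example workload (a 20,000-token prompt with an 18,000-token repeated prefix and a 500-token reply) are hypothetical.

```python
# Rates in USD per 1M tokens, copied from the figures quoted on this page.
INPUT_RATE = 0.14
CACHED_INPUT_RATE = 0.0028
OUTPUT_RATE = 0.28


def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    cached_tokens is the portion of input_tokens assumed to hit the cache.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * INPUT_RATE
        + cached_tokens * CACHED_INPUT_RATE
        + output_tokens * OUTPUT_RATE
    )
    return total / 1_000_000


# Hypothetical workload: 20,000-token prompt, 500-token reply.
cold = request_cost(20_000, 500)                        # nothing cached
warm = request_cost(20_000, 500, cached_tokens=18_000)  # 18,000-token prefix cached

print(f"cold: ${cold:.6f}  warm: ${warm:.6f}")
```

Under these assumptions the cold request costs about $0.00294 and the warm one about $0.00047, so with a 90% cache hit on the prompt most of the remaining cost comes from the uncached input and the output tokens.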
DeepSeek is not yet wired into the Calcis estimator or billing; these rates are informational and dated. New integrations should target deepseek-v4-flash directly – the legacy deepseek-chat and deepseek-reasoner aliases retire on 24 July 2026.
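For teams auditing existing configuration, a small check like the one below can flag model names that still use the legacy aliases. It is only a sketch: the model identifiers and the retirement date come from this page, but the helper `check_model_name` and its status labels are hypothetical, and it assumes an alias counts as retired from 24 July 2026 onward.

```python
from datetime import date

# Identifiers and date as stated on this page; everything else is illustrative.
CURRENT_MODEL = "deepseek-v4-flash"
LEGACY_ALIASES = {"deepseek-chat", "deepseek-reasoner"}
ALIAS_RETIREMENT = date(2026, 7, 24)


def check_model_name(model: str, today: date) -> str:
    """Classify a configured model name as 'ok', 'deprecated', 'retired' or 'unknown'."""
    if model == CURRENT_MODEL:
        return "ok"
    if model in LEGACY_ALIASES:
        # Assumption: the alias is treated as retired on the retirement date itself.
        return "retired" if today >= ALIAS_RETIREMENT else "deprecated"
    # Any other name is outside the scope of this check.
    return "unknown"


print(check_model_name("deepseek-chat", date(2026, 7, 1)))
```

Running such a check in CI or at service start-up is one way to surface legacy names before the aliases stop resolving.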
Estimate LLM costs before you send
Paste your prompt into the Calcis estimator to see token counts and per-request cost across every tracked model, then compare those figures against the DeepSeek V4 Flash rates listed on this page.
Frequently asked
- How much does DeepSeek V4 Flash cost per 1M tokens?
- $0.14 for input and $0.28 for output per 1M tokens. Cached input is $0.0028 per 1M tokens, a 98% discount on cache hits.
- What is the DeepSeek V4 Flash context window?
- 1M tokens, with up to 384K output tokens – the same envelope as the larger V4 Pro.
- How much cheaper is V4 Flash than V4 Pro?
- Roughly 12x on input ($0.14 vs $1.74) and 12x on output ($0.28 vs $3.48) at standard rates. V4 Flash is built for high-volume, latency-sensitive work; V4 Pro is the flagship for the hardest tasks.
- How does the cached-input discount work?
- On a cache hit, input tokens bill at $0.0028 per 1M instead of $0.14 – about 98% off. Workloads with large repeated prefixes benefit most.
- Is DeepSeek V4 Flash open weight?
- Yes, released under the MIT licence, so you can self-host as well as call the hosted API.
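The ratios quoted in the answers above can be reproduced from the listed rates. The snippet below is a small arithmetic check using only the prices given on this page; the dictionary and variable names are illustrative.

```python
# USD per 1M tokens, as quoted on this page.
flash = {"input": 0.14, "output": 0.28, "cached_input": 0.0028}
pro = {"input": 1.74, "output": 3.48}

# How many times more expensive V4 Pro is than V4 Flash.
input_ratio = pro["input"] / flash["input"]
output_ratio = pro["output"] / flash["output"]

# Fraction saved on input tokens that hit the cache.
cache_discount = 1 - flash["cached_input"] / flash["input"]

print(f"input: {input_ratio:.1f}x, output: {output_ratio:.1f}x, "
      f"cache discount: {cache_discount:.0%}")
```

Both price ratios work out to about 12.4, which the FAQ rounds to "roughly 12x", and the cache discount comes to 98%.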
Pricing verified 2026-07-01 from the provider's rate card. These figures are informational and not yet wired into the Calcis estimator or billing.