LLM Pricing Report: Q2 2026
A head-to-head look at 25 production LLMs across OpenAI, Anthropic, and Google as of April 2026. Every price is verified against the provider's own pricing page and recomputed on every render — if it changes at the source, this report changes with it.
1. Executive summary
Q2 2026 closes the gap between the "frontier tier" and the "good enough" tier dramatically. The cheapest model that clears a typical chat turn (2,000 input / 500 output) is now GPT-5 nano at $0.000300 per request — within an order of magnitude of what a self-hosted 3B parameter model would cost if you amortised GPUs perfectly.
Long-context workloads are where the providers still diverge the most. At 100,000 input tokens the cheapest model on the table is GPT-5 nano at roughly $0.01 per request, two orders of magnitude cheaper than the most expensive flagship on the same shape. Anyone shipping RAG or document analysis at scale should be benchmarking across providers, not just across models.
Context windows converged: every major flagship now offers at least a 200k-token window, and three exceed 1M. Context is no longer a differentiator — price-per-token and output quality are.
2. The 25-model landscape
Every row is sourced from the provider's own pricing page and verified by hand. Context windows in tokens; rates in USD per million tokens.
| Model | Provider | Input $/Mtok | Output $/Mtok | Cached in | Context | Verified |
|---|---|---|---|---|---|---|
| Claude Opus 4.7 | anthropic | $5.00 | $25.00 | — | 1,000,000 | 2026-04-17 |
| Claude Opus 4.6 | anthropic | $5.00 | $25.00 | — | 1,000,000 | 2026-04-06 |
| Claude Sonnet 4.6 | anthropic | $3.00 | $15.00 | — | 1,000,000 | 2026-04-06 |
| Claude Haiku 4.5 | anthropic | $1.00 | $5.00 | — | 200,000 | 2026-04-06 |
| Claude Opus 4.5 | anthropic | $5.00 | $25.00 | — | 200,000 | 2026-04-06 |
| Claude Sonnet 4.5 | anthropic | $3.00 | $15.00 | — | 200,000 | 2026-04-06 |
| Claude Opus 4.1 | anthropic | $15.00 | $75.00 | — | 200,000 | 2026-04-06 |
| Gemini 3.1 Pro (preview) | google | $2.00 | $12.00 | $0.200 | 1,000,000 | 2026-04-06 |
| Gemini 3 Flash (preview) | google | $0.50 | $3.00 | $0.050 | 1,000,000 | 2026-04-06 |
| Gemini 3.1 Flash-Lite (preview) | google | $0.25 | $1.50 | $0.025 | 1,000,000 | 2026-04-06 |
| Gemini 2.5 Pro | google | $1.25 | $10.00 | $0.125 | 2,000,000 | 2026-04-06 |
| Gemini 2.5 Flash | google | $0.30 | $2.50 | $0.030 | 1,000,000 | 2026-04-06 |
| Gemini 2.5 Flash-Lite | google | $0.10 | $0.40 | $0.010 | 1,000,000 | 2026-04-06 |
| GPT-5.4 | openai | $2.50 | $15.00 | $0.250 | 1,050,000 | 2026-04-06 |
| GPT-5.4 mini | openai | $0.75 | $4.50 | $0.075 | 400,000 | 2026-04-06 |
| GPT-5.4 nano | openai | $0.20 | $1.25 | $0.020 | 400,000 | 2026-04-06 |
| GPT-5 | openai | $1.25 | $10.00 | $0.125 | 400,000 | 2026-04-06 |
| GPT-5 mini | openai | $0.25 | $2.00 | $0.025 | 400,000 | 2026-04-06 |
| GPT-5 nano | openai | $0.05 | $0.40 | $0.005 | 400,000 | 2026-04-06 |
| GPT-4.1 | openai | $2.00 | $8.00 | $0.500 | 1,047,576 | 2026-04-06 |
| GPT-4.1 mini | openai | $0.40 | $1.60 | $0.100 | 1,047,576 | 2026-04-06 |
| GPT-4o | openai | $2.50 | $10.00 | $1.250 | 128,000 | 2026-04-06 |
| GPT-4o mini | openai | $0.15 | $0.60 | $0.075 | 128,000 | 2026-04-06 |
| o3 | openai | $2.00 | $8.00 | $0.500 | 200,000 | 2026-04-06 |
| o4-mini | openai | $1.10 | $4.40 | $0.275 | 200,000 | 2026-04-06 |
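As a sanity check on the table above, a per-request cost is just a linear combination of the input and output rates. A minimal TypeScript sketch, with the GPT-5 nano rates hard-coded from the table (this is not the site's own pricing helper):

```typescript
// Rates in USD per million tokens, copied from the GPT-5 nano row above.
interface Rates {
  inputPerMtok: number;
  outputPerMtok: number;
}

const GPT5_NANO: Rates = { inputPerMtok: 0.05, outputPerMtok: 0.4 };

// Cost of one request: (tokens / 1e6) * rate, summed over input and output.
function perRequestCost(r: Rates, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1e6) * r.inputPerMtok + (outputTokens / 1e6) * r.outputPerMtok;
}

// The "typical chat turn" shape from the executive summary: 2,000 in / 500 out.
console.log(perRequestCost(GPT5_NANO, 2000, 500).toFixed(6)); // "0.000300"
```

Swapping in any other row's rates reproduces the scenario tables below to the cent.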
3. Head-to-head: flagship models across four scenarios
Four request shapes, each provider's top-tier model. Everything below updates automatically when the underlying pricing changes.
| Scenario | Tokens (in / out) | GPT-5.4 | Claude Opus 4.1 | Gemini 3.1 Pro (preview) |
|---|---|---|---|---|
| Short Q&A: one-shot question with a short answer, closer to the classic assistant call than a chat loop | 500 / 150 | $0.0035 | $0.0187 | $0.0028 |
| Typical chat turn: a medium-weight conversation with some retrieved context and a conversational reply | 2,000 / 500 | $0.0125 | $0.0675 | $0.0100 |
| Long-context doc: reading a long document end-to-end and emitting a summary or structured extraction | 100,000 / 2,000 | $0.2800 | $1.6500 | $0.2240 |
| Agent step: one turn of a tool-using agent loop, prior messages plus tool outputs on input, an action plan on output | 10,000 / 2,000 | $0.0550 | $0.3000 | $0.0440 |
4. Real workload costs
Five product loops at realistic monthly volume. For each, we show the cheapest-to-expensive spread so you can see the dollar swing of picking the wrong model.
Consumer chatbot
100,000 users × 5 messages/day × 30 days = 15M requests/month

| | Model | Per request | Per month |
|---|---|---|---|
| Cheapest | GPT-5 nano | $0.000300 | $4,500 |
| Most expensive | Claude Opus 4.1 | $0.067500 | $1,012,500 |

Spread: 225.0×, the cost of picking the wrong model for this workload.
RAG assistant
10,000 queries/day × 30 days = 300,000 requests/month, 8k context

| | Model | Per request | Per month |
|---|---|---|---|
| Cheapest | GPT-5 nano | $0.000560 | $168 |
| Most expensive | Claude Opus 4.1 | $0.150000 | $45,000 |

Spread: 267.9×, the cost of picking the wrong model for this workload.
Coding agent
1,000 tasks/day × 20 steps × 30 days = 600,000 agent steps

| | Model | Per request | Per month |
|---|---|---|---|
| Cheapest | GPT-5 nano | $0.001300 | $780 |
| Most expensive | Claude Opus 4.1 | $0.300000 | $180,000 |

Spread: 230.8×, the cost of picking the wrong model for this workload.
Document summarization
5,000 documents/day × 30 days = 150,000 documents

| | Model | Per request | Per month |
|---|---|---|---|
| Cheapest | GPT-5 nano | $0.003100 | $465 |
| Most expensive | Claude Opus 4.1 | $0.862500 | $129,375 |

Spread: 278.2×, the cost of picking the wrong model for this workload.
Classification pipeline
100,000/day × 30 days = 3M requests/month

| | Model | Per request | Per month |
|---|---|---|---|
| Cheapest | GPT-5 nano | $0.000045 | $135 |
| Most expensive | Claude Opus 4.1 | $0.011250 | $33,750 |

Spread: 250.0×, the cost of picking the wrong model for this workload.
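Each card above is the same arithmetic: per-request cost times monthly volume, and the spread is the ratio of the two per-request costs. A sketch using the consumer-chatbot numbers:

```typescript
// Per-request costs from the consumer-chatbot card above.
const cheapestPerReq = 0.0003;  // GPT-5 nano
const priciestPerReq = 0.0675;  // Claude Opus 4.1
const requestsPerMonth = 100_000 * 5 * 30; // 15M requests/month

const monthlyCheapest = cheapestPerReq * requestsPerMonth; // $4,500
const monthlyPriciest = priciestPerReq * requestsPerMonth; // $1,012,500
const spread = priciestPerReq / cheapestPerReq;            // 225.0x

console.log(Math.round(monthlyCheapest), Math.round(monthlyPriciest), spread.toFixed(1));
```

The spread is volume-independent, which is why it is the number to check before committing to a model, not after the bill arrives.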
5. Key findings
1. Small is eating big at the low end. Nano- and lite-tier models (Gemini 2.5 Flash-Lite, GPT-5 nano, GPT-4o mini) now price output below $1 per million tokens, a range where the marginal cost of an extra user turn rounds to zero.
2. Long context is a price signal. Gemini 2.5 Pro and 3.1 Pro both switch to a 2× tier above 200k input tokens. OpenAI and Anthropic don't tier, so above 200k of context the dollar math can flip in ways a flat headline rate hides. Always price the tier you'll actually hit.
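The tier flip can be sketched as a threshold on the input rate. The 200k threshold and 2× multiplier are the figures quoted above; whether the higher rate applies to the whole prompt or only the excess is a billing detail worth confirming against the provider's docs, and this sketch assumes whole-prompt:

```typescript
// Sketch of a two-tier input rate: above the long-context threshold the
// whole prompt is billed at a multiple of the base rate. The threshold
// (200k) and multiplier (2x) are the figures quoted in the finding above.
function tieredInputCost(
  inputTokens: number,
  basePerMtok: number,
  threshold = 200_000,
  multiplier = 2
): number {
  const rate = inputTokens > threshold ? basePerMtok * multiplier : basePerMtok;
  return (inputTokens / 1e6) * rate;
}

// Gemini 2.5 Pro input ($1.25/Mtok base): 150k stays on the base tier,
// 250k flips the entire prompt to the 2x tier.
console.log(tieredInputCost(150_000, 1.25).toFixed(6)); // "0.187500"
console.log(tieredInputCost(250_000, 1.25).toFixed(6)); // "0.625000"
```

Note the discontinuity: 250k tokens costs more than 3× what 150k does, which is exactly the flip a flat headline rate hides.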
3. Cached input is the quiet winner. Every OpenAI and Google model ships a cached-input rate 80-90% below the standard rate. If your prompt prefix is stable (system prompts, few-shot examples), caching is the single biggest lever you haven't pulled. The cached rate is already in the table above; see each model page for details.
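The effective input rate under caching is a weighted average of the cached and standard rates. A sketch with the GPT-5 rates from the table ($1.25 standard, $0.125 cached); the 80% hit fraction is an illustrative assumption, not a measured number:

```typescript
// Blended input rate when a fraction of input tokens hits the prefix cache.
// standard/cached are $/Mtok rates from the table; cachedFraction is the
// share of input tokens served from cache (0..1), an assumption you should
// measure for your own traffic.
function blendedInputPerMtok(
  standard: number,
  cached: number,
  cachedFraction: number
): number {
  return cached * cachedFraction + standard * (1 - cachedFraction);
}

// GPT-5 with a stable prefix covering 80% of input tokens:
console.log(blendedInputPerMtok(1.25, 0.125, 0.8).toFixed(2)); // "0.35"
```

An 80%-stable prefix cuts the effective GPT-5 input rate from $1.25 to $0.35 per Mtok, a 72% reduction, without changing a single model choice.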
4. The flagship gap is 15-25×, not 2-3×. People still compare GPT-5 to GPT-5.4 as if they're close. On per-request cost, the nano variant is usually 15-25× cheaper than the flagship at the same context size. For classification, routing, and summarization that 15-25× is pure margin: run nano and only escalate on uncertainty.
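"Escalate on uncertainty" can be as simple as a confidence gate in front of two classifiers. A hypothetical sketch: the model names, the `Classifier` shape, and the 0.9 threshold are all illustrative, not an API any provider ships:

```typescript
// Hypothetical confidence-gated router: answer with the cheap model, and
// re-run on the flagship only when the cheap model's self-reported
// confidence falls below the threshold. Shapes and threshold are
// illustrative assumptions.
interface Labeled {
  label: string;
  confidence: number; // 0..1, as reported by the model
}

type Classifier = (text: string) => Promise<Labeled>;

async function routeClassify(
  text: string,
  nano: Classifier,
  flagship: Classifier,
  threshold = 0.9
): Promise<Labeled> {
  const cheap = await nano(text);
  return cheap.confidence >= threshold ? cheap : flagship(text);
}
```

The threshold is the cost/quality dial: if 90% of traffic clears it, the blended per-request cost sits close to the nano rate while hard cases still get the flagship.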
6. Methodology
All prices in this report are read from lib/pricing.ts, which is itself sourced from each provider's public pricing page. Every row carries a verified_at date and a source URL; the oldest verification on the table above is the page's effective floor.
Cost math uses the same estimateCost() helper that powers the estimator and every per-model page, so the numbers here can't drift from the rest of the site. tokenizerMultiplier is applied where a model's tokenizer emits a different token count than our counter (e.g. Claude Opus 4.7 at 1.15×). Gemini long-context tiers switch in automatically at the configured threshold.
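The site's actual estimateCost() isn't reproduced here, but the tokenizer correction it applies can be sketched as a scale factor on counted tokens before pricing. The 1.15× figure is the Claude Opus 4.7 multiplier quoted above; everything else in this sketch is an assumed shape, not the real helper:

```typescript
// Sketch of a cost estimate with a tokenizer correction: counted tokens
// are scaled by tokenizerMultiplier (e.g. the 1.15x quoted above for
// Claude Opus 4.7) before applying the $/Mtok rates. Assumed shape; not
// the site's actual estimateCost() implementation.
function estimateCostSketch(
  inputTokens: number,
  outputTokens: number,
  inputPerMtok: number,
  outputPerMtok: number,
  tokenizerMultiplier = 1
): number {
  const inTok = inputTokens * tokenizerMultiplier;
  const outTok = outputTokens * tokenizerMultiplier;
  return (inTok / 1e6) * inputPerMtok + (outTok / 1e6) * outputPerMtok;
}

// Claude Opus 4.7 ($5 in / $25 out) on a 2,000/500 chat turn at 1.15x:
console.log(estimateCostSketch(2000, 500, 5, 25, 1.15).toFixed(6)); // "0.025875"
```

At a 1.15× multiplier the same request shape is 15% more expensive than the headline rates suggest, which is why the correction belongs in the estimator rather than in a footnote.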
Not modelled (yet): prompt caching discounts, batch discounts, extended thinking surcharges, image / audio modalities, and fine-tune hosting. The CSV download includes the cached-input rate so you can model caching yourself.
Machine-readable access: the same data is available as a CSV file and via the pricing RSS feed. Both are CC-BY for any use — attribute Calcis if you'd like to.
Want these numbers for your own workload?
Plug real prompts into the Calcis estimator and we'll cost the exact request you intend to send, not a generic scenario.