
LLM Pricing Report: Q2 2026

A head-to-head look at 25 production LLMs across OpenAI, Anthropic, and Google as of April 2026. Every price is verified against the provider's own pricing page and recomputed on every render — if it changes at the source, this report changes with it.


1. Executive summary

Q2 2026 closes the gap between the "frontier tier" and the "good enough" tier dramatically. The cheapest model that clears a typical chat turn (2,000 input / 500 output) is now GPT-5 nano at $0.000300 per request — within an order of magnitude of what a self-hosted 3B parameter model would cost if you amortised GPUs perfectly.

Long-context workloads are where the providers still diverge the most. At 100,000 input tokens the cheapest model on the table is GPT-5 nano at roughly $0.01 per request — more than two orders of magnitude cheaper than the most expensive flagship on the same shape. Anyone shipping RAG or document analysis at scale should be benchmarking across providers, not just across models.
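Every figure in this summary is the per-million-token rate applied to a token shape. A minimal sketch of that arithmetic, using rates as listed in the table in section 2:

```typescript
// Per-million-token rates in USD, copied from the table in section 2.
const rates = {
  "gpt-5-nano": { input: 0.05, output: 0.4 },
  "claude-opus-4.1": { input: 15.0, output: 75.0 },
};

// A request costs (input tokens × input rate + output tokens × output rate) / 1M.
function perRequestCost(
  rate: { input: number; output: number },
  inputTokens: number,
  outputTokens: number
): number {
  return (inputTokens * rate.input + outputTokens * rate.output) / 1_000_000;
}

// Typical chat turn: 2,000 input / 500 output tokens.
console.log(perRequestCost(rates["gpt-5-nano"], 2_000, 500)); // ≈ $0.0003
```

The same two-term formula reproduces every per-request figure in sections 3 and 4 from the table rates alone.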

Context windows converged: every major flagship now offers at least a 200k-token window, and four models exceed 1M. Context is no longer a differentiator — price-per-token and output quality are.

2. The 25-model landscape

Every row is sourced from the provider's own pricing page and verified by hand. Context windows in tokens; rates in USD per million tokens.

| Model | Provider | Input $/Mtok | Output $/Mtok | Cached input $/Mtok | Context (tokens) | Verified |
|---|---|---|---|---|---|---|
| Claude Opus 4.7 | anthropic | $5.00 | $25.00 | — | 1,000,000 | 2026-04-17 |
| Claude Opus 4.6 | anthropic | $5.00 | $25.00 | — | 1,000,000 | 2026-04-06 |
| Claude Sonnet 4.6 | anthropic | $3.00 | $15.00 | — | 1,000,000 | 2026-04-06 |
| Claude Haiku 4.5 | anthropic | $1.00 | $5.00 | — | 200,000 | 2026-04-06 |
| Claude Opus 4.5 | anthropic | $5.00 | $25.00 | — | 200,000 | 2026-04-06 |
| Claude Sonnet 4.5 | anthropic | $3.00 | $15.00 | — | 200,000 | 2026-04-06 |
| Claude Opus 4.1 | anthropic | $15.00 | $75.00 | — | 200,000 | 2026-04-06 |
| Gemini 3.1 Pro (preview) | google | $2.00 | $12.00 | $0.200 | 1,000,000 | 2026-04-06 |
| Gemini 3 Flash (preview) | google | $0.50 | $3.00 | $0.050 | 1,000,000 | 2026-04-06 |
| Gemini 3.1 Flash-Lite (preview) | google | $0.25 | $1.50 | $0.025 | 1,000,000 | 2026-04-06 |
| Gemini 2.5 Pro | google | $1.25 | $10.00 | $0.125 | 2,000,000 | 2026-04-06 |
| Gemini 2.5 Flash | google | $0.30 | $2.50 | $0.030 | 1,000,000 | 2026-04-06 |
| Gemini 2.5 Flash-Lite | google | $0.10 | $0.40 | $0.010 | 1,000,000 | 2026-04-06 |
| GPT-5.4 | openai | $2.50 | $15.00 | $0.250 | 1,050,000 | 2026-04-06 |
| GPT-5.4 mini | openai | $0.75 | $4.50 | $0.075 | 400,000 | 2026-04-06 |
| GPT-5.4 nano | openai | $0.20 | $1.25 | $0.020 | 400,000 | 2026-04-06 |
| GPT-5 | openai | $1.25 | $10.00 | $0.125 | 400,000 | 2026-04-06 |
| GPT-5 mini | openai | $0.25 | $2.00 | $0.025 | 400,000 | 2026-04-06 |
| GPT-5 nano | openai | $0.05 | $0.40 | $0.005 | 400,000 | 2026-04-06 |
| GPT-4.1 | openai | $2.00 | $8.00 | $0.500 | 1,047,576 | 2026-04-06 |
| GPT-4.1 mini | openai | $0.40 | $1.60 | $0.100 | 1,047,576 | 2026-04-06 |
| GPT-4o | openai | $2.50 | $10.00 | $1.250 | 128,000 | 2026-04-06 |
| GPT-4o mini | openai | $0.15 | $0.60 | $0.075 | 128,000 | 2026-04-06 |
| o3 | openai | $2.00 | $8.00 | $0.500 | 200,000 | 2026-04-06 |
| o4-mini | openai | $1.10 | $4.40 | $0.275 | 200,000 | 2026-04-06 |

3. Head-to-head: flagship models across four scenarios

Four request shapes, each provider's top-tier model. Everything below updates automatically when the underlying pricing changes.

| Scenario | Tokens (in / out) | GPT-5.4 | Claude Opus 4.1 | Gemini 3.1 Pro (preview) |
|---|---|---|---|---|
| Short Q&A | 500 / 150 | $0.0035 | $0.0187 | $0.0028 |
| Typical chat turn | 2,000 / 500 | $0.0125 | $0.0675 | $0.0100 |
| Long-context doc | 100,000 / 2,000 | $0.2800 | $1.6500 | $0.2240 |
| Agent step | 10,000 / 2,000 | $0.0550 | $0.3000 | $0.0440 |

Short Q&A: one-shot question with a short answer — closer to the classic assistant call than a chat loop.
Typical chat turn: a medium-weight conversation with some retrieved context and a conversational reply.
Long-context doc: reading a long document end-to-end and emitting a summary or structured extraction.
Agent step: one turn of a tool-using agent loop — prior messages plus tool outputs on input, an action plan on output.
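Each cell above can be reproduced from the section 2 rates with the same shape arithmetic. A small sketch (note the 100,000-token scenario stays below Gemini's 200k tier threshold, so the flat rate applies throughout):

```typescript
// Flagship rates in USD per Mtok (section 2) and the four request shapes above.
const flagships: Record<string, { input: number; output: number }> = {
  "GPT-5.4": { input: 2.5, output: 15.0 },
  "Claude Opus 4.1": { input: 15.0, output: 75.0 },
  "Gemini 3.1 Pro (preview)": { input: 2.0, output: 12.0 },
};

const shapes = [
  { name: "Short Q&A", tokensIn: 500, tokensOut: 150 },
  { name: "Typical chat turn", tokensIn: 2_000, tokensOut: 500 },
  { name: "Long-context doc", tokensIn: 100_000, tokensOut: 2_000 },
  { name: "Agent step", tokensIn: 10_000, tokensOut: 2_000 },
];

// matrix[scenario][model] = USD per request.
const matrix: Record<string, Record<string, number>> = {};
for (const s of shapes) {
  matrix[s.name] = {};
  for (const [model, r] of Object.entries(flagships)) {
    matrix[s.name][model] = (s.tokensIn * r.input + s.tokensOut * r.output) / 1e6;
  }
}
console.log(matrix);
```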


4. Real workload costs

Five product loops at realistic monthly volume. For each, we show the cheapest-to-expensive spread so you can see the dollar swing of picking the wrong model.

Consumer chatbot

100,000 users × 5 messages/day × 30 days = 15M requests/month

Cheapest: GPT-5 nano at $0.000300/request ($4,500/month)
Most expensive: Claude Opus 4.1 at $0.067500/request ($1,012,500/month)

Spread: 225.0× — the cost of picking the wrong model for this workload.

RAG assistant

10,000 queries/day × 30 days = 300,000 requests/month, 8k ctx

Cheapest: GPT-5 nano at $0.000560/request ($168/month)
Most expensive: Claude Opus 4.1 at $0.150000/request ($45,000/month)

Spread: 267.9× — the cost of picking the wrong model for this workload.

Coding agent

1,000 tasks/day × 20 steps × 30 days = 600,000 agent steps

Cheapest: GPT-5 nano at $0.001300/request ($780/month)
Most expensive: Claude Opus 4.1 at $0.300000/request ($180,000/month)

Spread: 230.8× — the cost of picking the wrong model for this workload.

Document summarization

5,000 documents/day × 30 days = 150,000 documents

Cheapest: GPT-5 nano at $0.003100/request ($465/month)
Most expensive: Claude Opus 4.1 at $0.862500/request ($129,375/month)

Spread: 278.2× — the cost of picking the wrong model for this workload.

Classification pipeline

100,000/day × 30 days = 3M requests/month

Cheapest: GPT-5 nano at $0.000045/request ($135/month)
Most expensive: Claude Opus 4.1 at $0.011250/request ($33,750/month)

Spread: 250.0× — the cost of picking the wrong model for this workload.
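Every card in this section is the same three-step chain: per-request cost, times monthly volume, then the ratio of the two endpoints. Sketched for the consumer-chatbot workload:

```typescript
// Consumer chatbot: 100,000 users × 5 messages/day × 30 days.
const requestsPerMonth = 100_000 * 5 * 30; // 15,000,000

// Per-request cost at the typical chat-turn shape (2,000 in / 500 out),
// rates in USD per Mtok from section 2.
const cheapest = (2_000 * 0.05 + 500 * 0.4) / 1e6;  // GPT-5 nano
const priciest = (2_000 * 15.0 + 500 * 75.0) / 1e6; // Claude Opus 4.1

const monthlyCheapest = cheapest * requestsPerMonth; // ≈ $4,500
const monthlyPriciest = priciest * requestsPerMonth; // ≈ $1,012,500
const spread = priciest / cheapest;                  // ≈ 225×
console.log(monthlyCheapest, monthlyPriciest, spread.toFixed(1));
```

Swapping in the other workloads' shapes and volumes reproduces the remaining four cards.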

5. Key findings

1. Small is eating big at the low end

Nano- and lite-tier models (Gemini 2.5 Flash-Lite, GPT-5 nano, GPT-4o mini) now price output below $1 per million tokens — a range where the marginal cost of an extra user turn rounds to zero.

2. Long context is a price signal

Gemini 2.5 Pro and 3.1 Pro both switch to a 2× input tier above 200,000 input tokens. OpenAI and Anthropic don't tier, which means above 200k of context the dollar math can flip in ways a flat headline rate hides. Always price the tier you'll actually hit.

3. Cached input is the quiet winner

Every OpenAI and Google model in the table ships a cached-input rate 50-90% below the standard input rate (90% for the GPT-5 family and current Gemini models, less for the GPT-4-era models). If your prompt prefix is stable (system prompts, few-shot examples), caching is the single biggest lever you haven't pulled. See each model page for the cached rate — it's already in the table above.

4. The flagship gap is 15-25×, not 2-3×

People still compare flagship to flagship (GPT-5 against GPT-5.4) as if that spread were the whole story; per request it is only about 2×. The gap that matters is vertical: a nano variant is typically 15-25× cheaper than its flagship at the same request shape. For classification, routing, and summarization that multiple is pure margin — run nano and only escalate on uncertainty.
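To put a number on the caching claim in finding 3, here is a hedged sketch using GPT-5 rates from section 2 ($1.25 standard input, $0.125 cached input per Mtok); the 1,500-token cacheable prefix is a hypothetical split for illustration, not a measured workload:

```typescript
// GPT-5 rates in USD per Mtok (section 2).
const inputRate = 1.25;
const cachedRate = 0.125; // 90% below the standard input rate

// Hypothetical request: 2,000 input tokens, of which the first 1,500
// (system prompt + few-shot examples) hit the cache.
const totalIn = 2_000;
const cachedIn = 1_500;

const inputCostNoCache = (totalIn * inputRate) / 1e6; // $0.0025 per request
const inputCostCached =
  (cachedIn * cachedRate + (totalIn - cachedIn) * inputRate) / 1e6; // ≈ $0.0008

const savings = 1 - inputCostCached / inputCostNoCache; // ≈ 0.675, i.e. ~67% off input
console.log(inputCostNoCache, inputCostCached, savings);
```

Output tokens are unaffected by caching, so the end-to-end saving depends on how input-heavy the request shape is.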

6. Methodology

All prices in this report are read from lib/pricing.ts, which is itself sourced from each provider's public pricing page. Every row carries a verified_at date and a source URL; the oldest verification date in the table above is this page's effective freshness floor.

Cost math uses the same estimateCost() helper that powers the estimator and every per-model page, so the numbers here can't drift from the rest of the site. tokenizerMultiplier is applied where a model's tokenizer emits a different token count than our counter (e.g. Claude Opus 4.7 at 1.15×). Gemini long-context tiers switch in automatically at the configured threshold.
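estimateCost() itself is not reproduced in this report; the sketch below only mirrors the behavior described in this paragraph (tokenizer multiplier applied first, then a long-context tier switch on the input rate). The interface and option names are assumptions for illustration, not the site's actual code:

```typescript
interface ModelPricing {
  inputPerMtok: number;  // USD per million input tokens
  outputPerMtok: number; // USD per million output tokens
  tokenizerMultiplier?: number; // e.g. 1.15 where the model's tokenizer runs long
  longContext?: { threshold: number; inputMultiplier: number }; // Gemini tiering
}

// Hypothetical re-implementation of the estimateCost() behavior described above:
// scale token counts by the tokenizer multiplier, then pick the input tier.
function estimateCost(p: ModelPricing, inputTokens: number, outputTokens: number): number {
  const m = p.tokenizerMultiplier ?? 1;
  const scaledIn = inputTokens * m;
  const scaledOut = outputTokens * m;
  const inputRate =
    p.longContext && scaledIn > p.longContext.threshold
      ? p.inputPerMtok * p.longContext.inputMultiplier
      : p.inputPerMtok;
  return (scaledIn * inputRate + scaledOut * p.outputPerMtok) / 1_000_000;
}

// Gemini 2.5 Pro: $1.25 in / $10.00 out, 2× input tier above 200k tokens.
const gemini25Pro: ModelPricing = {
  inputPerMtok: 1.25,
  outputPerMtok: 10,
  longContext: { threshold: 200_000, inputMultiplier: 2 },
};
console.log(estimateCost(gemini25Pro, 100_000, 2_000)); // flat tier
console.log(estimateCost(gemini25Pro, 300_000, 2_000)); // 2× input tier
```

Whether a provider's tiered rate applies to the whole input or only to the tokens above the threshold is a detail this sketch glosses over; check the provider's pricing page for the exact semantics.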

Not modelled (yet): prompt caching discounts, batch discounts, extended thinking surcharges, image / audio modalities, and fine-tune hosting. The CSV download includes the cached-input rate so you can model caching yourself.

Machine-readable access: the same data is available as a CSV file and via the pricing RSS feed. Both are CC-BY for any use — attribute Calcis if you'd like to.

Want these numbers for your own workload?

Plug real prompts into the Calcis estimator and we'll cost the exact request you intend to send, not a generic scenario.