
LLM Pricing Report: Q2 2026

A head-to-head look at 25 production LLMs across OpenAI, Anthropic, and Google as of April 2026. Every price is verified against the provider's own pricing page and recomputed on every render — if it changes at the source, this report changes with it.


1. Executive summary

Q2 2026 closes the gap between the "frontier tier" and the "good enough" tier dramatically. The cheapest model that clears a typical chat turn (2,000 input / 500 output) is now GPT-5 nano at $0.000300 per request — within an order of magnitude of what a self-hosted 3B parameter model would cost if you amortised GPUs perfectly.

Long-context workloads are where the providers still diverge the most. At 100,000 input tokens the cheapest model on the table is GPT-5 nano at roughly $0.01 per request — more than two orders of magnitude cheaper than the most expensive flagship on the same shape. Anyone shipping RAG or document analysis at scale should be benchmarking across providers, not just across models.
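Every figure in this summary is the per-million-token rate applied to a token shape. A minimal sketch of that arithmetic, using rates as listed in the table in section 2:

```typescript
// Per-million-token rates in USD, copied from the table in section 2.
const rates = {
  "gpt-5-nano": { input: 0.05, output: 0.4 },
  "claude-opus-4.1": { input: 15.0, output: 75.0 },
};

// A request costs (input tokens × input rate + output tokens × output rate) / 1M.
function perRequestCost(
  rate: { input: number; output: number },
  inputTokens: number,
  outputTokens: number
): number {
  return (inputTokens * rate.input + outputTokens * rate.output) / 1_000_000;
}

// Typical chat turn: 2,000 input / 500 output tokens.
console.log(perRequestCost(rates["gpt-5-nano"], 2_000, 500)); // ≈ $0.0003
```

The same two-term formula reproduces every per-request figure in sections 3 and 4 from the table rates alone.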

Context windows converged: every major flagship now offers at least a 200k-token window, and four models exceed 1M. Context is no longer a differentiator — price-per-token and output quality are.

2. The 25-model landscape

Every row is sourced from the provider's own pricing page and verified by hand. Context windows in tokens; rates in USD per million tokens.

| Model | Provider | Input $/Mtok | Output $/Mtok | Cached input $/Mtok | Context (tokens) | Verified |
|---|---|---|---|---|---|---|
| Claude Opus 4.7 | anthropic | $5.00 | $25.00 | — | 1,000,000 | 2026-04-17 |
| Claude Opus 4.6 | anthropic | $5.00 | $25.00 | — | 1,000,000 | 2026-04-06 |
| Claude Sonnet 4.6 | anthropic | $3.00 | $15.00 | — | 1,000,000 | 2026-04-06 |
| Claude Haiku 4.5 | anthropic | $1.00 | $5.00 | — | 200,000 | 2026-04-06 |
| Claude Opus 4.5 | anthropic | $5.00 | $25.00 | — | 200,000 | 2026-04-06 |
| Claude Sonnet 4.5 | anthropic | $3.00 | $15.00 | — | 200,000 | 2026-04-06 |
| Claude Opus 4.1 | anthropic | $15.00 | $75.00 | — | 200,000 | 2026-04-06 |
| Gemini 3.1 Pro (preview) | google | $2.00 | $12.00 | $0.200 | 1,000,000 | 2026-04-06 |
| Gemini 3 Flash (preview) | google | $0.50 | $3.00 | $0.050 | 1,000,000 | 2026-04-06 |
| Gemini 3.1 Flash-Lite (preview) | google | $0.25 | $1.50 | $0.025 | 1,000,000 | 2026-04-06 |
| Gemini 2.5 Pro | google | $1.25 | $10.00 | $0.125 | 2,000,000 | 2026-04-06 |
| Gemini 2.5 Flash | google | $0.30 | $2.50 | $0.030 | 1,000,000 | 2026-04-06 |
| Gemini 2.5 Flash-Lite | google | $0.10 | $0.40 | $0.010 | 1,000,000 | 2026-04-06 |
| GPT-5.4 | openai | $2.50 | $15.00 | $0.250 | 1,050,000 | 2026-04-06 |
| GPT-5.4 mini | openai | $0.75 | $4.50 | $0.075 | 400,000 | 2026-04-06 |
| GPT-5.4 nano | openai | $0.20 | $1.25 | $0.020 | 400,000 | 2026-04-06 |
| GPT-5 | openai | $1.25 | $10.00 | $0.125 | 400,000 | 2026-04-06 |
| GPT-5 mini | openai | $0.25 | $2.00 | $0.025 | 400,000 | 2026-04-06 |
| GPT-5 nano | openai | $0.05 | $0.40 | $0.005 | 400,000 | 2026-04-06 |
| GPT-4.1 | openai | $2.00 | $8.00 | $0.500 | 1,047,576 | 2026-04-06 |
| GPT-4.1 mini | openai | $0.40 | $1.60 | $0.100 | 1,047,576 | 2026-04-06 |
| GPT-4o | openai | $2.50 | $10.00 | $1.250 | 128,000 | 2026-04-06 |
| GPT-4o mini | openai | $0.15 | $0.60 | $0.075 | 128,000 | 2026-04-06 |
| o3 | openai | $2.00 | $8.00 | $0.500 | 200,000 | 2026-04-06 |
| o4-mini | openai | $1.10 | $4.40 | $0.275 | 200,000 | 2026-04-06 |

3. Head-to-head: flagship models across four scenarios

Four request shapes, each provider's top-tier model. Everything below updates automatically when the underlying pricing changes.

| Scenario | Tokens (in / out) | GPT-5.4 | Claude Opus 4.1 | Gemini 3.1 Pro (preview) |
|---|---|---|---|---|
| Short Q&A | 500 / 150 | $0.0035 | $0.0187 | $0.0028 |
| Typical chat turn | 2,000 / 500 | $0.0125 | $0.0675 | $0.0100 |
| Long-context doc | 100,000 / 2,000 | $0.2800 | $1.6500 | $0.2240 |
| Agent step | 10,000 / 2,000 | $0.0550 | $0.3000 | $0.0440 |

Short Q&A: one-shot question with a short answer — closer to the classic assistant call than a chat loop.
Typical chat turn: a medium-weight conversation with some retrieved context and a conversational reply.
Long-context doc: reading a long document end-to-end and emitting a summary or structured extraction.
Agent step: one turn of a tool-using agent loop — prior messages plus tool outputs on input, an action plan on output.
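Each cell above can be reproduced from the section 2 rates with the same shape arithmetic. A small sketch (note the 100,000-token scenario stays below Gemini's 200k tier threshold, so the flat rate applies throughout):

```typescript
// Flagship rates in USD per Mtok (section 2) and the four request shapes above.
const flagships: Record<string, { input: number; output: number }> = {
  "GPT-5.4": { input: 2.5, output: 15.0 },
  "Claude Opus 4.1": { input: 15.0, output: 75.0 },
  "Gemini 3.1 Pro (preview)": { input: 2.0, output: 12.0 },
};

const shapes = [
  { name: "Short Q&A", tokensIn: 500, tokensOut: 150 },
  { name: "Typical chat turn", tokensIn: 2_000, tokensOut: 500 },
  { name: "Long-context doc", tokensIn: 100_000, tokensOut: 2_000 },
  { name: "Agent step", tokensIn: 10_000, tokensOut: 2_000 },
];

// matrix[scenario][model] = USD per request.
const matrix: Record<string, Record<string, number>> = {};
for (const s of shapes) {
  matrix[s.name] = {};
  for (const [model, r] of Object.entries(flagships)) {
    matrix[s.name][model] = (s.tokensIn * r.input + s.tokensOut * r.output) / 1e6;
  }
}
console.log(matrix);
```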


4. Real workload costs

Five product loops at realistic monthly volume. For each, we show the cheapest-to-expensive spread so you can see the dollar swing of picking the wrong model.

Consumer chatbot

100,000 users × 5 messages/day × 30 days = 15M requests/month

Cheapest: GPT-5 nano at $0.000300/request ($4,500/month)
Most expensive: Claude Opus 4.1 at $0.067500/request ($1,012,500/month)

Spread: 225.0× — the cost of picking the wrong model for this workload.

RAG assistant

10,000 queries/day × 30 days = 300,000 requests/month, 8k ctx

Cheapest: GPT-5 nano at $0.000560/request ($168/month)
Most expensive: Claude Opus 4.1 at $0.150000/request ($45,000/month)

Spread: 267.9× — the cost of picking the wrong model for this workload.

Coding agent

1,000 tasks/day × 20 steps × 30 days = 600,000 agent steps

Cheapest: GPT-5 nano at $0.001300/request ($780/month)
Most expensive: Claude Opus 4.1 at $0.300000/request ($180,000/month)

Spread: 230.8× — the cost of picking the wrong model for this workload.

Document summarization

5,000 documents/day × 30 days = 150,000 documents

Cheapest: GPT-5 nano at $0.003100/request ($465/month)
Most expensive: Claude Opus 4.1 at $0.862500/request ($129,375/month)

Spread: 278.2× — the cost of picking the wrong model for this workload.

Classification pipeline

100,000/day × 30 days = 3M requests/month

Cheapest: GPT-5 nano at $0.000045/request ($135/month)
Most expensive: Claude Opus 4.1 at $0.011250/request ($33,750/month)

Spread: 250.0× — the cost of picking the wrong model for this workload.
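Every card in this section is the same three-step chain: per-request cost, times monthly volume, then the ratio of the two endpoints. Sketched for the consumer-chatbot workload:

```typescript
// Consumer chatbot: 100,000 users × 5 messages/day × 30 days.
const requestsPerMonth = 100_000 * 5 * 30; // 15,000,000

// Per-request cost at the typical chat-turn shape (2,000 in / 500 out),
// rates in USD per Mtok from section 2.
const cheapest = (2_000 * 0.05 + 500 * 0.4) / 1e6;  // GPT-5 nano
const priciest = (2_000 * 15.0 + 500 * 75.0) / 1e6; // Claude Opus 4.1

const monthlyCheapest = cheapest * requestsPerMonth; // ≈ $4,500
const monthlyPriciest = priciest * requestsPerMonth; // ≈ $1,012,500
const spread = priciest / cheapest;                  // ≈ 225×
console.log(monthlyCheapest, monthlyPriciest, spread.toFixed(1));
```

Swapping in the other workloads' shapes and volumes reproduces the remaining four cards.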

5. Key findings

1. Small is eating big at the low end

Nano- and lite-tier models (Gemini 2.5 Flash-Lite, GPT-5 nano, GPT-4o mini) now price output below $1 per million tokens — a range where the marginal cost of an extra user turn rounds to zero.

2. Long context is a price signal

Gemini 2.5 Pro and 3.1 Pro both switch to a 2× input tier above 200,000 input tokens. OpenAI and Anthropic don't tier, which means above 200k of context the dollar math can flip in ways a flat headline rate hides. Always price the tier you'll actually hit.

3. Cached input is the quiet winner

Every OpenAI and Google model in the table ships a cached-input rate 50-90% below the standard input rate (90% for the GPT-5 family and current Gemini models, less for the GPT-4-era models). If your prompt prefix is stable (system prompts, few-shot examples), caching is the single biggest lever you haven't pulled. See each model page for the cached rate — it's already in the table above.

4. The flagship gap is 15-25×, not 2-3×

People still compare flagship to flagship (GPT-5 against GPT-5.4) as if that spread were the whole story; per request it is only about 2×. The gap that matters is vertical: a nano variant is typically 15-25× cheaper than its flagship at the same request shape. For classification, routing, and summarization that multiple is pure margin — run nano and only escalate on uncertainty.
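To put a number on the caching claim in finding 3, here is a hedged sketch using GPT-5 rates from section 2 ($1.25 standard input, $0.125 cached input per Mtok); the 1,500-token cacheable prefix is a hypothetical split for illustration, not a measured workload:

```typescript
// GPT-5 rates in USD per Mtok (section 2).
const inputRate = 1.25;
const cachedRate = 0.125; // 90% below the standard input rate

// Hypothetical request: 2,000 input tokens, of which the first 1,500
// (system prompt + few-shot examples) hit the cache.
const totalIn = 2_000;
const cachedIn = 1_500;

const inputCostNoCache = (totalIn * inputRate) / 1e6; // $0.0025 per request
const inputCostCached =
  (cachedIn * cachedRate + (totalIn - cachedIn) * inputRate) / 1e6; // ≈ $0.0008

const savings = 1 - inputCostCached / inputCostNoCache; // ≈ 0.675, i.e. ~67% off input
console.log(inputCostNoCache, inputCostCached, savings);
```

Output tokens are unaffected by caching, so the end-to-end saving depends on how input-heavy the request shape is.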

6. Methodology

All prices in this report are read from lib/pricing.ts, which is itself sourced from each provider's public pricing page. Every row carries a verified_at date and a source URL; the oldest verification date in the table above is this page's effective freshness floor.

Cost math uses the same estimateCost() helper that powers the estimator and every per-model page, so the numbers here can't drift from the rest of the site. tokenizerMultiplier is applied where a model's tokenizer emits a different token count than our counter (e.g. Claude Opus 4.7 at 1.15×). Gemini long-context tiers switch in automatically at the configured threshold.
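estimateCost() itself is not reproduced in this report; the sketch below only mirrors the behavior described in this paragraph (tokenizer multiplier applied first, then a long-context tier switch on the input rate). The interface and option names are assumptions for illustration, not the site's actual code:

```typescript
interface ModelPricing {
  inputPerMtok: number;  // USD per million input tokens
  outputPerMtok: number; // USD per million output tokens
  tokenizerMultiplier?: number; // e.g. 1.15 where the model's tokenizer runs long
  longContext?: { threshold: number; inputMultiplier: number }; // Gemini tiering
}

// Hypothetical re-implementation of the estimateCost() behavior described above:
// scale token counts by the tokenizer multiplier, then pick the input tier.
function estimateCost(p: ModelPricing, inputTokens: number, outputTokens: number): number {
  const m = p.tokenizerMultiplier ?? 1;
  const scaledIn = inputTokens * m;
  const scaledOut = outputTokens * m;
  const inputRate =
    p.longContext && scaledIn > p.longContext.threshold
      ? p.inputPerMtok * p.longContext.inputMultiplier
      : p.inputPerMtok;
  return (scaledIn * inputRate + scaledOut * p.outputPerMtok) / 1_000_000;
}

// Gemini 2.5 Pro: $1.25 in / $10.00 out, 2× input tier above 200k tokens.
const gemini25Pro: ModelPricing = {
  inputPerMtok: 1.25,
  outputPerMtok: 10,
  longContext: { threshold: 200_000, inputMultiplier: 2 },
};
console.log(estimateCost(gemini25Pro, 100_000, 2_000)); // flat tier
console.log(estimateCost(gemini25Pro, 300_000, 2_000)); // 2× input tier
```

Whether a provider's tiered rate applies to the whole input or only to the tokens above the threshold is a detail this sketch glosses over; check the provider's pricing page for the exact semantics.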

Not modelled (yet): prompt caching discounts, batch discounts, extended thinking surcharges, image / audio modalities, and fine-tune hosting. The CSV download includes the cached-input rate so you can model caching yourself.

Machine-readable access: the same data is available as a CSV file and via the pricing RSS feed. Both are CC-BY for any use — attribute Calcis if you'd like to.

Want these numbers for your own workload?

Plug real prompts into the Calcis estimator and we'll cost the exact request you intend to send, not a generic scenario.