Changelog
Pricing changelog
Every pricing change, logged. Updated within hours of provider announcements.
Q2 2026 pricing report published
Full Q2 2026 landscape report with 25-model comparison, head-to-head flagship scenarios, and a machine-readable CSV distribution.
- Published the Q2 2026 LLM Pricing Report at /reports/llm-pricing-q2-2026
- Full 25-model landscape with head-to-head flagship scenarios and five workload cost calculations
- Machine-readable dataset available at /data/llm-pricing-q2-2026.csv (RFC 4180, columns match ModelPricing)
- Dataset + Report JSON-LD so search engines can surface the CSV distribution
Claude Opus 4.7 added
Anthropic shipped Claude Opus 4.7 at $5/$25 per 1M tokens with a retrained tokenizer that emits 1.0–1.35× more tokens per unit of text.
- Added Claude Opus 4.7 ($5/$25 per 1M tokens, 1M context)
- Added tokenizer multiplier (1.15×) for Opus 4.7 new tokenizer
- Added xhigh effort level for Anthropic models
- Added effort level cost modelling for all Anthropic and OpenAI models
Model expansion + pricing updates
Ingested the full 2025–2026 model catalogue: GPT-5 and GPT-5.4 families, o3/o4-mini, Gemini 3.1 previews, and the remaining current Anthropic tiers.
- Added GPT-5.4, GPT-5.4 mini, GPT-5.4 nano
- Added GPT-5, GPT-5 mini, GPT-5 nano
- Added GPT-4.1, GPT-4.1 mini
- Added o3, o4-mini
- Added Gemini 3.1 Pro, Gemini 3 Flash, Gemini 3.1 Flash-Lite (preview)
- Added Gemini 2.5 Flash-Lite
- Updated all Anthropic models to current pricing
Initial pricing database
Calcis launched with an initial 24-model pricing database spanning OpenAI, Anthropic, and Google.
- Launched with 24 models across OpenAI, Anthropic, and Google
- Claude Opus 4.6, Sonnet 4.6, Haiku 4.5
- GPT-4o, GPT-4o mini
- Gemini 2.5 Pro, Gemini 2.5 Flash
- Backfilled
Claude Opus 4.6 released
Anthropic shipped Claude Opus 4.6 as the flagship tier at $5/$25 per 1M tokens with a 1M-token context window.
- Claude Opus 4.6 released at $5/$25 per 1M tokens
- 1M-token context window with 128K max output
- Backfilled
OpenAI GPT-5.4 family launch
OpenAI released GPT-5.4, GPT-5.4 mini, and GPT-5.4 nano with the same token window as GPT-5 but lower blended cost on typical workloads.
- GPT-5.4 shipped at $2.50/$15.00 per 1M tokens, 1.05M context
- GPT-5.4 mini at $0.75/$4.50 per 1M tokens
- GPT-5.4 nano at $0.20/$1.25 per 1M tokens
- Backfilled
Gemini 3 Flash (preview)
Google launched Gemini 3 Flash in preview at $0.50/$3.00 per 1M tokens with a 1M-token context.
- Gemini 3 Flash preview at $0.50/$3.00 per 1M tokens
- Cached input priced at $0.05 per 1M tokens
- Backfilled
Gemini 3.1 Pro (preview)
Google introduced Gemini 3.1 Pro in preview with tiered long-context pricing above the 200K-token threshold.
- Gemini 3.1 Pro preview at $2.00/$12.00 per 1M tokens
- Long-context tier (>200K tokens): $4.00/$18.00 per 1M
- Cached input at $0.20 per 1M tokens
- Backfilled
Gemini 3.1 Flash-Lite (preview)
Google shipped Gemini 3.1 Flash-Lite in preview as the lowest-tier model in the 3.x family at $0.25/$1.50 per 1M tokens.
- Gemini 3.1 Flash-Lite preview at $0.25/$1.50 per 1M tokens
- 1M-token context with aggressive cache pricing at $0.025/1M
- Backfilled
Claude Sonnet 4.6 released
Anthropic released Claude Sonnet 4.6 at $3/$15 per 1M tokens with a 1M-token context window for standard tier users.
- Sonnet 4.6 released at $3/$15 per 1M tokens
- Context window extended to 1M tokens
- Backfilled
Claude Opus 4.5 released
Anthropic released Claude Opus 4.5 at $5/$25 per 1M tokens, a steep cut from Opus 4.1's $15/$75 rate.
- Opus 4.5 at $5/$25 per 1M tokens (down from $15/$75 for 4.1)
- 200K-token context, 64K max output
- Backfilled
Claude Haiku 4.5 released
Anthropic refreshed the Haiku tier at $1/$5 per 1M tokens with a 200K context window.
- Haiku 4.5 at $1/$5 per 1M tokens
- 200K-token context, 64K max output
- Backfilled
Claude Sonnet 4.5 released
Anthropic released Claude Sonnet 4.5 at $3/$15 per 1M tokens.
- Sonnet 4.5 at $3/$15 per 1M tokens
- 200K-token context, 64K max output
- Backfilled
Gemini 2.5 Flash-Lite GA
Google graduated Gemini 2.5 Flash-Lite to general availability at $0.10/$0.40 per 1M tokens, the cheapest tier in the 2.5 line.
- Gemini 2.5 Flash-Lite GA at $0.10/$0.40 per 1M tokens
- Cached input at $0.01 per 1M tokens
- Backfilled
OpenAI GPT-5 family launch
OpenAI launched GPT-5, GPT-5 mini, and GPT-5 nano as the post-GPT-4.1 flagship line.
- GPT-5 at $1.25/$10.00 per 1M tokens, 400K context
- GPT-5 mini at $0.25/$2.00 per 1M tokens
- GPT-5 nano at $0.05/$0.40 per 1M tokens
- Automatic cached-input discount: 10× off input rate
- Backfilled
Claude Opus 4.1 released
Anthropic released Claude Opus 4.1 at $15/$75 per 1M tokens, the last entry in the 4.x top-tier pricing before the 4.5 reset.
- Opus 4.1 at $15/$75 per 1M tokens
- 200K context, 32K max output
- Backfilled
Gemini 2.5 Flash GA pricing
Google finalized Gemini 2.5 Flash GA pricing at $0.30/$2.50 per 1M tokens with a 1M-token context.
- Gemini 2.5 Flash GA at $0.30/$2.50 per 1M tokens
- 1M-token context window
- Backfilled
Claude 4: Opus 4 and Sonnet 4 launch
Anthropic introduced the Claude 4 family with Opus 4 and Sonnet 4, a hybrid reasoning architecture across both tiers.
- Claude Opus 4 and Sonnet 4 launched
- First Claude family with unified hybrid reasoning across tiers
- Initial Opus 4 at $15/$75; Sonnet 4 at $3/$15
- Backfilled
OpenAI o3 and o4-mini launch
OpenAI released o3 and o4-mini as the next generation of reasoning models following the o1/o3-mini lineage.
- o3 at $2.00/$8.00 per 1M tokens, 200K context, 100K max output
- o4-mini at $1.10/$4.40 per 1M tokens
- Both priced with cached-input 4× discount
- Backfilled
OpenAI GPT-4.1 family launch
OpenAI launched GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano with an expanded 1M-token context window.
- GPT-4.1 at $2.00/$8.00 per 1M tokens, ~1M context
- GPT-4.1 mini at $0.40/$1.60 per 1M tokens
- First OpenAI flagship at 1M-token context
- Backfilled
Gemini 2.5 Pro launch
Google launched Gemini 2.5 Pro with a 2M-token context window and tiered pricing above 200K tokens.
- Gemini 2.5 Pro at $1.25/$10.00 per 1M tokens (standard tier)
- Long-context tier (>200K): $2.50/$15.00 per 1M tokens
- 2M-token context window
- Backfilled
Claude 3.7 Sonnet with extended thinking
Anthropic released Claude 3.7 Sonnet, the first Claude model to expose extended-thinking as a user-controllable parameter.
- Claude 3.7 Sonnet at $3/$15 per 1M tokens
- Introduced extended-thinking budget parameter
- 200K context, 64K max output
- Backfilled
Gemini 2.0 Pro Experimental
Google exposed Gemini 2.0 Pro Experimental via AI Studio ahead of the 2.5 line.
- Gemini 2.0 Pro Experimental available in AI Studio
- Free tier during experimental window
- Backfilled
OpenAI o3-mini launch
OpenAI released o3-mini as the second-generation reasoning model, replacing o1-mini across most reasoning benchmarks.
- o3-mini launched with reasoning-effort parameter
- Pricing parity with o1-mini at launch
- Backfilled
OpenAI o1 full release
OpenAI released the full o1 model (graduating from o1-preview) and introduced o1-pro mode in ChatGPT.
- o1 full release at $15/$60 per 1M tokens
- o1-pro mode available in ChatGPT Pro
- Reasoning-effort parameter added to the API
- Backfilled
Gemini 2.0 Flash launch
Google launched Gemini 2.0 Flash as the first of the Gemini 2.0 line, with native tool use and multimodal live streaming.
- Gemini 2.0 Flash launched with native tool use
- Multimodal live streaming API
- Free tier in AI Studio during experimental window
- Backfilled
Claude 3.5 Haiku launch
Anthropic launched Claude 3.5 Haiku at $1/$5 per 1M tokens, a 4× price increase over 3 Haiku but with substantially stronger benchmarks.
- Claude 3.5 Haiku at $1/$5 per 1M tokens
- 200K context, 8K max output at launch
- Price increase vs 3 Haiku ($0.25/$1.25) drew community pushback
- Backfilled
Claude 3.5 Sonnet v2 + computer use
Anthropic refreshed Claude 3.5 Sonnet and introduced the computer use beta API for agent workflows.
- Claude 3.5 Sonnet v2 ('claude-3-5-sonnet-20241022')
- Computer use beta API launched
- Pricing held at $3/$15 per 1M tokens
- Backfilled
Anthropic Message Batches API
Anthropic launched the Message Batches API with a 50% discount for async workloads completed within 24 hours.
- Message Batches API at 50% discount
- 24-hour turnaround window
- Backfilled
OpenAI prompt caching
OpenAI introduced automatic prompt caching with a 50% input-token discount on cached prefixes (≥1024 tokens).
- Automatic prompt caching for prefixes ≥1024 tokens
- 50% discount on cached input tokens
- Applied to GPT-4o, o1 family, and GPT-4o mini at launch
- Backfilled
Gemini 1.5 Pro 64% price cut
Google cut Gemini 1.5 Pro input/output pricing by 64% across the standard (<128K) tier.
- Gemini 1.5 Pro standard-tier: $1.25/$5.00 per 1M tokens (down from $3.50/$10.50)
- Long-context tier >128K: $2.50/$10.00 per 1M tokens
- 64% reduction on the mainline tier
- Backfilled
OpenAI o1-preview + o1-mini launch
OpenAI introduced the o1 family of reasoning models with chain-of-thought natively priced into output tokens.
- o1-preview at $15/$60 per 1M tokens
- o1-mini at $3/$12 per 1M tokens
- Reasoning tokens counted against the output meter
- Backfilled
Anthropic prompt caching (beta)
Anthropic launched prompt caching in beta with tiered cache-write / cache-read pricing.
- Cache writes at 1.25× base input rate (5-minute TTL)
- Cache reads at 0.1× base input rate
- Initially limited to 3.5 Sonnet and 3 Haiku
- Backfilled
GPT-4o price cut
OpenAI cut GPT-4o input/output pricing from $5/$15 to $2.50/$10.00 per 1M tokens.
- GPT-4o input: $5.00 → $2.50 per 1M tokens (50% cut)
- GPT-4o output: $15.00 → $10.00 per 1M tokens (33% cut)
- Backfilled
GPT-4o mini launch
OpenAI launched GPT-4o mini at $0.15/$0.60 per 1M tokens, replacing GPT-3.5 Turbo as the entry-tier model.
- GPT-4o mini at $0.15/$0.60 per 1M tokens
- 128K context, 16K max output
- Positioned as the GPT-3.5 Turbo replacement
- Backfilled
Claude 3.5 Sonnet launch
Anthropic released Claude 3.5 Sonnet at Sonnet-tier pricing but with benchmarks above 3 Opus, the first 'mid-tier beats top-tier' Claude release.
- Claude 3.5 Sonnet at $3/$15 per 1M tokens
- 200K context, 8K max output at launch
- Benchmark parity with or above Claude 3 Opus
- Backfilled
Gemini 1.5 Flash launch
Google launched Gemini 1.5 Flash at I/O 2024 as the low-latency sibling of 1.5 Pro, with a 1M-token context.
- Gemini 1.5 Flash at $0.35/$1.05 per 1M tokens at launch
- 1M-token context window
- Tiered pricing introduced above 128K tokens
- Backfilled
GPT-4o launch
OpenAI launched GPT-4o at $5/$15 per 1M tokens, half the price of GPT-4 Turbo with native multimodality.
- GPT-4o at $5/$15 per 1M tokens
- 128K context, 4K max output at launch
- Native audio + vision + text multimodality
- Backfilled
OpenAI Batch API
OpenAI introduced the Batch API with a 50% discount for workloads that can tolerate a 24-hour turnaround.
- Batch API at 50% discount vs real-time pricing
- 24-hour turnaround window
- Separate rate-limit bucket from sync API
- Backfilled
Claude 3 family launch
Anthropic launched the Claude 3 family (Opus, Sonnet, Haiku) with 200K-token context across all tiers.
- Claude 3 Opus at $15/$75 per 1M tokens
- Claude 3 Sonnet at $3/$15 per 1M tokens
- Claude 3 Haiku at $0.25/$1.25 per 1M tokens
- 200K context across all three tiers
- Backfilled
Gemini 1.5 Pro preview
Google previewed Gemini 1.5 Pro with a 1M-token context window, a 5× jump over the 200K ceiling of competing flagships at the time.
- Gemini 1.5 Pro preview at 1M-token context
- Free preview in AI Studio at launch
- Standard/long-context dual-tier pricing introduced
- Backfilled
GPT-3.5 Turbo 50% price cut + text-embedding-3
OpenAI cut GPT-3.5 Turbo input by 50% and output by 25%, and launched text-embedding-3-small / -large.
- GPT-3.5 Turbo input: $0.001 → $0.0005 per 1K tokens
- GPT-3.5 Turbo output: $0.002 → $0.0015 per 1K tokens
- text-embedding-3-small at $0.00002 per 1K tokens (5× cheaper than ada-002)
- text-embedding-3-large at $0.00013 per 1K tokens
- Backfilled
Claude 3.5 Sonnet 8K output
Anthropic raised the max output of Claude 3.5 Sonnet from 4K to 8K tokens without a pricing change.
- Claude 3.5 Sonnet max output raised 4K → 8K tokens
- No change to per-token pricing
- Backfilled
Claude 3 Haiku on Bedrock
Anthropic expanded Claude 3 Haiku availability to Amazon Bedrock at parity pricing with the direct API.
- Claude 3 Haiku GA on Amazon Bedrock
- Pricing parity with direct Anthropic API
- Backfilled
gpt-3.5-turbo-0125 + GPT-4 Turbo 0125-preview
OpenAI released updated GPT-3.5 Turbo and GPT-4 Turbo snapshots fixing the 'lazy model' regression reported in late 2023.
- gpt-3.5-turbo-0125 with improved instruction following
- gpt-4-0125-preview addresses December 'lazy model' reports
- Backfilled
Gemini 1.0 API launch
Google launched the Gemini 1.0 API (Pro tier) as the first Gemini-branded model family, replacing PaLM 2.
- Gemini 1.0 Pro API launched
- First Gemini-branded model family
- Replaced PaLM 2 on Google's generative stack
- Backfilled
Claude 2.1 with 200K context
Anthropic released Claude 2.1 with a 200K-token context window, 2.4× the 80K ceiling of Claude 2.0.
- Claude 2.1 at 200K-token context
- Tool use beta introduced
- Pricing held at $8/$24 per 1M tokens
- Backfilled
GPT-4 Turbo launch (DevDay 2023)
OpenAI launched GPT-4 Turbo at $10/$30 per 1M tokens (3× cheaper input and 2× cheaper output than GPT-4) with a 128K context.
- GPT-4 Turbo at $10/$30 per 1M tokens (vs $30/$60 for GPT-4 8K)
- 128K context window
- JSON mode + seed parameter introduced
- Backfilled
Claude 2 API launch
Anthropic launched the Claude 2 API with 100K-token context and a public developer waitlist.
- Claude 2 at $11.02/$32.68 per 1M tokens
- 100K-token context at launch
- First publicly accessible Claude API
- Backfilled
GPT-3.5 Turbo 16K + function calling
OpenAI released gpt-3.5-turbo-16k and introduced function calling across the GPT-4 and 3.5 families.
- gpt-3.5-turbo-16k at $0.003/$0.004 per 1K tokens (2× base rate)
- Function calling added to GPT-4 and GPT-3.5 Turbo
- gpt-3.5-turbo base input cut by 25% to $0.0015/1K
- Backfilled
GPT-4 launch
OpenAI launched GPT-4 at $30/$60 per 1M tokens (8K context) and $60/$120 per 1M tokens (32K context).
- GPT-4 8K at $30/$60 per 1M tokens
- GPT-4 32K at $60/$120 per 1M tokens
- Initial waitlist-gated developer access
- Backfilled
GPT-3.5 Turbo API launch
OpenAI launched the GPT-3.5 Turbo API at $0.002 per 1K tokens, a 10× price cut vs text-davinci-003.
- GPT-3.5 Turbo at $0.002 per 1K tokens (both in + out)
- 10× price cut versus text-davinci-003
- First ChatGPT-style chat-completions API