Changelog

Pricing changelog

Every pricing change, logged. Updated within hours of provider announcements.

  1. Q2 2026 pricing report published

    Full Q2 2026 landscape report with 25-model comparison, head-to-head flagship scenarios, and a machine-readable CSV distribution.

    • Published the Q2 2026 LLM Pricing Report at /reports/llm-pricing-q2-2026
    • Full 25-model landscape with head-to-head flagship scenarios and five workload cost calculations
    • Machine-readable dataset available at /data/llm-pricing-q2-2026.csv (RFC 4180, columns match ModelPricing)
    • Dataset + Report JSON-LD so search engines can surface the CSV distribution

    Source

  2. Claude Opus 4.7 added

    Anthropic shipped Claude Opus 4.7 at $5/$25 per 1M tokens with a retrained tokenizer that emits 1.0–1.35× more tokens per unit of text.

    • Added Claude Opus 4.7 ($5/$25 per 1M tokens, 1M context)
    • Added tokenizer multiplier (1.15×) for Opus 4.7 new tokenizer
    • Added xhigh effort level for Anthropic models
    • Added effort level cost modelling for all Anthropic and OpenAI models

    Source

  3. Model expansion + pricing updates

    Ingested the full 2025–2026 model catalogue: GPT-5 and GPT-5.4 families, o3/o4-mini, Gemini 3.1 previews, and the remaining current Anthropic tiers.

    • Added GPT-5.4, GPT-5.4 mini, GPT-5.4 nano
    • Added GPT-5, GPT-5 mini, GPT-5 nano
    • Added GPT-4.1, GPT-4.1 mini
    • Added o3, o4-mini
    • Added Gemini 3.1 Pro, Gemini 3 Flash, Gemini 3.1 Flash-Lite (preview)
    • Added Gemini 2.5 Flash-Lite
    • Updated all Anthropic models to current pricing

    Source

  4. Initial pricing database

    Calcis launched with an initial 24-model pricing database spanning OpenAI, Anthropic, and Google.

    • Launched with 24 models across OpenAI, Anthropic, and Google
    • Claude Opus 4.6, Sonnet 4.6, Haiku 4.5
    • GPT-4o, GPT-4o mini
    • Gemini 2.5 Pro, Gemini 2.5 Flash

    Source

  5. Backfilled

    Claude Opus 4.6 released

    Anthropic shipped Claude Opus 4.6 as the flagship tier at $5/$25 per 1M tokens with a 1M-token context window.

    • Claude Opus 4.6 released at $5/$25 per 1M tokens
    • 1M-token context window with 128K max output

    Source

  6. Backfilled

    OpenAI GPT-5.4 family launch

    OpenAI released GPT-5.4, GPT-5.4 mini, and GPT-5.4 nano with the same token window as GPT-5 but lower blended cost on typical workloads.

    • GPT-5.4 shipped at $2.50/$15.00 per 1M tokens, 1.05M context
    • GPT-5.4 mini at $0.75/$4.50 per 1M tokens
    • GPT-5.4 nano at $0.20/$1.25 per 1M tokens

    Source

  7. Backfilled

    Gemini 3 Flash (preview)

    Google launched Gemini 3 Flash in preview at $0.50/$3.00 per 1M tokens with a 1M-token context.

    • Gemini 3 Flash preview at $0.50/$3.00 per 1M tokens
    • Cached input priced at $0.05 per 1M tokens

    Source

  8. Backfilled

    Gemini 3.1 Pro (preview)

    Google introduced Gemini 3.1 Pro in preview with tiered long-context pricing above the 200K-token threshold.

    • Gemini 3.1 Pro preview at $2.00/$12.00 per 1M tokens
    • Long-context tier (>200K tokens): $4.00/$18.00 per 1M
    • Cached input at $0.20 per 1M tokens

    Source

  9. Backfilled

    Gemini 3.1 Flash-Lite (preview)

    Google shipped Gemini 3.1 Flash-Lite in preview as the lowest-tier model in the 3.x family at $0.25/$1.50 per 1M tokens.

    • Gemini 3.1 Flash-Lite preview at $0.25/$1.50 per 1M tokens
    • 1M-token context with aggressive cache pricing at $0.025/1M

    Source

  10. Backfilled

    Claude Sonnet 4.6 released

    Anthropic released Claude Sonnet 4.6 at $3/$15 per 1M tokens with a 1M-token context window for standard tier users.

    • Sonnet 4.6 released at $3/$15 per 1M tokens
    • Context window extended to 1M tokens

    Source

  11. Backfilled

    Claude Opus 4.5 released

    Anthropic released Claude Opus 4.5 at $5/$25 per 1M tokens, a steep cut from Opus 4.1's $15/$75 rate.

    • Opus 4.5 at $5/$25 per 1M tokens (down from $15/$75 for 4.1)
    • 200K-token context, 64K max output

    Source

  12. Backfilled

    Claude Haiku 4.5 released

    Anthropic refreshed the Haiku tier at $1/$5 per 1M tokens with a 200K context window.

    • Haiku 4.5 at $1/$5 per 1M tokens
    • 200K-token context, 64K max output

    Source

  13. Backfilled

    Claude Sonnet 4.5 released

    Anthropic released Claude Sonnet 4.5 at $3/$15 per 1M tokens.

    • Sonnet 4.5 at $3/$15 per 1M tokens
    • 200K-token context, 64K max output

    Source

  14. Backfilled

    Gemini 2.5 Flash-Lite GA

    Google graduated Gemini 2.5 Flash-Lite to general availability at $0.10/$0.40 per 1M tokens, the cheapest tier in the 2.5 line.

    • Gemini 2.5 Flash-Lite GA at $0.10/$0.40 per 1M tokens
    • Cached input at $0.01 per 1M tokens

    Source

  15. Backfilled

    OpenAI GPT-5 family launch

    OpenAI launched GPT-5, GPT-5 mini, and GPT-5 nano as the post-GPT-4.1 flagship line.

    • GPT-5 at $1.25/$10.00 per 1M tokens, 400K context
    • GPT-5 mini at $0.25/$2.00 per 1M tokens
    • GPT-5 nano at $0.05/$0.40 per 1M tokens
    • Automatic cached-input discount: 10× off input rate

    Source

  16. Backfilled

    Claude Opus 4.1 released

    Anthropic released Claude Opus 4.1 at $15/$75 per 1M tokens, the last entry in the 4.x top-tier pricing before the 4.5 reset.

    • Opus 4.1 at $15/$75 per 1M tokens
    • 200K context, 32K max output

    Source

  17. Backfilled

    Gemini 2.5 Flash GA pricing

    Google finalized Gemini 2.5 Flash GA pricing at $0.30/$2.50 per 1M tokens with a 1M-token context.

    • Gemini 2.5 Flash GA at $0.30/$2.50 per 1M tokens
    • 1M-token context window

    Source

  18. Backfilled

    Claude 4: Opus 4 and Sonnet 4 launch

    Anthropic introduced the Claude 4 family with Opus 4 and Sonnet 4, a hybrid reasoning architecture across both tiers.

    • Claude Opus 4 and Sonnet 4 launched
    • First Claude family with unified hybrid reasoning across tiers
    • Initial Opus 4 at $15/$75; Sonnet 4 at $3/$15

    Source

  19. Backfilled

    OpenAI o3 and o4-mini launch

    OpenAI released o3 and o4-mini as the next generation of reasoning models following the o1/o3-mini lineage.

    • o3 at $2.00/$8.00 per 1M tokens, 200K context, 100K max output
    • o4-mini at $1.10/$4.40 per 1M tokens
    • Both priced with cached-input 4× discount

    Source

  20. Backfilled

    OpenAI GPT-4.1 family launch

    OpenAI launched GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano with an expanded 1M-token context window.

    • GPT-4.1 at $2.00/$8.00 per 1M tokens, ~1M context
    • GPT-4.1 mini at $0.40/$1.60 per 1M tokens
    • First OpenAI flagship at 1M-token context

    Source

  21. Backfilled

    Gemini 2.5 Pro launch

    Google launched Gemini 2.5 Pro with a 2M-token context window and tiered pricing above 200K tokens.

    • Gemini 2.5 Pro at $1.25/$10.00 per 1M tokens (standard tier)
    • Long-context tier (>200K): $2.50/$15.00 per 1M tokens
    • 2M-token context window

    Source

  22. Backfilled

    Claude 3.7 Sonnet with extended thinking

    Anthropic released Claude 3.7 Sonnet, the first Claude model to expose extended-thinking as a user-controllable parameter.

    • Claude 3.7 Sonnet at $3/$15 per 1M tokens
    • Introduced extended-thinking budget parameter
    • 200K context, 64K max output

    Source

  23. Backfilled

    Gemini 2.0 Pro Experimental

    Google exposed Gemini 2.0 Pro Experimental via AI Studio ahead of the 2.5 line.

    • Gemini 2.0 Pro Experimental available in AI Studio
    • Free tier during experimental window

    Source

  24. Backfilled

    OpenAI o3-mini launch

    OpenAI released o3-mini as the second-generation reasoning model, replacing o1-mini across most reasoning benchmarks.

    • o3-mini launched with reasoning-effort parameter
    • Pricing parity with o1-mini at launch

    Source

  25. Backfilled

    OpenAI o1 full release

    OpenAI released the full o1 model (graduating from o1-preview) and introduced o1-pro mode in ChatGPT.

    • o1 full release at $15/$60 per 1M tokens
    • o1-pro mode available in ChatGPT Pro
    • Reasoning-effort parameter added to the API

    Source

  26. Backfilled

    Gemini 2.0 Flash launch

    Google launched Gemini 2.0 Flash as the first of the Gemini 2.0 line, with native tool use and multimodal live streaming.

    • Gemini 2.0 Flash launched with native tool use
    • Multimodal live streaming API
    • Free tier in AI Studio during experimental window

    Source

  27. Backfilled

    Claude 3.5 Haiku launch

    Anthropic launched Claude 3.5 Haiku at $1/$5 per 1M tokens, a 4× price increase over 3 Haiku but with substantially stronger benchmarks.

    • Claude 3.5 Haiku at $1/$5 per 1M tokens
    • 200K context, 8K max output at launch
    • Price increase vs 3 Haiku ($0.25/$1.25) drew community pushback

    Source

  28. Backfilled

    Claude 3.5 Sonnet v2 + computer use

    Anthropic refreshed Claude 3.5 Sonnet and introduced the computer use beta API for agent workflows.

    • Claude 3.5 Sonnet v2 ('claude-3-5-sonnet-20241022')
    • Computer use beta API launched
    • Pricing held at $3/$15 per 1M tokens

    Source

  29. Backfilled

    Anthropic Message Batches API

    Anthropic launched the Message Batches API with a 50% discount for async workloads completed within 24 hours.

    • Message Batches API at 50% discount
    • 24-hour turnaround window

    Source

  30. Backfilled

    OpenAI prompt caching

    OpenAI introduced automatic prompt caching with a 50% input-token discount on cached prefixes (≥1024 tokens).

    • Automatic prompt caching for prefixes ≥1024 tokens
    • 50% discount on cached input tokens
    • Applied to GPT-4o, o1 family, and GPT-4o mini at launch

    Source

  31. Backfilled

    Gemini 1.5 Pro 64% price cut

    Google cut Gemini 1.5 Pro input/output pricing by 64% across the standard (<128K) tier.

    • Gemini 1.5 Pro standard-tier: $1.25/$5.00 per 1M tokens (down from $3.50/$10.50)
    • Long-context tier >128K: $2.50/$10.00 per 1M tokens
    • 64% reduction on the mainline tier

    Source

  32. Backfilled

    OpenAI o1-preview + o1-mini launch

    OpenAI introduced the o1 family of reasoning models with chain-of-thought natively priced into output tokens.

    • o1-preview at $15/$60 per 1M tokens
    • o1-mini at $3/$12 per 1M tokens
    • Reasoning tokens counted against the output meter

    Source

  33. Backfilled

    Anthropic prompt caching (beta)

    Anthropic launched prompt caching in beta with tiered cache-write / cache-read pricing.

    • Cache writes at 1.25× base input rate (5-minute TTL)
    • Cache reads at 0.1× base input rate
    • Initially limited to 3.5 Sonnet and 3 Haiku

    Source

  34. Backfilled

    GPT-4o price cut

    OpenAI cut GPT-4o input/output pricing from $5/$15 to $2.50/$10.00 per 1M tokens.

    • GPT-4o input: $5.00 → $2.50 per 1M tokens (50% cut)
    • GPT-4o output: $15.00 → $10.00 per 1M tokens (33% cut)

    Source

  35. Backfilled

    GPT-4o mini launch

    OpenAI launched GPT-4o mini at $0.15/$0.60 per 1M tokens, replacing GPT-3.5 Turbo as the entry-tier model.

    • GPT-4o mini at $0.15/$0.60 per 1M tokens
    • 128K context, 16K max output
    • Positioned as the GPT-3.5 Turbo replacement

    Source

  36. Backfilled

    Claude 3.5 Sonnet launch

    Anthropic released Claude 3.5 Sonnet at Sonnet-tier pricing but with benchmarks above 3 Opus, the first 'mid-tier beats top-tier' Claude release.

    • Claude 3.5 Sonnet at $3/$15 per 1M tokens
    • 200K context, 8K max output at launch
    • Benchmark parity with or above Claude 3 Opus

    Source

  37. Backfilled

    Gemini 1.5 Flash launch

    Google launched Gemini 1.5 Flash at I/O 2024 as the low-latency sibling of 1.5 Pro, with a 1M-token context.

    • Gemini 1.5 Flash at $0.35/$1.05 per 1M tokens at launch
    • 1M-token context window
    • Tiered pricing introduced above 128K tokens

    Source

  38. Backfilled

    GPT-4o launch

    OpenAI launched GPT-4o at $5/$15 per 1M tokens, half the price of GPT-4 Turbo with native multimodality.

    • GPT-4o at $5/$15 per 1M tokens
    • 128K context, 4K max output at launch
    • Native audio + vision + text multimodality

    Source

  39. Backfilled

    OpenAI Batch API

    OpenAI introduced the Batch API with a 50% discount for workloads that can tolerate a 24-hour turnaround.

    • Batch API at 50% discount vs real-time pricing
    • 24-hour turnaround window
    • Separate rate-limit bucket from sync API

    Source

  40. Backfilled

    Claude 3 family launch

    Anthropic launched the Claude 3 family (Opus, Sonnet, Haiku) with 200K-token context across all tiers.

    • Claude 3 Opus at $15/$75 per 1M tokens
    • Claude 3 Sonnet at $3/$15 per 1M tokens
    • Claude 3 Haiku at $0.25/$1.25 per 1M tokens
    • 200K context across all three tiers

    Source

  41. Backfilled

    Gemini 1.5 Pro preview

    Google previewed Gemini 1.5 Pro with a 1M-token context window, a 5× jump over the 200K ceiling of competing flagships at the time.

    • Gemini 1.5 Pro preview at 1M-token context
    • Free preview in AI Studio at launch
    • Standard/long-context dual-tier pricing introduced

    Source

  42. Backfilled

    GPT-3.5 Turbo 50% price cut + text-embedding-3

    OpenAI cut GPT-3.5 Turbo input by 50% and output by 25%, and launched text-embedding-3-small / -large.

    • GPT-3.5 Turbo input: $0.001 → $0.0005 per 1K tokens
    • GPT-3.5 Turbo output: $0.002 → $0.0015 per 1K tokens
    • text-embedding-3-small at $0.00002 per 1K tokens (5× cheaper than ada-002)
    • text-embedding-3-large at $0.00013 per 1K tokens

    Source

  43. Backfilled

    Claude 3.5 Sonnet 8K output

    Anthropic raised the max output of Claude 3.5 Sonnet from 4K to 8K tokens without a pricing change.

    • Claude 3.5 Sonnet max output raised 4K → 8K tokens
    • No change to per-token pricing

    Source

  44. Backfilled

    Claude 3 Haiku on Bedrock

    Anthropic expanded Claude 3 Haiku availability to Amazon Bedrock at parity pricing with the direct API.

    • Claude 3 Haiku GA on Amazon Bedrock
    • Pricing parity with direct Anthropic API

    Source

  45. Backfilled

    gpt-3.5-turbo-0125 + GPT-4 Turbo 0125-preview

    OpenAI released updated GPT-3.5 Turbo and GPT-4 Turbo snapshots fixing the 'lazy model' regression reported in late 2023.

    • gpt-3.5-turbo-0125 with improved instruction following
    • gpt-4-0125-preview addresses December 'lazy model' reports

    Source

  46. Backfilled

    Gemini 1.0 API launch

    Google launched the Gemini 1.0 API (Pro tier) as the first Gemini-branded model family, replacing PaLM 2.

    • Gemini 1.0 Pro API launched
    • First Gemini-branded model family
    • Replaced PaLM 2 on Google's generative stack

    Source

  47. Backfilled

    Claude 2.1 with 200K context

    Anthropic released Claude 2.1 with a 200K-token context window, 2.4× the 80K ceiling of Claude 2.0.

    • Claude 2.1 at 200K-token context
    • Tool use beta introduced
    • Pricing held at $8/$24 per 1M tokens

    Source

  48. Backfilled

    GPT-4 Turbo launch (DevDay 2023)

    OpenAI launched GPT-4 Turbo at $10/$30 per 1M tokens (3× cheaper input and 2× cheaper output than GPT-4) with a 128K context.

    • GPT-4 Turbo at $10/$30 per 1M tokens (vs $30/$60 for GPT-4 8K)
    • 128K context window
    • JSON mode + seed parameter introduced

    Source

  49. Backfilled

    Claude 2 API launch

    Anthropic launched the Claude 2 API with 100K-token context and a public developer waitlist.

    • Claude 2 at $11.02/$32.68 per 1M tokens
    • 100K-token context at launch
    • First publicly accessible Claude API

    Source

  50. Backfilled

    GPT-3.5 Turbo 16K + function calling

    OpenAI released gpt-3.5-turbo-16k and introduced function calling across the GPT-4 and 3.5 families.

    • gpt-3.5-turbo-16k at $0.003/$0.004 per 1K tokens (2× base rate)
    • Function calling added to GPT-4 and GPT-3.5 Turbo
    • gpt-3.5-turbo base input cut by 25% to $0.0015/1K

    Source

  51. Backfilled

    GPT-4 launch

    OpenAI launched GPT-4 at $30/$60 per 1M tokens (8K context) and $60/$120 per 1M tokens (32K context).

    • GPT-4 8K at $30/$60 per 1M tokens
    • GPT-4 32K at $60/$120 per 1M tokens
    • Initial waitlist-gated developer access

    Source

  52. Backfilled

    GPT-3.5 Turbo API launch

    OpenAI launched the GPT-3.5 Turbo API at $0.002 per 1K tokens, a 10× price cut vs text-davinci-003.

    • GPT-3.5 Turbo at $0.002 per 1K tokens (both in + out)
    • 10× price cut versus text-davinci-003
    • First ChatGPT-style chat-completions API

    Source