Google Gemini API Pricing (2026)

The Gemini 3.x previews, Gemini 2.5 stable lineup, and Flash/Flash-Lite tiers priced in one place. Long-context surcharges, cache rates, and batch savings computed at today's prices.

Pricing verified 2026-04-06

Complete rate card

Every Google Gemini model Calcis tracks, sorted cheapest to most expensive on a typical chat-shape request (1k in, 2k out). Rates are USD per 1M tokens.

| Model | Input / 1M | Output / 1M | Context | Max out |
| --- | --- | --- | --- | --- |
| Gemini 2.5 Flash-Lite | $0.100 | $0.400 | 1M | - |
| Gemini 3.1 Flash-Lite (preview) | $0.250 | $1.50 | 1M | - |
| Gemini 2.5 Flash | $0.300 | $2.50 | 1M | - |
| Gemini 3 Flash (preview) | $0.500 | $3.00 | 1M | - |
| Gemini 2.5 Pro | $1.25 | $10.00 | 2M | - |
| Gemini 3.1 Pro (preview) | $2.00 | $12.00 | 1M | - |
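The chat-shape sort order above is plain per-token arithmetic. A minimal sketch (the `RATES` dict and `chat_cost` helper are illustrative names, with base sub-200K rates copied from the table):

```python
# Rank the rate-card models by cost on the 1k-in / 2k-out chat-shape
# request used for the table's sort order. Rates are USD per 1M tokens,
# copied from the rate card above.
RATES = {
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "Gemini 3.1 Flash-Lite (preview)": (0.25, 1.50),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 3 Flash (preview)": (0.50, 3.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Gemini 3.1 Pro (preview)": (2.00, 12.00),
}

def chat_cost(model: str, tokens_in: int = 1_000, tokens_out: int = 2_000) -> float:
    """Dollar cost of one request at base (sub-200K-context) rates."""
    rate_in, rate_out = RATES[model]
    return tokens_in * rate_in / 1e6 + tokens_out * rate_out / 1e6

for model in sorted(RATES, key=chat_cost):
    print(f"{model}: ${chat_cost(model):.6f}")
```

On this shape Flash-Lite comes to $0.0009 per request and the 3.1 Pro preview to $0.026, which is where the "roughly 29×" spread quoted below comes from.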

Every Google Gemini model

Each page has a headline price card, cost ladder, a narrative on when to reach for that model, and an FAQ with the common pricing questions answered from the live rate card.

What does Gemini 2.5 Flash-Lite vs Gemini 3.1 Pro (preview) actually cost?

Four workload shapes from tiny to massive. Output is roughly half the input (a typical chat/completion pattern). The ratio column shows the Google Gemini spread at a glance.

| Scenario | Tokens (in / out) | Gemini 2.5 Flash-Lite | Gemini 3.1 Pro (preview) | Ratio |
| --- | --- | --- | --- | --- |
| Tiny | 100 / 50 | $0.000030 | $0.00080 | 26.7× |
| Short request | 1,000 / 500 | $0.00030 | $0.00800 | 26.7× |
| Long document | 10,000 / 5,000 | $0.00300 | $0.0800 | 26.7× |
| Massive context | 100,000 / 50,000 | $0.0300 | $0.8000 | 26.7× |
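The constant ratio column falls out of the arithmetic: with output fixed at half the input, every scenario is a scalar multiple of the same blended rate. A quick sketch (the `cost` helper is an illustrative name; rates are from the table above):

```python
# Reproduce the scenario table. Output is fixed at half the input, so
# per-request cost scales linearly with input size and the ratio between
# any two models is constant across all four scenarios.
def cost(tokens_in: int, rate_in: float, rate_out: float) -> float:
    tokens_out = tokens_in // 2
    return tokens_in * rate_in / 1e6 + tokens_out * rate_out / 1e6

for tokens_in in (100, 1_000, 10_000, 100_000):
    lite = cost(tokens_in, 0.10, 0.40)   # Gemini 2.5 Flash-Lite
    pro = cost(tokens_in, 2.00, 12.00)   # Gemini 3.1 Pro (preview)
    print(f"{tokens_in:>7} in: ${lite:.6f} vs ${pro:.6f} -> {pro / lite:.1f}x")
```

The blended ratio is (2.00 + 12.00/2) / (0.10 + 0.40/2) = 8 / 0.3 ≈ 26.7, matching every row.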

Discounts & modifiers

Long-context surcharge. Gemini 2.5 Pro and Gemini 3.1 Pro bill at elevated rates above 200K input tokens. Gemini 2.5 Pro jumps from $1.25/$10 to $2.50/$15 per 1M once you cross the threshold; Gemini 3.1 Pro follows the same pattern, moving from $2.00/$12 base to $4.00/$18 long. Flash, Flash-Lite, and the preview Flash tiers keep flat pricing across the full context window.

Context caching. Google offers explicit context caching: when you reuse the same context across multiple requests, the cached portion bills at roughly 10% of the input rate (25% on the Pro tier). Gemini 2.5 Flash-Lite caches at $0.01/1M, the cheapest cached-input rate published by any provider. Workloads that run the same system prompt plus a rotating user message should always use context caching.

Batch API. 50% discount on both input and output in exchange for 24-hour turnaround. Stacks with context caching for overnight reports, bulk translation, or research corpus analysis.
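How the two discounts stack is easiest to see in numbers. A minimal sketch, assuming the discounts multiply (the text above says they stack; the exact interaction is Google's to define) and using the non-Pro 10% cache factor. The `stacked_cost` helper is an illustrative name:

```python
# ASSUMPTION: batch (50% off everything) and context caching (cached
# input at 10% of the input rate) compose multiplicatively.
def stacked_cost(cached_in: int, fresh_in: int, tokens_out: int,
                 rate_in: float, rate_out: float,
                 cache_factor: float = 0.10, batch_factor: float = 0.50) -> float:
    """Per-request USD cost with caching on cached_in and batch on everything."""
    cached = cached_in * rate_in * cache_factor / 1e6
    fresh = fresh_in * rate_in / 1e6
    output = tokens_out * rate_out / 1e6
    return (cached + fresh + output) * batch_factor

# Example: overnight report on Gemini 2.5 Flash ($0.30/$2.50 per 1M),
# 50K-token cached system prompt, 1K fresh input, 2K output.
full = (50_000 + 1_000) * 0.30 / 1e6 + 2_000 * 2.50 / 1e6
print(f"undiscounted: ${full:.4f}")
print(f"cache + batch: ${stacked_cost(50_000, 1_000, 2_000, 0.30, 2.50):.4f}")
```

On this shape the stacked price is $0.0034 against $0.0203 undiscounted, an ~83% saving; the more the cached prefix dominates, the closer you get to the full 95% off the input side.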

The spread between cheapest (Gemini 2.5 Flash-Lite) and most expensive (Gemini 3.1 Pro (preview)) at the same chat-shape request (1k in, 2k out) is roughly 29×; the scenario table above shows 26.7× because its output is only half the input, and the output rates carry the wider gap. Start with Flash or Flash-Lite; reach for Pro only when your evals say you have to.

Which Google Gemini model should I use?

For high-volume cheap tasks (classification, tagging, moderation), Gemini 2.5 Flash-Lite at $0.10 input / $0.40 output per 1M tokens is the cheapest frontier-capable model on the market. Flash-Lite routinely undercuts GPT-5 nano and Claude Haiku on any token-heavy pipeline.

For production chat and RAG, Gemini 2.5 Flash at $0.30/$2.50 per 1M hits the sweet spot: strong instruction-following, 1M context, and a 10-20x price advantage over the flagships. This is the model to default to unless your evals say otherwise.

For reasoning-heavy work (hard analysis, complex tool use, multi-step agents), Gemini 2.5 Pro at $1.25/$10 per 1M (or Gemini 3.1 Pro preview at $2.00/$12) is the tier that actually competes with GPT-5.4 and Claude Opus on capability. The 2M context window on Gemini 2.5 Pro is unique at this price point.

Watch the long-context surcharge: above 200K input tokens, Gemini 2.5 Pro and Gemini 3.1 Pro bill at up to 2x the base rate (2x on input, 1.5x on output). Calcis applies the tier automatically when your prompt crosses the threshold.

Estimate your Google Gemini costs →

Drop a prompt into the estimator, pick any Google Gemini model, and get the exact dollar cost: input tokens counted with Google Gemini's own tokenizer, output tokens predicted by our regression model.

Frequently asked

How much does Gemini 2.5 Pro cost per 1M tokens?
Gemini 2.5 Pro costs $1.25 per 1M input tokens and $10.00 per 1M output under 200K tokens of input. Above 200K input tokens the long-context tier kicks in at $2.50 input / $15.00 output per 1M.
What is the cheapest Google Gemini model?
Gemini 2.5 Flash-Lite is the cheapest Gemini model at $0.10 input / $0.40 output per 1M tokens. It is also the cheapest frontier-capable model tracked on Calcis, full stop.
Does Google offer batch pricing discounts?
Yes. The Gemini Batch API offers a 50% discount on both input and output tokens with 24-hour turnaround. Combine it with context caching (cached input bills at 10% of the base input rate, or 25% on the Pro tier) and input-heavy batch workloads can run for well under a tenth of the base rate card price.
How does Google Gemini pricing compare to competitors?
At the flagship tier, Gemini 2.5 Pro ($1.25/$10 per 1M) significantly undercuts OpenAI GPT-5.4 ($2.50/$15) and Anthropic Claude Opus 4.7 ($5/$25). The gap only widens at the cheap tier: Flash-Lite at $0.10/$0.40 has no competitor in its price bracket.
What is the Gemini free tier?
Google offers a free tier on the Gemini API for most models, with rate limits of typically 5-15 requests per minute depending on the model. The free tier is intended for development and low-volume personal use. Production workloads need a billing-enabled project but keep the same rate card listed here.

Pricing verified 2026-04-06 from Google Gemini's published rate card.