LLM API Pricing Calculator
Pick a model, enter your token counts, get the exact cost. Compare prices across every major provider in one place.
Configure a model and your token counts to see the cost per request and volume projections.
All models compared at 1,000 input / 500 output tokens
| Model | Provider | Input rate | Output rate | Total cost |
|---|---|---|---|---|
| GPT-5 nano | OpenAI | $0.05/M | $0.40/M | $0.0003 |
| Gemini 2.5 Flash-Lite | Google | $0.10/M | $0.40/M | $0.0003 |
| GPT-4o mini | OpenAI | $0.15/M | $0.60/M | $0.0004 |
| GPT-5.4 nano | OpenAI | $0.20/M | $1.25/M | $0.0008 |
| Gemini 3.1 Flash-Lite (preview) | Google | $0.25/M | $1.50/M | $0.0010 |
| GPT-4.1 mini | OpenAI | $0.40/M | $1.60/M | $0.0012 |
| GPT-5 mini | OpenAI | $0.25/M | $2.00/M | $0.0013 |
| Gemini 2.5 Flash | Google | $0.30/M | $2.50/M | $0.0015 |
| Gemini 3 Flash (preview) | Google | $0.50/M | $3.00/M | $0.0020 |
| GPT-5.4 mini | OpenAI | $0.75/M | $4.50/M | $0.0030 |
| o4-mini | OpenAI | $1.10/M | $4.40/M | $0.0033 |
| Claude Haiku 4.5 | Anthropic | $1.00/M | $5.00/M | $0.0035 |
| GPT-4.1 | OpenAI | $2.00/M | $8.00/M | $0.0060 |
| o3 | OpenAI | $2.00/M | $8.00/M | $0.0060 |
| Gemini 2.5 Pro | Google | $1.25/M | $10.00/M | $0.0063 |
| GPT-5 | OpenAI | $1.25/M | $10.00/M | $0.0063 |
| GPT-4o | OpenAI | $2.50/M | $10.00/M | $0.0075 |
| Gemini 3.1 Pro (preview) | Google | $2.00/M | $12.00/M | $0.0080 |
| GPT-5.4 | OpenAI | $2.50/M | $15.00/M | $0.0100 |
| Claude Sonnet 4.6 | Anthropic | $3.00/M | $15.00/M | $0.0105 |
| Claude Sonnet 4.5 | Anthropic | $3.00/M | $15.00/M | $0.0105 |
| Claude Opus 4.6 | Anthropic | $5.00/M | $25.00/M | $0.0175 |
| Claude Opus 4.5 | Anthropic | $5.00/M | $25.00/M | $0.0175 |
| Claude Opus 4.7 | Anthropic | $5.00/M | $25.00/M | $0.0175 |
| Claude Opus 4.1 | Anthropic | $15.00/M | $75.00/M | $0.0525 |
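Every total in the table comes from the same arithmetic: tokens times rate, divided by one million. A minimal sketch (the function name is illustrative, not from any provider's SDK):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost in dollars for one request; rates are $ per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# GPT-5 at the table's 1,000 input / 500 output tokens:
cost = request_cost(1_000, 500, 1.25, 10.00)
print(f"${cost:.4f}")  # → $0.0063
```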
How token pricing works
LLM APIs charge per token, not per request. A token is roughly 3/4 of a word in English. Every API call has two cost components: input tokens (your prompt, system instructions, and any context you send) and output tokens (the model's response).
Output tokens are typically 4-8x more expensive than input tokens, depending on the provider (see the rates in the table above). This means the length of the model's response usually dominates your bill, not the length of your prompt.
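To see why the response dominates, split a request's cost into its two components. This sketch uses GPT-5's rates from the table; at 1,000 input / 500 output tokens, the response is 80% of the bill even though it is half the token count:

```python
def cost_split(input_tokens: int, output_tokens: int,
               in_rate: float = 1.25, out_rate: float = 10.0):
    """Input and output cost in dollars; rates are $ per million tokens."""
    input_cost = input_tokens * in_rate / 1e6
    output_cost = output_tokens * out_rate / 1e6
    return input_cost, output_cost

inp, out = cost_split(1_000, 500)
output_share = out / (inp + out)  # 0.8 — the response is 80% of the cost
```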
Some providers offer additional pricing tiers. Google Gemini models switch to a higher rate when your input crosses a long-context threshold (usually 200k tokens). Anthropic and OpenAI offer discounted rates for cached inputs when the same prompt prefix is reused across requests.
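The tier logic can be sketched as follows. The 200k long-context threshold is from the text above, but the example rates and the 10x cache-read discount are placeholder assumptions for illustration, not any provider's published numbers:

```python
# Placeholder assumptions: base/long-context rates and the cache discount
# are illustrative, not real provider pricing.
LONG_CONTEXT_THRESHOLD = 200_000  # tokens

def input_cost(tokens: int, cached_tokens: int = 0,
               base_rate: float = 1.25,       # $/M below the threshold
               long_rate: float = 2.50,       # $/M once input crosses it
               cache_discount: float = 0.1):  # cached tokens bill at 10%
    """Input cost in dollars with long-context and cache tiers applied."""
    rate = long_rate if tokens > LONG_CONTEXT_THRESHOLD else base_rate
    uncached = tokens - cached_tokens
    return (uncached * rate + cached_tokens * rate * cache_discount) / 1e6

# A 300k-token prompt, half of it a cached prefix reused from earlier requests:
c = input_cost(300_000, cached_tokens=150_000)  # billed at the long-context rate
```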
The costs shown here are per-request. In production, costs compound quickly: a feature that makes 10 LLM calls per user action, with 1,000 daily active users each taking one action per day, is 300,000 API calls per month. Use the volume projections above to plan accordingly.
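The projection above as plain arithmetic, extended with a rough monthly bill at GPT-5's per-request cost from the table:

```python
calls_per_action = 10
daily_active_users = 1_000   # assuming one action per user per day
days_per_month = 30

monthly_calls = calls_per_action * daily_active_users * days_per_month  # 300,000

# At GPT-5's roughly $0.0063 per request (1k in / 500 out, table above):
monthly_cost = monthly_calls * 0.0063  # ≈ $1,890 per month
```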
Need to estimate costs from an actual prompt? The estimator counts exact tokens using each provider's own tokenizer and predicts the output length before you make the API call.