OpenAI
o3 pricing
OpenAI's reasoning model. Chain-of-thought built in - you pay for the thinking, not just the answer.
Input
$2.00 / 1M tok
Output
$8.00 / 1M tok
- Context window: 200K
- Max output: 100K
- Cached input: $0.50 / 1M
- Verified: 2026-04-06
o3 is OpenAI's reasoning tier - the model thinks through a problem before answering, and you pay for those reasoning tokens as part of the output. At $2 per 1M input and $8 per 1M output, the headline price is identical to GPT-4.1, but the bill shape is different: o3 routinely produces far more output tokens per response because the chain-of-thought is real and the cost of that thinking lands in the output count.
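That bill shape is easy to sketch. A minimal cost calculator, assuming the rates on this page and treating reasoning tokens as billable output; the `o3_cost` helper is hypothetical, not an OpenAI API:

```python
# Rates from the o3 rate card above, in dollars per token.
INPUT_RATE = 2.00 / 1_000_000
OUTPUT_RATE = 8.00 / 1_000_000

def o3_cost(input_tokens: int, visible_tokens: int, reasoning_tokens: int = 0) -> float:
    """Estimate a single o3 request's cost in dollars.

    Reasoning tokens are invisible in the response but billed as output.
    """
    billed_output = visible_tokens + reasoning_tokens
    return input_tokens * INPUT_RATE + billed_output * OUTPUT_RATE

# Same 1,000-token prompt, same 500-token visible answer:
easy = o3_cost(1_000, 500)                          # no reasoning overhead
hard = o3_cost(1_000, 500, reasoning_tokens=5_000)  # model thought hard
print(f"easy: ${easy:.3f}, hard: ${hard:.3f}")
```

The two calls differ only in the hidden reasoning tokens, yet the second bill is roughly 7x the first - the same spread described in the FAQ below.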
For problems where reasoning quality matters - math, multi-step code, agentic tool use, careful structured extraction - o3 typically pays for itself despite the inflated output. For chat, summary, and routing, stay on GPT-5 or GPT-5 mini; there's no capability lift to justify the thinking overhead.
Calcis counts o3 input tokens with o200k_base (tiktoken) and predicts response length from the prompt. The prediction accounts for the reasoning token overhead on o-series models so the dollar forecast doesn't under-count.
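A sketch of that forecast shape, under loud assumptions: the real counter is tiktoken's `o200k_base` (stood in for here by a crude words-per-token heuristic), and `reasoning_overhead` is an illustrative guess, not Calcis's fitted regression:

```python
def rough_token_count(text: str) -> int:
    # Crude stand-in for tiktoken's o200k_base counter:
    # English prose averages roughly 1.3 tokens per word.
    return max(1, round(len(text.split()) * 1.3))

def forecast_cost(prompt: str, expected_visible_tokens: int,
                  reasoning_overhead: float = 4.0) -> float:
    # reasoning_overhead is a made-up multiplier for illustration:
    # o-series responses carry invisible reasoning tokens on top of
    # the visible answer, and all of them bill at the output rate.
    input_tokens = rough_token_count(prompt)
    output_tokens = expected_visible_tokens * (1 + reasoning_overhead)
    return input_tokens * 2.00e-6 + output_tokens * 8.00e-6

print(f"${forecast_cost('Prove that the square root of 2 is irrational.', 400):.4f}")
```

Note where the dollars go: even with a multi-sentence prompt, nearly all of the forecast is output-side, which is why under-counting reasoning tokens wrecks an o-series estimate.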
Estimate your cost on o3
Paste your prompt into the estimator, pick o3, and see the estimated dollar cost - input tokens counted with the provider's own tokenizer, output tokens predicted by our regression model.
Frequently asked
- What are reasoning tokens and do I pay for them?
- Reasoning tokens are the chain-of-thought o3 generates internally before producing the visible answer. They count as output tokens on your bill, so a response that looks like 200 tokens to you might be billed as 2,000 tokens if the model thought hard.
- How much does o3 cost per request?
- Depends heavily on problem difficulty. Assuming a ~1,000-token prompt, a simple query with a 500-token reply runs about $0.006, while a hard reasoning problem with 5,000 reasoning tokens plus a 500-token answer runs about $0.046 - same input, same visible answer, roughly 7x the bill.
- Is o3 worth using over GPT-5?
- For math, code, agentic loops, and structured reasoning, usually yes - o3 produces measurably better answers. For chat, summary, and routing, no - GPT-5 matches quality and costs less per response because there's no reasoning overhead.
- What's the cached input discount on o3?
- $0.50 per 1M cached tokens - a 75% discount on the standard $2 input rate. Caching applies to the input side only; reasoning tokens aren't cacheable because they vary per request.
Pricing verified 2026-04-06 from the provider's rate card.