OpenAI

GPT-4o mini pricing

The workhorse mini of 2024. 128K context and sub-penny costs on most prompts, still pulling heavy production chat traffic.

Input

$0.15 / 1M tok

Output

$0.60 / 1M tok

Context window
128K
Max output
16K
Cached input
$0.075 / 1M
Verified
2026-04-06

GPT-4o mini landed in 2024 as the “cheap enough to ship anywhere” tier: $0.15 per 1M input and $0.60 per 1M output. Two years later it's still the default for most production chat backends because the ecosystem is mature, the price is predictable, and the 128K context window covers realistic chat and document workflows.

A 1,000-token prompt with a 500-token reply costs about $0.00045. Ten million of those calls a month is $4,500, a cost most teams can reason about without reaching for a spreadsheet. GPT-5 mini at $0.25 / $2.00 is newer but costs 1.67x as much on input and 3.3x as much on output, so migration is a real decision.
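The arithmetic above can be sketched as a back-of-envelope helper (rates hard-coded from the table on this page; an illustration, not Calcis itself):

```python
# Illustrative cost math for GPT-4o mini at standard (uncached) rates.
INPUT_PER_MTOK = 0.15   # $ per 1M input tokens
OUTPUT_PER_MTOK = 0.60  # $ per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at standard rates."""
    return (input_tokens * INPUT_PER_MTOK
            + output_tokens * OUTPUT_PER_MTOK) / 1_000_000

per_call = request_cost(1_000, 500)   # ~$0.00045 per call
monthly = per_call * 10_000_000       # ~$4,500 at 10M calls/month
```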

Calcis counts GPT-4o mini input tokens with o200k_base (tiktoken), the exact tokenizer OpenAI bills against, and bills multimodal inputs (image tokens) at their own rate when you include them.

Estimate your cost on GPT-4o mini

Paste your prompt into the estimator, pick GPT-4o mini, and see the exact dollar cost: input tokens counted with the provider's own tokenizer, output tokens predicted by our regression model.

Frequently asked

Should I migrate from GPT-4o mini to GPT-5 mini?
Usually no. 4o mini is cheaper ($0.15/$0.60 vs $0.25/$2 per 1M) and well-benchmarked for chat workloads. Upgrade only when you've measured a specific capability lift that justifies the price increase.
How much does GPT-4o mini cost per request?
A 1,000-token prompt with a 500-token reply costs about $0.00045 ($0.00015 input + $0.0003 output). At 10 million requests a month, that's around $4,500.
Is GPT-4o mini multimodal?
Yes, it accepts image input at a separate per-token rate. Audio support arrived later than on the main 4o variant; check the OpenAI rate card for current multimodal token pricing.
What's the cached input discount on GPT-4o mini?
$0.075 per 1M cached tokens, a 50% discount on the standard $0.15 input rate. Less aggressive than the 90% discounts on GPT-5-era models, but still meaningful at volume.
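The cache discount compounds quickly when a long shared prefix (a system prompt, say) is reused across requests. A small sketch with the rates above; the 8,000-token cached prefix in the example is hypothetical:

```python
# Input cost with prompt caching on GPT-4o mini (rates from this page).
STANDARD_INPUT = 0.15   # $ per 1M fresh input tokens
CACHED_INPUT = 0.075    # $ per 1M cached input tokens (50% off)

def input_cost(total_tokens: int, cached_tokens: int) -> float:
    """Input cost when `cached_tokens` of the prompt hit the cache."""
    fresh = total_tokens - cached_tokens
    return (fresh * STANDARD_INPUT + cached_tokens * CACHED_INPUT) / 1_000_000

# 9,000-token prompt where an 8,000-token system prefix is cached:
with_cache = input_cost(9_000, 8_000)   # ~$0.00075
no_cache = input_cost(9_000, 0)         # ~$0.00135
```

At that prompt shape the cache cuts input spend by roughly 44% per request.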

Pricing verified 2026-04-06 from the provider's rate card.