OpenAI
o4-mini pricing
Smaller reasoning model. ~2x cheaper than o3 with similar chain-of-thought depth for most tasks.
Input
$1.10 / 1M tok
Output
$4.40 / 1M tok
- Context window: 200K
- Max output: 100K
- Cached input: $0.275 / 1M
- Verified: 2026-04-06
o4-mini sits alongside o3 as the cost-optimised reasoning tier: $1.10 per 1M input and $4.40 per 1M output, 200K context window, 100K max output. Capability-wise it's a genuine step down from o3 - the model reasons on a shorter leash - but for a lot of structured extraction and multi-step code workloads the difference doesn't show up in end-to-end quality measurements.
Like o3, o4-mini bills reasoning tokens as part of output, so the visible answer length understates the actual cost. A response that looks like 200 tokens to you might cost as much as a 1,500-token reply from a non-reasoning model of comparable price. Calcis' output-length predictor accounts for the reasoning-token overhead so the dollar forecast doesn't under-count.
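The billing asymmetry above is easy to sketch. A minimal cost function using the rates on this page, where hidden reasoning tokens bill at the same output rate as the visible answer (the token counts in the example calls are illustrative, not measured):

```python
# Why visible answer length understates o4-mini cost: reasoning tokens
# bill at the output rate ($4.40 / 1M) even though you never see them.
INPUT_RATE = 1.10 / 1_000_000   # $ per input token
OUTPUT_RATE = 4.40 / 1_000_000  # $ per output token (visible + reasoning)

def request_cost(input_tokens: int, visible_tokens: int, reasoning_tokens: int) -> float:
    """Dollar cost of one o4-mini request."""
    billed_output = visible_tokens + reasoning_tokens
    return input_tokens * INPUT_RATE + billed_output * OUTPUT_RATE

# Same 1,000-token prompt, same 200-token visible answer, very different bills:
cheap = request_cost(1_000, 200, 0)          # no hidden reasoning
pricey = request_cost(1_000, 200, 1_300)     # 1,300 hidden reasoning tokens
print(f"${cheap:.5f} vs ${pricey:.5f}")
```

The second request costs roughly 4x the first with an identical prompt and an identical visible answer, which is exactly why a predictor has to estimate reasoning tokens, not just answer length.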
Input tokens are counted with o200k_base (tiktoken), the same tokenizer OpenAI uses for billing, so the input figure you see matches the one on your invoice exactly.
Estimate your cost on o4-mini
Paste your prompt into the estimator, pick o4-mini, and see the exact dollar cost - input tokens counted with the provider's own tokenizer, output tokens predicted by our regression model.
Frequently asked
- When should I pick o4-mini over o3?
- When you want reasoning capability but can't absorb o3's cost. o4-mini is ~2x cheaper on both input and output. The capability gap is real but workload-dependent - benchmark both on your specific task before deciding.
- How much does o4-mini cost per request?
- Depends on how much the model thinks. A simple query with a 500-token visible answer runs about $0.003. A reasoning-heavy problem with 3,000 reasoning tokens + a 500-token answer runs about $0.017 - same input, same visible answer, roughly 5x the bill.
- Are o4-mini reasoning tokens cacheable?
- No - reasoning tokens vary per request, so they can't be cached. The cached input discount ($0.275 per 1M, 75% off) applies only to the input side of your request.
- Is o4-mini cheaper than GPT-5?
- On headline rates, o4-mini ($1.10 input) is cheaper than GPT-5 ($1.25 input) and roughly matches on output. But reasoning tokens push the effective output count higher on o-series, so end-to-end cost usually lands above GPT-5 for equivalent workloads.
Pricing verified 2026-04-06 from the provider's rate card.