Classification API cost calculator
Tagging, labelling, routing, and moderation workloads. Short inputs, tiny outputs, huge volume — exactly the shape the cheapest models were built for.
Classification is the workload where the cheapest model almost always wins. A typical classifier sees 50-500 tokens of input and emits 5-20 tokens of output (a label or short JSON). At that shape the output cost is rounding error, and the question is purely which model has the lowest input rate at the quality bar you need.
The defaults below assume a 200-token input (a tweet, email subject, or short ticket) and a 10-token structured output. Tune upward if you're classifying full documents.
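The arithmetic behind the table is simple. A minimal sketch, with hypothetical per-million-token rates (real prices vary by model and change often):

```python
# Hypothetical placeholder rates; substitute your model's actual pricing.
INPUT_RATE_PER_M = 0.05    # $ per 1M input tokens (assumed)
OUTPUT_RATE_PER_M = 0.40   # $ per 1M output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one classification request."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# With the defaults above (200 in, 10 out), input dominates even though
# output tokens cost 8x more each: 200 * 0.05 = 10 vs 10 * 0.40 = 4.
cost = request_cost(200, 10)
```

This is why the output rate barely matters at classification shape: the output term stays a rounding error until your outputs grow well past a short label.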
Workload parameters
All costs in the table below are computed from these parameters; adjust them and every model recalculates.
Top 8 cheapest models for this workload
Sorted by total cost per request (input + output, with tokenizer and long-context adjustments applied).
| Model | Provider | Per request | Per day | Monthly (×30) |
|---|---|---|---|---|
| GPT-5 nano | OpenAI | $0.00001 | $0.14 | $4.20 |
| Gemini 2.5 Flash-Lite | Google | $0.00002 | $0.24 | $7.20 |
| GPT-4o mini | OpenAI | $0.00004 | $0.36 | $10.80 |
| GPT-5.4 nano | OpenAI | $0.00005 | $0.53 | $15.75 |
| Gemini 3.1 Flash-Lite (preview) | Google | $0.00007 | $0.65 | $19.50 |
| GPT-5 mini | OpenAI | $0.00007 | $0.70 | $21.00 |
| Gemini 2.5 Flash | Google | $0.00009 | $0.85 | $25.50 |
| GPT-4.1 mini | OpenAI | $0.00010 | $0.96 | $28.80 |
Scaling GPT-5 nano
What the cheapest option costs as your traffic grows.
| Volume | Monthly cost |
|---|---|
| Baseline | $4.20 |
| 2× | $8.40 |
| 5× | $21.00 |
| 10× | $42.00 |
Optimization tips
- Structured outputs force the model to emit just the label. OpenAI's JSON mode and Anthropic's tool calling cut output tokens to the minimum, killing runaway explanations.
- Batch API is a huge win here — 50% off on async classification jobs where a few hours' latency doesn't matter.
- Use Gemini 2.5 Flash-Lite or GPT-5-nano as the default. Classification rarely rewards frontier models; quality differences show up only on ambiguous edge cases.
- Distil if volume is massive. Above 10M classifications/month, training a small classifier on LLM-labelled data often beats running the LLM at inference time.
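To put a number on the batch win, here's a minimal sketch, assuming the baseline volume implied by the table above (about 14,000 requests/day at $0.00001 per request) and the 50% async discount:

```python
def monthly_cost(per_request: float, requests_per_day: int,
                 batch_discount: float = 0.0) -> float:
    """Projected monthly spend; batch_discount is a fraction like 0.5."""
    return per_request * requests_per_day * 30 * (1 - batch_discount)

realtime = monthly_cost(0.00001, 14_000)       # about $4.20/month
batched = monthly_cost(0.00001, 14_000, 0.5)   # about $2.10/month via batch
```

Since classification volume scales linearly, the same function covers the 2×/5×/10× projections: just multiply `requests_per_day`.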
Frequently asked
What's the cheapest model for classification?
GPT-5-nano and Gemini 2.5 Flash-Lite are dramatically cheaper per input token than any competitor and handle most classification workloads with acceptable accuracy. Step up to GPT-5-mini or Claude Haiku if you see failure modes on tricky cases.
Should I use structured output / tool calling?
Always, for classification. Free-form output invites the model to "explain its reasoning," which burns output tokens and makes downstream parsing fragile. JSON schema or tool calling locks the output and usually cuts it to under 10 tokens.
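As an illustration, a JSON schema that pins the output to a single enum label might look like the sketch below. The label set is hypothetical, and the exact way you pass the schema differs per provider (OpenAI structured outputs vs Anthropic tool calling), so check your provider's docs:

```python
# Hypothetical label set for a support-ticket classifier.
LABEL_SCHEMA = {
    "type": "object",
    "properties": {
        "label": {
            "type": "string",
            "enum": ["billing", "bug", "feature_request", "spam", "other"],
        }
    },
    "required": ["label"],
    "additionalProperties": False,
}
# A conforming response is just {"label": "spam"}: a handful of tokens.
```

With `additionalProperties: False` and a required enum field, the model has nowhere to put an explanation, which is exactly the point.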
Is batch API worth it for classification?
Yes, usually. 50% off from OpenAI and Anthropic in exchange for up to 24-hour delivery. Classification is almost always async — new tickets, new posts to moderate — so batch is a free win.
When should I train a smaller classifier instead?
At roughly 10M items/month or $3k/month spend on classification, training a DistilBERT or similar small model on LLM-generated labels usually pays back within a quarter and drops inference cost by 100×+. Below that scale, stick with an LLM.
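The within-a-quarter payback can be sanity-checked with rough numbers; the one-off training cost below is an assumption for illustration, not a quote:

```python
llm_monthly = 3_000.0       # monthly LLM spend at the threshold above
one_off = 2_000.0           # assumed: LLM labelling run + small-model training
distilled_monthly = llm_monthly / 100   # "100x+" cheaper inference

# Months until the one-off cost is recovered by the monthly savings.
payback_months = one_off / (llm_monthly - distilled_monthly)
# Roughly 0.7 months here, comfortably inside a quarter.
```

Below the threshold the savings shrink while the one-off cost doesn't, which is why the advice flips to "stick with an LLM" at smaller scale.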
Does calling the LLM via batch affect classification accuracy?
No — same model, same weights, same output quality. Batch only affects delivery latency and price. If quality is identical to real-time inference, the 50% discount is almost always worth the wait for async workloads.
Need a precise number for your actual prompt?
Paste a real prompt into the estimator and get token-accurate costs — input tokens counted with the provider's own tokenizer, output tokens predicted by our regression model.