Classification API cost calculator
Tagging, labelling, routing, and moderation workloads. Short inputs, tiny outputs, huge volume — exactly the shape the cheapest models were built for.
Classification is the workload where the cheapest model almost always wins. A typical classifier sees 50-500 tokens of input and emits 5-20 tokens of output (a label or short JSON). At that shape the output cost is rounding error, and the question is purely which model has the lowest input rate at the quality bar you need.
The defaults below assume a 200-token input (a tweet, email subject, or short ticket) and a 10-token structured output. Tune upward if you're classifying full documents.
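The arithmetic behind the table is simple. A minimal sketch, with hypothetical per-million-token rates (real prices vary by model and change often):

```python
# Hypothetical placeholder rates; substitute your model's actual pricing.
INPUT_RATE_PER_M = 0.05    # $ per 1M input tokens (assumed)
OUTPUT_RATE_PER_M = 0.40   # $ per 1M output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one classification request."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# With the defaults above (200 in, 10 out), input dominates even though
# output tokens cost 8x more each: 200 * 0.05 = 10 vs 10 * 0.40 = 4.
cost = request_cost(200, 10)
```

This is why the output rate barely matters at classification shape: the output term stays a rounding error until your outputs grow well past a short label.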
Workload parameters
All costs in the table below are computed from these parameters; adjust them and every model recalculates.
Top 8 cheapest models for this workload
Sorted by total cost per request (input + output, with tokenizer and long-context adjustments applied).
| Model | Provider | Per request | Per day | Monthly (×30) |
|---|---|---|---|---|
| GPT-5 nano | OpenAI | $0.00001 | $0.14 | $4.20 |
| Gemini 2.5 Flash-Lite | Google | $0.00002 | $0.24 | $7.20 |
| GPT-4o mini | OpenAI | $0.00004 | $0.36 | $10.80 |
| GPT-5.4 nano | OpenAI | $0.00005 | $0.53 | $15.75 |
| Gemini 3.1 Flash-Lite (preview) | Google | $0.00007 | $0.65 | $19.50 |
| GPT-5 mini | OpenAI | $0.00007 | $0.70 | $21.00 |
| Gemini 2.5 Flash | Google | $0.00009 | $0.85 | $25.50 |
| GPT-4.1 mini | OpenAI | $0.00010 | $0.96 | $28.80 |
Scaling GPT-5 nano
What the cheapest option costs as your traffic grows.
| Volume | Monthly cost |
|---|---|
| Baseline | $4.20 |
| 2× | $8.40 |
| 5× | $21.00 |
| 10× | $42.00 |
Optimization tips
- Structured outputs force the model to emit just the label. OpenAI's JSON mode and Anthropic's tool calling cut output tokens to the minimum, killing runaway explanations.
- Batch API is a huge win here — 50% off on async classification jobs where a few hours' latency doesn't matter.
- Use Gemini 2.5 Flash-Lite or GPT-5-nano as the default. Classification rarely rewards frontier models; quality differences show up only on ambiguous edge cases.
- Distil if volume is massive. Above 10M classifications/month, training a small classifier on LLM-labelled data often beats running the LLM at inference time.
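To put a number on the batch win, here's a minimal sketch, assuming the baseline volume implied by the table above (about 14,000 requests/day at $0.00001 per request) and the 50% async discount:

```python
def monthly_cost(per_request: float, requests_per_day: int,
                 batch_discount: float = 0.0) -> float:
    """Projected monthly spend; batch_discount is a fraction like 0.5."""
    return per_request * requests_per_day * 30 * (1 - batch_discount)

realtime = monthly_cost(0.00001, 14_000)       # about $4.20/month
batched = monthly_cost(0.00001, 14_000, 0.5)   # about $2.10/month via batch
```

Since classification volume scales linearly, the same function covers the 2×/5×/10× projections: just multiply `requests_per_day`.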
Frequently asked
What's the cheapest model for classification?
GPT-5-nano and Gemini 2.5 Flash-Lite are dramatically cheaper per input token than any competitor and handle most classification workloads with acceptable accuracy. Step up to GPT-5-mini or Claude Haiku if you see failure modes on tricky cases.
Should I use structured output / tool calling?
Always, for classification. Free-form output invites the model to "explain its reasoning," which burns output tokens and makes downstream parsing fragile. JSON schema or tool calling locks the output and usually cuts it to under 10 tokens.
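As an illustration, a JSON schema that pins the output to a single enum label might look like the sketch below. The label set is hypothetical, and the exact way you pass the schema differs per provider (OpenAI structured outputs vs Anthropic tool calling), so check your provider's docs:

```python
# Hypothetical label set for a support-ticket classifier.
LABEL_SCHEMA = {
    "type": "object",
    "properties": {
        "label": {
            "type": "string",
            "enum": ["billing", "bug", "feature_request", "spam", "other"],
        }
    },
    "required": ["label"],
    "additionalProperties": False,
}
# A conforming response is just {"label": "spam"}: a handful of tokens.
```

With `additionalProperties: False` and a required enum field, the model has nowhere to put an explanation, which is exactly the point.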
Is batch API worth it for classification?
Yes, usually. 50% off from OpenAI and Anthropic in exchange for up to 24-hour delivery. Classification is almost always async — new tickets, new posts to moderate — so batch is a free win.
When should I train a smaller classifier instead?
At roughly 10M items/month or $3k/month spend on classification, training a DistilBERT or similar small model on LLM-generated labels usually pays back within a quarter and drops inference cost by 100×+. Below that scale, stick with an LLM.
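The within-a-quarter payback can be sanity-checked with rough numbers; the one-off training cost below is an assumption for illustration, not a quote:

```python
llm_monthly = 3_000.0       # monthly LLM spend at the threshold above
one_off = 2_000.0           # assumed: LLM labelling run + small-model training
distilled_monthly = llm_monthly / 100   # "100x+" cheaper inference

# Months until the one-off cost is recovered by the monthly savings.
payback_months = one_off / (llm_monthly - distilled_monthly)
# Roughly 0.7 months here, comfortably inside a quarter.
```

Below the threshold the savings shrink while the one-off cost doesn't, which is why the advice flips to "stick with an LLM" at smaller scale.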
Does calling the LLM via batch affect classification accuracy?
No — same model, same weights, same output quality. Batch only affects delivery latency and price. If quality is identical to real-time inference, the 50% discount is almost always worth the wait for async workloads.
Need a precise number for your actual prompt?
Paste a real prompt into the estimator and get token-accurate costs — input tokens counted with the provider's own tokenizer, output tokens predicted by our regression model.