Summarization API cost calculator
Cost out a batch of document summaries across every major LLM. Long input, short output — the inverse of a chatbot.
Summarization workloads are input-heavy: you feed the model a long document and ask for a short digest. That makes input cost the dominant line item and flips the model ranking — the cheapest chatbot isn't necessarily the cheapest summarizer.
The defaults assume a 10,000-token document (roughly a 30-page PDF) with a 500-token summary. Bump the input if you're summarizing transcripts, calls, or papers; bump the output if you need full multi-section digests.
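The arithmetic behind the calculator is simple: input and output tokens each multiplied by their per-million-token rate. A minimal sketch, where the rates shown are illustrative placeholders (not current list prices for any model):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost of one request, with rates in $ per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1e6

# Default workload: 10,000-token document in, 500-token summary out.
# Rates here are illustrative assumptions, not real pricing.
cost = request_cost(10_000, 500, input_rate=0.05, output_rate=0.40)
# With these rates, the input side is 71% of the total —
# which is why summarization rankings differ from chatbot rankings.
```

Because the input term dominates at a 20:1 input/output ratio, a model's input rate matters far more than its output rate for this workload.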
Workload parameters
Costs update live across every model in the table below.
Top 8 cheapest models for this workload
Sorted by total cost per request (input + output, with tokenizer and long-context adjustments applied).
| Model | Provider | Per request | Per day | Monthly (×30) |
|---|---|---|---|---|
| GPT-5 nano | OpenAI | $0.00070 | $0.35 | $10.50 |
| Gemini 2.5 Flash-Lite | Google | $0.00120 | $0.60 | $18.00 |
| GPT-4o mini | OpenAI | $0.00180 | $0.90 | $27.00 |
| GPT-5.4 nano | OpenAI | $0.00263 | $1.31 | $39.38 |
| Gemini 3.1 Flash-Lite (preview) | Google | $0.00325 | $1.63 | $48.75 |
| GPT-5 mini | OpenAI | $0.00350 | $1.75 | $52.50 |
| Gemini 2.5 Flash | Google | $0.00425 | $2.13 | $63.75 |
| GPT-4.1 mini | OpenAI | $0.00480 | $2.40 | $72.00 |
Scaling GPT-5 nano
What the cheapest option costs as your traffic grows.
| Volume | Monthly cost |
|---|---|
| Baseline | $10.50 |
| 2× volume | $21.00 |
| 5× volume | $52.50 |
| 10× volume | $105.00 |
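Scaling is linear in request volume, so projecting these figures is a one-liner. A sketch, using the $0.00070 per-request figure and an assumed 500 requests/day baseline consistent with the $0.35/day number above:

```python
def monthly_cost(per_request: float, requests_per_day: int,
                 days: int = 30, volume_multiplier: float = 1.0) -> float:
    """Project monthly spend from per-request cost and daily volume."""
    return per_request * requests_per_day * days * volume_multiplier

baseline = monthly_cost(0.0007, 500)                        # ≈ $10.50
at_10x  = monthly_cost(0.0007, 500, volume_multiplier=10)   # ≈ $105.00
```

Linear scaling holds only while you stay on the same pricing tier; batch discounts or negotiated volume pricing would bend the curve downward.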
Optimization tips
- Batch API cuts cost 50% on OpenAI and Anthropic — perfect for async summarization jobs where a 24-hour turnaround is fine.
- For Gemini, watch the 200K long-context threshold: prices double above it. Chunk-and-map large documents under 200K per call to stay in the cheap tier.
- Summarize iteratively for very long docs: chunk → summarize each chunk → summarize the summaries. Cheaper than a single long-context call and often higher quality.
- Gemini 2.5 Flash-Lite is unreasonably cheap for extractive summarization. If you don't need abstractive creativity, it's often 10-20× cheaper than Claude or GPT with acceptable quality.
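The chunk-and-map pattern from the tips above can be sketched in a few lines. Here `summarize` is a hypothetical callable standing in for your actual model call; the 180K chunk size is an assumption chosen to leave headroom under a 200K threshold:

```python
def chunk(tokens: list, max_tokens: int = 180_000) -> list:
    """Split a token list into chunks that stay under the cheap pricing tier."""
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]

def map_reduce_summarize(document_tokens: list, summarize,
                         chunk_size: int = 180_000) -> list:
    """Map: summarize each chunk. Reduce: summarize the concatenated summaries.

    `summarize` is a placeholder for a model call — it takes a token list
    and returns a (shorter) token list.
    """
    parts = [summarize(c) for c in chunk(document_tokens, chunk_size)]
    if len(parts) == 1:
        return parts[0]          # document fit in one chunk; no reduce step
    merged = [t for part in parts for t in part]
    return summarize(merged)
```

For very long inputs this may recurse: if the merged summaries themselves exceed the chunk size, apply the same function to them again.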
Frequently asked
What's the cheapest model for summarization?
For pure extractive work, Gemini 2.5 Flash-Lite is almost always cheapest — its input rate is dramatically below every competitor. For abstractive or multi-document synthesis, Sonnet 4.6 and GPT-5-mini offer the best cost/quality balance.
How do I handle documents longer than the context window?
Chunk the document, summarize each chunk, then summarize the summaries (map-reduce). This is usually cheaper AND higher quality than cramming into a long-context tier — the model focuses better on shorter inputs.
Does batch API help for summarization?
Yes — 50% off on OpenAI and Anthropic if you can wait up to 24 hours for results. Summarization is typically asynchronous, so it's the ideal workload for batch.
Should I use prompt caching?
Mainly if the same document is summarized multiple times (with different prompts for different audiences, say). A shared instruction prefix can also be cached across documents, but in summarization the document dominates the input tokens, so the savings are usually small. One-shot summarization of unique documents gets little benefit.
How do I estimate output token count?
Target a ratio — e.g. 5% of input length for an executive summary, 15-20% for a detailed digest. The calculator multiplies your output estimate by the rate; err on the high side if in doubt.
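The ratio heuristic above is easy to encode. A sketch with the ratios from the answer (the function name is illustrative):

```python
def estimate_output_tokens(input_tokens: int, ratio: float = 0.05) -> int:
    """Estimate summary length as a fraction of input length.

    ratio ≈ 0.05 for an executive summary, 0.15-0.20 for a detailed digest.
    """
    return round(input_tokens * ratio)

executive = estimate_output_tokens(10_000)          # 500 tokens — the default
detailed  = estimate_output_tokens(10_000, 0.20)    # 2,000 tokens
```

Since output tokens are billed at a higher rate than input tokens, overestimating here inflates the projection slightly, which is the safe direction for budgeting.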
Need a precise number for your actual prompt?
Paste a real prompt into the estimator and get token-accurate costs — input tokens counted with the provider's own tokenizer, output tokens predicted by our regression model.