Summarization API cost calculator

Cost out a batch of document summaries across every major LLM. Long input, short output — the inverse of a chatbot.

Summarization workloads are input-heavy: you feed the model a long document and ask for a short digest. That makes input cost the dominant line item and flips the model ranking — the cheapest chatbot isn't necessarily the cheapest summarizer.

The defaults assume a 10,000-token document (roughly a 30-page PDF) with a 500-token summary. Bump the input if you're summarizing transcripts, calls, or papers; bump the output if you need full multi-section digests.
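The arithmetic behind the calculator is simple: tokens times per-million-token rates. A minimal sketch, where the example rates ($0.05/M input, $0.40/M output) are illustrative placeholders, not live prices:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Dollar cost of one request, given per-million-token rates."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000

# Calculator defaults: 10,000-token document, 500-token summary.
cost = request_cost(10_000, 500, 0.05, 0.40)  # ≈ $0.00070 per request
```

Note how little the output term matters here: at these rates the 10,000 input tokens cost $0.0005 and the 500 output tokens only $0.0002, which is why input price dominates the model ranking.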

Workload parameters

Costs update live across every model in the table below.

Top 8 cheapest models for this workload

Sorted by total cost per request (input + output, with tokenizer and long-context adjustments applied).

Model                              Provider   Per request   Per day   Monthly (×30)
GPT-5 nano                         OpenAI     $0.00070      $0.35     $10.50
Gemini 2.5 Flash-Lite              Google     $0.00120      $0.60     $18.00
GPT-4o mini                        OpenAI     $0.00180      $0.90     $27.00
GPT-5.4 nano                       OpenAI     $0.00263      $1.31     $39.38
Gemini 3.1 Flash-Lite (preview)    Google     $0.00325      $1.63     $48.75
GPT-5 mini                         OpenAI     $0.00350      $1.75     $52.50
Gemini 2.5 Flash                   Google     $0.00425      $2.13     $63.75
GPT-4.1 mini                       OpenAI     $0.00480      $2.40     $72.00
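The per-day and monthly columns are straight multiples of the per-request cost. Dividing per-day by per-request in the table implies a default volume of 500 requests/day, an assumption of this sketch rather than a stated figure:

```python
def monthly_cost(per_request: float, requests_per_day: int, days: int = 30) -> float:
    """Scale a per-request cost to a monthly total."""
    return per_request * requests_per_day * days

# GPT-5 nano row: $0.00070/request at the implied 500 requests/day.
gpt5_nano_monthly = monthly_cost(0.00070, 500)  # ≈ $10.50
```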

Scaling GPT-5 nano

What the cheapest option costs as your traffic grows.

  • Baseline: $10.50/month
  • 2× volume: $21.00/month
  • 5× volume: $52.50/month
  • 10× volume: $105.00/month

Optimization tips

  • Batch API cuts cost 50% on OpenAI and Anthropic — perfect for async summarization jobs where a 24-hour turnaround is fine.
  • For Gemini, watch the 200K long-context threshold: prices double above it. Chunk-and-map large documents under 200K per call to stay in the cheap tier.
  • Summarize iteratively for very long docs: chunk → summarize each chunk → summarize the summaries. Cheaper than a single long-context call and often higher quality.
  • Gemini 2.5 Flash-Lite is unreasonably cheap for extractive summarization. If you don't need abstractive creativity, it's often 10-20× cheaper than Claude or GPT with acceptable quality.
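The chunk-and-map pattern from the tips above can be sketched in a few lines. The `summarize` callable stands in for whatever model call you use, and the ~4-characters-per-token figure is a rough heuristic, not a real tokenizer:

```python
from typing import Callable

def map_reduce_summary(document: str, summarize: Callable[[str], str],
                       chunk_tokens: int = 8_000, chars_per_token: int = 4) -> str:
    """Chunk -> summarize each chunk -> summarize the summaries.

    Keeping chunk_tokens well under a provider's long-context threshold
    (e.g. Gemini's 200K) keeps every call in the cheap pricing tier.
    """
    chunk_chars = chunk_tokens * chars_per_token
    chunks = [document[i:i + chunk_chars]
              for i in range(0, len(document), chunk_chars)]
    if len(chunks) == 1:
        # Short enough for a single call; no reduce step needed.
        return summarize(document)
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return summarize("\n\n".join(partial_summaries))
```

A real implementation would chunk on paragraph or section boundaries rather than raw character offsets, but the cost structure is the same: N cheap map calls plus one small reduce call.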

Frequently asked

What's the cheapest model for summarization?

For pure extractive work, Gemini 2.5 Flash-Lite is almost always cheapest — its input rate is dramatically below every competitor. For abstractive or multi-document synthesis, Sonnet 4.6 and GPT-5-mini offer the best cost/quality balance.

How do I handle documents longer than the context window?

Chunk the document, summarize each chunk, then summarize the summaries (map-reduce). This is usually cheaper, and often higher quality, than cramming everything into a long-context pricing tier — the model focuses better on shorter inputs.

Does batch API help for summarization?

Yes — 50% off on OpenAI and Anthropic if you can wait up to 24 hours for results. Summarization is typically asynchronous, so it's the ideal workload for batch.

Should I use prompt caching?

Only if the same document is summarized multiple times (with different prompts for different audiences, say). Caching discounts a repeated prompt prefix; in one-shot summarization the document dominates the input and differs on every request, so there's almost nothing to cache.

How do I estimate output token count?

Target a ratio — e.g. 5% of input length for an executive summary, 15-20% for a detailed digest. The calculator multiplies your output estimate by the rate; err on the high side if in doubt.
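The ratio heuristic above is one line of code. The 5% and 18% ratios are the examples from this FAQ, not fixed constants:

```python
def estimate_output_tokens(input_tokens: int, ratio: float) -> int:
    """Estimate summary length as a fraction of the input length."""
    return max(1, round(input_tokens * ratio))

exec_summary = estimate_output_tokens(10_000, 0.05)  # 500 tokens
detailed     = estimate_output_tokens(10_000, 0.18)  # 1800 tokens
```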

Need a precise number for your actual prompt?

Paste a real prompt into the estimator and get token-accurate costs — input tokens counted with the provider's own tokenizer, output tokens predicted by our regression model.