Summarisation API cost calculator

Estimate the cost of a batch of document summaries across every major LLM. Long input, short output: the inverse of a chatbot.

Summarisation workloads are input-heavy: you feed the model a long document and ask for a short digest. That makes input cost the dominant line item and flips the model ranking: the cheapest chatbot isn't necessarily the cheapest summarizer.

The defaults assume a 10,000-token document (roughly a 30-page PDF) with a 500-token summary. Bump the input if you're summarising transcripts, calls, or papers; bump the output if you need full multi-section digests.
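The arithmetic behind each per-request figure is linear in token counts. A minimal sketch of that calculation, with placeholder rates (the $0.05/$0.40 per-million figures are illustrative, not any provider's published pricing):

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     input_rate: float, output_rate: float) -> float:
    """Per-request cost in USD. Rates are USD per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Default workload: 10,000-token document, 500-token summary.
# Placeholder rates: $0.05/M input, $0.40/M output.
print(f"${cost_per_request(10_000, 500, 0.05, 0.40):.5f}")  # → $0.00070
```

With these numbers the input side contributes $0.00050 of the $0.00070 total, which is why summarisation rankings track input rates so closely.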

Workload parameters

Costs update live across every model in the table below.

Top 8 cheapest models for this workload

Sorted by total cost per request (input + output, with tokenizer and long-context adjustments applied).

| Model | Provider | Per request | Per day | Monthly (×30) |
|---|---|---|---|---|
| GPT-5 nano | OpenAI | $0.00070 | $0.35 | $10.50 |
| Gemini 2.5 Flash-Lite | Google | $0.00120 | $0.60 | $18.00 |
| GPT-4o mini | OpenAI | $0.00180 | $0.90 | $27.00 |
| GPT-5.4 nano | OpenAI | $0.00263 | $1.31 | $39.38 |
| Gemini 3.1 Flash-Lite (preview) | Google | $0.00325 | $1.63 | $48.75 |
| GPT-5.5 nano | OpenAI | $0.00325 | $1.63 | $48.75 |
| GPT-5 mini | OpenAI | $0.00350 | $1.75 | $52.50 |
| Gemini 2.5 Flash | Google | $0.00425 | $2.13 | $63.75 |

Scaling GPT-5 nano

What the cheapest option costs as your traffic grows.

| Volume | Monthly cost |
|---|---|
| baseline | $10.50 |
| 2× volume | $21.00 |
| 5× volume | $52.50 |
| 10× volume | $105.00 |

Optimisation tips

  • The Batch API cuts costs by 50% on OpenAI and Anthropic: perfect for async summarisation jobs where a 24-hour turnaround is fine.
  • For Gemini, watch the 200K long-context threshold: prices double above it. Chunk-and-map large documents under 200K per call to stay in the cheap tier.
  • Summarise iteratively for very long docs: chunk → summarise each chunk → summarise the summaries. Cheaper than a single long-context call and often higher quality.
  • Gemini 2.5 Flash-Lite is unreasonably cheap for extractive summarisation. If you don't need abstractive creativity, it's often 10-20× cheaper than Claude or GPT with acceptable quality.
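The chunk-and-map sizing from the second tip can be sketched as follows. This is a rough plan, assuming a 200K-token tier boundary and a hypothetical 1,000-token allowance for the prompt and instructions per call:

```python
import math

TIER_THRESHOLD = 200_000   # tokens; assume input rates double above this
PROMPT_OVERHEAD = 1_000    # hypothetical tokens reserved per call for instructions

def chunk_plan(doc_tokens: int) -> tuple[int, int]:
    """Return (number of chunks, tokens per chunk) so every call
    stays below the cheap-tier threshold."""
    budget = TIER_THRESHOLD - PROMPT_OVERHEAD
    n_chunks = math.ceil(doc_tokens / budget)
    return n_chunks, math.ceil(doc_tokens / n_chunks)

print(chunk_plan(450_000))  # → (3, 150000)
```

A 450K-token document therefore splits into three ~150K-token calls, each comfortably inside the cheap tier.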

Frequently asked

What's the cheapest model for summarisation?

For pure extractive work, Gemini 2.5 Flash-Lite is almost always cheapest: its input rate is dramatically below every competitor. For abstractive or multi-document synthesis, Sonnet 4.6 and GPT-5-mini offer the best cost/quality balance.

How do I handle documents longer than the context window?

Chunk the document, summarise each chunk, then summarise the summaries (map-reduce). This is usually cheaper AND higher quality than cramming into a long-context tier: the model focuses better on shorter inputs.
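The map-reduce pattern above can be sketched in a few lines. Here `summarise` is a stand-in for whatever LLM call you use; the placeholder just truncates text so the sketch runs:

```python
def summarise(text: str, max_words: int = 50) -> str:
    # Placeholder: a real implementation would call the model's API.
    return " ".join(text.split()[:max_words])

def map_reduce_summary(document: str, chunk_words: int = 2_000) -> str:
    words = document.split()
    chunks = [" ".join(words[i:i + chunk_words])
              for i in range(0, len(words), chunk_words)]
    # Map: summarise each chunk independently (parallelisable, batchable).
    partials = [summarise(chunk) for chunk in chunks]
    # Reduce: summarise the concatenated partial summaries.
    return summarise("\n".join(partials), max_words=200)
```

The map step is embarrassingly parallel, which also makes it a natural fit for the batch APIs discussed below.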

Does batch API help for summarisation?

Yes: 50% off on OpenAI and Anthropic if you can wait up to 24 hours for results. Summarisation is typically asynchronous, so it's the ideal workload for batch.
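As a quick sanity check on the discount, the table's daily figures appear to assume 500 requests/day ($0.00070 × 500 = $0.35), so a batched monthly bill works out like this (a sketch, assuming that request volume):

```python
def batch_monthly(per_request: float, requests_per_day: int,
                  days: int = 30, discount: float = 0.5) -> float:
    """Monthly cost in USD with a flat batch discount applied."""
    return per_request * requests_per_day * days * (1 - discount)

# GPT-5 nano at the table's apparent 500 requests/day:
print(f"${batch_monthly(0.0007, 500):.2f}")  # → $5.25
```

That halves the $10.50 baseline from the scaling table above.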

Should I use prompt caching?

Only if the same document is summarised multiple times (with different prompts for different audiences, say). One-shot summarisation can't benefit from caching because each document is different.

How do I estimate output token count?

Target a ratio: e.g. 5% of input length for an executive summary, 15-20% for a detailed digest. The calculator multiplies your output estimate by the rate; err on the high side if in doubt.
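The ratio heuristic is a one-liner. A sketch using the figures above as illustrative midpoints (5% for an executive summary, 18% within the 15-20% detailed range):

```python
# Illustrative output-to-input ratios; tune these for your own prompts.
RATIOS = {"executive": 0.05, "detailed": 0.18}

def estimate_output_tokens(input_tokens: int, style: str) -> int:
    return round(input_tokens * RATIOS[style])

print(estimate_output_tokens(10_000, "executive"))  # → 500
print(estimate_output_tokens(10_000, "detailed"))   # → 1800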

Need a precise number for your actual prompt?

Paste a real prompt into the estimator and get token-accurate costs: input tokens counted with the provider's own tokenizer, output tokens predicted by our regression model.