Summarisation API cost calculator

Estimate the cost of a batch of document summaries across every major LLM. Long input, short output: the inverse of a chatbot.

Summarisation workloads are input-heavy: you feed the model a long document and ask for a short digest. That makes input cost the dominant line item and flips the model ranking: the cheapest chatbot isn't necessarily the cheapest summarizer.

The defaults assume a 10,000-token document (roughly a 30-page PDF) with a 500-token summary. Bump the input if you're summarising transcripts, calls, or papers; bump the output if you need full multi-section digests.
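The arithmetic behind each per-request figure is linear in token counts. A minimal sketch of that calculation, with placeholder rates (the $0.05/$0.40 per-million figures are illustrative, not any provider's published pricing):

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     input_rate: float, output_rate: float) -> float:
    """Per-request cost in USD. Rates are USD per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Default workload: 10,000-token document, 500-token summary.
# Placeholder rates: $0.05/M input, $0.40/M output.
print(f"${cost_per_request(10_000, 500, 0.05, 0.40):.5f}")  # → $0.00070
```

With these numbers the input side contributes $0.00050 of the $0.00070 total, which is why summarisation rankings track input rates so closely.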

Workload parameters

Costs update live across every model in the table below.

Top 8 cheapest models for this workload

Sorted by total cost per request (input + output, with tokenizer and long-context adjustments applied).

| Model | Provider | Per request | Per day | Monthly (×30) |
|---|---|---|---|---|
| GPT-5 nano | OpenAI | $0.00070 | $0.35 | $10.50 |
| Gemini 2.5 Flash-Lite | Google | $0.00120 | $0.60 | $18.00 |
| GPT-4o mini | OpenAI | $0.00180 | $0.90 | $27.00 |
| GPT-5.4 nano | OpenAI | $0.00263 | $1.31 | $39.38 |
| Gemini 3.1 Flash-Lite (preview) | Google | $0.00325 | $1.63 | $48.75 |
| GPT-5.5 nano | OpenAI | $0.00325 | $1.63 | $48.75 |
| GPT-5 mini | OpenAI | $0.00350 | $1.75 | $52.50 |
| Gemini 2.5 Flash | Google | $0.00425 | $2.13 | $63.75 |

Scaling GPT-5 nano

What the cheapest option costs as your traffic grows.

| Volume | Monthly cost |
|---|---|
| baseline | $10.50 |
| 2× volume | $21.00 |
| 5× volume | $52.50 |
| 10× volume | $105.00 |

Optimisation tips

  • The Batch API cuts costs by 50% on OpenAI and Anthropic: perfect for async summarisation jobs where a 24-hour turnaround is fine.
  • For Gemini, watch the 200K long-context threshold: prices double above it. Chunk-and-map large documents under 200K per call to stay in the cheap tier.
  • Summarise iteratively for very long docs: chunk → summarise each chunk → summarise the summaries. Cheaper than a single long-context call and often higher quality.
  • Gemini 2.5 Flash-Lite is unreasonably cheap for extractive summarisation. If you don't need abstractive creativity, it's often 10-20× cheaper than Claude or GPT with acceptable quality.
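The chunk-and-map sizing from the second tip can be sketched as follows. This is a rough plan, assuming a 200K-token tier boundary and a hypothetical 1,000-token allowance for the prompt and instructions per call:

```python
import math

TIER_THRESHOLD = 200_000   # tokens; assume input rates double above this
PROMPT_OVERHEAD = 1_000    # hypothetical tokens reserved per call for instructions

def chunk_plan(doc_tokens: int) -> tuple[int, int]:
    """Return (number of chunks, tokens per chunk) so every call
    stays below the cheap-tier threshold."""
    budget = TIER_THRESHOLD - PROMPT_OVERHEAD
    n_chunks = math.ceil(doc_tokens / budget)
    return n_chunks, math.ceil(doc_tokens / n_chunks)

print(chunk_plan(450_000))  # → (3, 150000)
```

A 450K-token document therefore splits into three ~150K-token calls, each comfortably inside the cheap tier.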

Frequently asked

What's the cheapest model for summarisation?

For pure extractive work, Gemini 2.5 Flash-Lite is almost always cheapest: its input rate is dramatically below every competitor. For abstractive or multi-document synthesis, Sonnet 4.6 and GPT-5-mini offer the best cost/quality balance.

How do I handle documents longer than the context window?

Chunk the document, summarise each chunk, then summarise the summaries (map-reduce). This is usually cheaper AND higher quality than cramming into a long-context tier: the model focuses better on shorter inputs.
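The map-reduce pattern above can be sketched in a few lines. Here `summarise` is a stand-in for whatever LLM call you use; the placeholder just truncates text so the sketch runs:

```python
def summarise(text: str, max_words: int = 50) -> str:
    # Placeholder: a real implementation would call the model's API.
    return " ".join(text.split()[:max_words])

def map_reduce_summary(document: str, chunk_words: int = 2_000) -> str:
    words = document.split()
    chunks = [" ".join(words[i:i + chunk_words])
              for i in range(0, len(words), chunk_words)]
    # Map: summarise each chunk independently (parallelisable, batchable).
    partials = [summarise(chunk) for chunk in chunks]
    # Reduce: summarise the concatenated partial summaries.
    return summarise("\n".join(partials), max_words=200)
```

The map step is embarrassingly parallel, which also makes it a natural fit for the batch APIs discussed below.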

Does batch API help for summarisation?

Yes: 50% off on OpenAI and Anthropic if you can wait up to 24 hours for results. Summarisation is typically asynchronous, so it's the ideal workload for batch.
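As a quick sanity check on the discount, the table's daily figures appear to assume 500 requests/day ($0.00070 × 500 = $0.35), so a batched monthly bill works out like this (a sketch, assuming that request volume):

```python
def batch_monthly(per_request: float, requests_per_day: int,
                  days: int = 30, discount: float = 0.5) -> float:
    """Monthly cost in USD with a flat batch discount applied."""
    return per_request * requests_per_day * days * (1 - discount)

# GPT-5 nano at the table's apparent 500 requests/day:
print(f"${batch_monthly(0.0007, 500):.2f}")  # → $5.25
```

That halves the $10.50 baseline from the scaling table above.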

Should I use prompt caching?

Only if the same document is summarised multiple times (with different prompts for different audiences, say). One-shot summarisation can't benefit from caching because each document is different.

How do I estimate output token count?

Target a ratio: e.g. 5% of input length for an executive summary, 15-20% for a detailed digest. The calculator multiplies your output estimate by the rate; err on the high side if in doubt.
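The ratio heuristic is a one-liner. A sketch using the figures above as illustrative midpoints (5% for an executive summary, 18% within the 15-20% detailed range):

```python
# Illustrative output-to-input ratios; tune these for your own prompts.
RATIOS = {"executive": 0.05, "detailed": 0.18}

def estimate_output_tokens(input_tokens: int, style: str) -> int:
    return round(input_tokens * RATIOS[style])

print(estimate_output_tokens(10_000, "executive"))  # → 500
print(estimate_output_tokens(10_000, "detailed"))   # → 1800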

Need a precise number for your actual prompt?

Paste a real prompt into the estimator and get token-accurate costs: input tokens counted with the provider's own tokenizer, output tokens predicted by our regression model.