Summarization API cost calculator

Cost out a batch of document summaries across every major LLM. Long input, short output — the inverse of a chatbot.

Summarization workloads are input-heavy: you feed the model a long document and ask for a short digest. That makes input cost the dominant line item and flips the model ranking — the cheapest chatbot isn't necessarily the cheapest summarizer.

The defaults assume a 10,000-token document (roughly a 30-page PDF) with a 500-token summary. Bump the input if you're summarizing transcripts, calls, or papers; bump the output if you need full multi-section digests.
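The arithmetic behind the calculator is simple: tokens times per-million-token rates. A minimal sketch, where the example rates ($0.05/M input, $0.40/M output) are illustrative placeholders, not live prices:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Dollar cost of one request, given per-million-token rates."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000

# Calculator defaults: 10,000-token document, 500-token summary.
cost = request_cost(10_000, 500, 0.05, 0.40)  # ≈ $0.00070 per request
```

Note how little the output term matters here: at these rates the 10,000 input tokens cost $0.0005 and the 500 output tokens only $0.0002, which is why input price dominates the model ranking.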

Workload parameters

Costs update live across every model in the table below.

Top 8 cheapest models for this workload

Sorted by total cost per request (input + output, with tokenizer and long-context adjustments applied).

Model                              Provider   Per request   Per day   Monthly (×30)
GPT-5 nano                         OpenAI     $0.00070      $0.35     $10.50
Gemini 2.5 Flash-Lite              Google     $0.00120      $0.60     $18.00
GPT-4o mini                        OpenAI     $0.00180      $0.90     $27.00
GPT-5.4 nano                       OpenAI     $0.00263      $1.31     $39.38
Gemini 3.1 Flash-Lite (preview)    Google     $0.00325      $1.63     $48.75
GPT-5 mini                         OpenAI     $0.00350      $1.75     $52.50
Gemini 2.5 Flash                   Google     $0.00425      $2.13     $63.75
GPT-4.1 mini                       OpenAI     $0.00480      $2.40     $72.00
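The per-day and monthly columns are straight multiples of the per-request cost. Dividing per-day by per-request in the table implies a default volume of 500 requests/day, an assumption of this sketch rather than a stated figure:

```python
def monthly_cost(per_request: float, requests_per_day: int, days: int = 30) -> float:
    """Scale a per-request cost to a monthly total."""
    return per_request * requests_per_day * days

# GPT-5 nano row: $0.00070/request at the implied 500 requests/day.
gpt5_nano_monthly = monthly_cost(0.00070, 500)  # ≈ $10.50
```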

Scaling GPT-5 nano

What the cheapest option costs as your traffic grows.

  • Baseline: $10.50/month
  • 2× volume: $21.00/month
  • 5× volume: $52.50/month
  • 10× volume: $105.00/month

Optimization tips

  • Batch API cuts cost 50% on OpenAI and Anthropic — perfect for async summarization jobs where a 24-hour turnaround is fine.
  • For Gemini, watch the 200K long-context threshold: prices double above it. Chunk-and-map large documents under 200K per call to stay in the cheap tier.
  • Summarize iteratively for very long docs: chunk → summarize each chunk → summarize the summaries. Cheaper than a single long-context call and often higher quality.
  • Gemini 2.5 Flash-Lite is unreasonably cheap for extractive summarization. If you don't need abstractive creativity, it's often 10-20× cheaper than Claude or GPT with acceptable quality.
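The chunk-and-map pattern from the tips above can be sketched in a few lines. The `summarize` callable stands in for whatever model call you use, and the ~4-characters-per-token figure is a rough heuristic, not a real tokenizer:

```python
from typing import Callable

def map_reduce_summary(document: str, summarize: Callable[[str], str],
                       chunk_tokens: int = 8_000, chars_per_token: int = 4) -> str:
    """Chunk -> summarize each chunk -> summarize the summaries.

    Keeping chunk_tokens well under a provider's long-context threshold
    (e.g. Gemini's 200K) keeps every call in the cheap pricing tier.
    """
    chunk_chars = chunk_tokens * chars_per_token
    chunks = [document[i:i + chunk_chars]
              for i in range(0, len(document), chunk_chars)]
    if len(chunks) == 1:
        # Short enough for a single call; no reduce step needed.
        return summarize(document)
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return summarize("\n\n".join(partial_summaries))
```

A real implementation would chunk on paragraph or section boundaries rather than raw character offsets, but the cost structure is the same: N cheap map calls plus one small reduce call.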

Frequently asked

What's the cheapest model for summarization?

For pure extractive work, Gemini 2.5 Flash-Lite is almost always cheapest — its input rate is dramatically below every competitor. For abstractive or multi-document synthesis, Sonnet 4.6 and GPT-5-mini offer the best cost/quality balance.

How do I handle documents longer than the context window?

Chunk the document, summarize each chunk, then summarize the summaries (map-reduce). This is usually cheaper, and often higher quality, than cramming everything into a long-context pricing tier — the model focuses better on shorter inputs.

Does batch API help for summarization?

Yes — 50% off on OpenAI and Anthropic if you can wait up to 24 hours for results. Summarization is typically asynchronous, so it's the ideal workload for batch.

Should I use prompt caching?

Only if the same document is summarized multiple times (with different prompts for different audiences, say). Caching discounts a repeated prompt prefix; in one-shot summarization the document dominates the input and differs on every request, so there's almost nothing to cache.

How do I estimate output token count?

Target a ratio — e.g. 5% of input length for an executive summary, 15-20% for a detailed digest. The calculator multiplies your output estimate by the rate; err on the high side if in doubt.
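The ratio heuristic above is one line of code. The 5% and 18% ratios are the examples from this FAQ, not fixed constants:

```python
def estimate_output_tokens(input_tokens: int, ratio: float) -> int:
    """Estimate summary length as a fraction of the input length."""
    return max(1, round(input_tokens * ratio))

exec_summary = estimate_output_tokens(10_000, 0.05)  # 500 tokens
detailed     = estimate_output_tokens(10_000, 0.18)  # 1800 tokens
```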

Need a precise number for your actual prompt?

Paste a real prompt into the estimator and get token-accurate costs — input tokens counted with the provider's own tokenizer, output tokens predicted by our regression model.