Converter

Pages to tokens converter

How many tokens is a 30-page PDF? A 500-page book? Convert document length in pages to an LLM token budget instantly.

Assumes single-spaced printed pages (~500 words each). Double-spaced halves the count.

Quick math: 10 pages ≈ 8,000 tokens

How the conversion works

A single-spaced printed page averages about 500 words, which converts to roughly 665 tokens. We round up to 800 to account for formatting, headers, and the natural variance in page density. This is the figure we use across Calcis.

Double-spaced pages run closer to 250 words (400 tokens). Dense legal or academic text can hit 700 words per page (930 tokens). Use the reference table below to calibrate against common document types.
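The three spacing profiles above can be folded into one small helper. This is an illustrative sketch, not part of Calcis itself; the per-page figures are the estimates quoted in this section.

```python
# Tokens-per-page estimates from this page: 800 single-spaced,
# 400 double-spaced, 930 for dense legal/academic text.
TOKENS_PER_PAGE = {
    "single": 800,   # ~500 words/page, rounded up from ~665 tokens
    "double": 400,   # ~250 words/page
    "dense": 930,    # ~700 words/page
}

def pages_to_tokens(pages: int, spacing: str = "single") -> int:
    """Rough token budget for a printed document."""
    return pages * TOKENS_PER_PAGE[spacing]
```

For example, `pages_to_tokens(30)` gives the 24,000-token estimate for a 30-page paper.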

This ratio is useful for two workflows: sizing a context window ("will my 80-page contract fit in Claude's 200K window?") and cost-estimating a document-processing job ("how much to summarize 10,000 PDFs?").
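Both workflows reduce to one multiplication. A minimal sketch, assuming 800 tokens per page; the window sizes and the $3-per-million-token input price are illustrative assumptions, not quoted rates:

```python
# Assumed context windows for illustration only; check current model specs.
CONTEXT_WINDOWS = {"claude": 200_000, "gpt": 400_000, "gemini": 2_000_000}

def fits(pages: int, model: str, tokens_per_page: int = 800) -> bool:
    """Workflow 1: will the document fit in the model's context window?"""
    return pages * tokens_per_page <= CONTEXT_WINDOWS[model]

def batch_input_cost(docs: int, pages_per_doc: int,
                     usd_per_million_tokens: float = 3.0) -> float:
    """Workflow 2: rough input cost for a document-processing batch."""
    tokens = docs * pages_per_doc * 800
    return tokens / 1_000_000 * usd_per_million_tokens
```

An 80-page contract is ~64,000 tokens, so `fits(80, "claude")` comes back true with plenty of headroom.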

Documents by size

1 page ≈ 800 tokens
A 10-page report ≈ 8,000 tokens
A 30-page paper ≈ 24,000 tokens
A 100-page contract ≈ 80,000 tokens
A 250-page short book ≈ 200,000 tokens (fills Claude's 200K window)
A 400-page novel ≈ 320,000 tokens (fits in GPT-5)
A 1,000-page reference book ≈ 800,000 tokens (Gemini only)
A 2,500-page set ≈ 2M tokens (fills Gemini's 2M window)

Frequently asked

How accurate is 800 tokens per page?

Within ±25% for typical documents. Dense academic papers or legal contracts run higher; novels with lots of dialogue run lower. For precise work, count actual tokens with one of our /tools/ counters.

How do I handle double-spaced pages?

Halve the number — double-spaced runs about 400 tokens per page. Or just count by word total (500 words ≈ 665 tokens for single-spaced, 250 words ≈ 333 tokens for double-spaced).
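The word-count route is a single ratio. A one-line sketch, assuming ~1.33 tokens per English word (the ratio behind 500 words ≈ 665 tokens):

```python
def words_to_tokens(words: int, ratio: float = 1.33) -> int:
    """Estimate tokens from a word count; ratio is an assumption."""
    return int(words * ratio)
```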

What about PDFs with tables, figures, or formulas?

Tables tokenize similarly to prose (often slightly denser). Figures contribute zero text tokens; images are processed by vision models separately, with their own pricing. Formulas are the worst case, because mathematical notation rarely appears in tokenizer vocabularies, so symbols split into many single-character tokens.

Does this include OCR overhead or image tokens?

No. This counts text only. If you're feeding scanned pages through a vision model, those have separate image-token pricing (typically $0.01-0.05 per image for GPT-4o / Claude / Gemini). Use the /estimator for image-inclusive pricing.
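A back-of-envelope combining both charges might look like this. The $0.02-per-image figure is just a midpoint of the $0.01-0.05 range above, and the $3-per-million-token text price is an illustrative assumption:

```python
def scanned_doc_cost(pages: int, usd_per_image: float = 0.02,
                     usd_per_million_text_tokens: float = 3.0) -> float:
    """Rough cost for scanned pages: extracted text plus per-image charges."""
    text = pages * 800 / 1_000_000 * usd_per_million_text_tokens
    images = pages * usd_per_image
    return round(text + images, 2)
```

Note that for scanned documents the per-image charge dominates: at these assumed rates, a 30-page scan costs about ten times more in image fees than in text tokens.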

How do I fit a large book into a context window?

Chunk it. Even Gemini's 2M-token window has performance degradation past ~500K tokens. Best practice: chunk to 50-100K token sections, retrieve relevant chunks with embeddings, feed only those to the model. This is the RAG pattern — see /calculators/rag-cost.
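A minimal chunker for the pattern above, splitting on an approximate words-per-chunk budget (tokens ÷ 1.33) rather than a real tokenizer; for precise boundaries you would count actual tokens:

```python
def chunk_text(text: str, max_tokens: int = 50_000) -> list[str]:
    """Split text into sections of roughly max_tokens each."""
    words = text.split()
    per_chunk = int(max_tokens / 1.33)  # approximate words per chunk
    return [" ".join(words[i:i + per_chunk])
            for i in range(0, len(words), per_chunk)]
```

The retrieval step (embedding chunks, selecting the relevant ones) sits on top of this; the chunker only handles the splitting.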

Ready to estimate a real prompt?

Paste your actual text into the estimator for exact token counts and dollar costs across every model.