Converter

Pages to tokens converter

How many tokens is a 30-page PDF? A 500-page book? Convert document length in pages to an LLM token budget instantly.

Assumes single-spaced printed pages (~500 words each). Double-spaced halves the count.

Quick math: 10 pages ≈ 8,000 tokens

How the conversion works

A single-spaced printed page averages about 500 words, which converts to roughly 665 tokens. We round up to 800 to account for formatting, headers, and the natural variance in page density. This is the figure we use across Calcis.

Double-spaced pages run closer to 250 words (400 tokens). Dense legal or academic text can hit 700 words per page (930 tokens). Use the reference table below to calibrate against common document types.
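The three spacing profiles above can be folded into one small helper. This is an illustrative sketch, not part of Calcis itself; the per-page figures are the estimates quoted in this section.

```python
# Tokens-per-page estimates from this page: 800 single-spaced,
# 400 double-spaced, 930 for dense legal/academic text.
TOKENS_PER_PAGE = {
    "single": 800,   # ~500 words/page, rounded up from ~665 tokens
    "double": 400,   # ~250 words/page
    "dense": 930,    # ~700 words/page
}

def pages_to_tokens(pages: int, spacing: str = "single") -> int:
    """Rough token budget for a printed document."""
    return pages * TOKENS_PER_PAGE[spacing]
```

For example, `pages_to_tokens(30)` gives the 24,000-token estimate for a 30-page paper.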

This ratio is useful for two workflows: sizing a context window ("will my 80-page contract fit in Claude's 200K window?") and cost-estimating a document-processing job ("how much to summarize 10,000 PDFs?").
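Both workflows reduce to one multiplication. A minimal sketch, assuming 800 tokens per page; the window sizes and the $3-per-million-token input price are illustrative assumptions, not quoted rates:

```python
# Assumed context windows for illustration only; check current model specs.
CONTEXT_WINDOWS = {"claude": 200_000, "gpt": 400_000, "gemini": 2_000_000}

def fits(pages: int, model: str, tokens_per_page: int = 800) -> bool:
    """Workflow 1: will the document fit in the model's context window?"""
    return pages * tokens_per_page <= CONTEXT_WINDOWS[model]

def batch_input_cost(docs: int, pages_per_doc: int,
                     usd_per_million_tokens: float = 3.0) -> float:
    """Workflow 2: rough input cost for a document-processing batch."""
    tokens = docs * pages_per_doc * 800
    return tokens / 1_000_000 * usd_per_million_tokens
```

An 80-page contract is ~64,000 tokens, so `fits(80, "claude")` comes back true with plenty of headroom.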

Documents by size

1 page ≈ 800 tokens
A 10-page report ≈ 8,000 tokens
A 30-page paper ≈ 24,000 tokens
A 100-page contract ≈ 80,000 tokens
A 250-page short book ≈ 200,000 tokens (fills Claude's 200K window)
A 400-page novel ≈ 320,000 tokens (fits in GPT-5)
A 1,000-page reference book ≈ 800,000 tokens (Gemini only)
A 2,500-page set ≈ 2M tokens (fills Gemini's 2M window)

Frequently asked

How accurate is 800 tokens per page?

Within ±25% for typical documents. Dense academic papers or legal contracts run higher; novels with lots of dialogue run lower. For precise work, count actual tokens with one of our /tools/ counters.

How do I handle double-spaced pages?

Halve the number — double-spaced runs about 400 tokens per page. Or just count by word total (500 words ≈ 665 tokens for single-spaced, 250 words ≈ 333 tokens for double-spaced).
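The word-count route is a single ratio. A one-line sketch, assuming ~1.33 tokens per English word (the ratio behind 500 words ≈ 665 tokens):

```python
def words_to_tokens(words: int, ratio: float = 1.33) -> int:
    """Estimate tokens from a word count; ratio is an assumption."""
    return int(words * ratio)
```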

What about PDFs with tables, figures, or formulas?

Tables tokenize similarly to prose (often slightly denser). Figures contribute zero text tokens; images are processed by vision models separately, with their own pricing. Formulas are the worst case, because mathematical notation rarely appears in tokenizer vocabularies, so symbols split into many single-character tokens.

Does this include OCR overhead or image tokens?

No. This counts text only. If you're feeding scanned pages through a vision model, those have separate image-token pricing (typically $0.01-0.05 per image for GPT-4o / Claude / Gemini). Use the /estimator for image-inclusive pricing.
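A back-of-envelope combining both charges might look like this. The $0.02-per-image figure is just a midpoint of the $0.01-0.05 range above, and the $3-per-million-token text price is an illustrative assumption:

```python
def scanned_doc_cost(pages: int, usd_per_image: float = 0.02,
                     usd_per_million_text_tokens: float = 3.0) -> float:
    """Rough cost for scanned pages: extracted text plus per-image charges."""
    text = pages * 800 / 1_000_000 * usd_per_million_text_tokens
    images = pages * usd_per_image
    return round(text + images, 2)
```

Note that for scanned documents the per-image charge dominates: at these assumed rates, a 30-page scan costs about ten times more in image fees than in text tokens.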

How do I fit a large book into a context window?

Chunk it. Even Gemini's 2M-token window has performance degradation past ~500K tokens. Best practice: chunk to 50-100K token sections, retrieve relevant chunks with embeddings, feed only those to the model. This is the RAG pattern — see /calculators/rag-cost.
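A minimal chunker for the pattern above, splitting on an approximate words-per-chunk budget (tokens ÷ 1.33) rather than a real tokenizer; for precise boundaries you would count actual tokens:

```python
def chunk_text(text: str, max_tokens: int = 50_000) -> list[str]:
    """Split text into sections of roughly max_tokens each."""
    words = text.split()
    per_chunk = int(max_tokens / 1.33)  # approximate words per chunk
    return [" ".join(words[i:i + per_chunk])
            for i in range(0, len(words), per_chunk)]
```

The retrieval step (embedding chunks, selecting the relevant ones) sits on top of this; the chunker only handles the splitting.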

Ready to estimate a real prompt?

Paste your actual text into the estimator for exact token counts and dollar costs across every model.