Embedding API cost calculator
Compute the cost of embedding a corpus. One-off index builds, continuous indexing, and query-time embedding are all priced here.
Embeddings are the cheapest part of any RAG pipeline by far — often 2-5% of the total. The bill grows when you reindex frequently, when chunk size is small, or when the corpus is very large (100M+ tokens). This calculator covers all three cases.
Note: embedding models are NOT in the standard PRICING table (they're not chat models), but the calculator uses the same model list so you can compare against the cost of sending the same input to a chat model as context. For actual embedding-model pricing, see OpenAI text-embedding-3-small ($0.02/1M tokens), Voyage-3 ($0.06/1M), or Gemini text-embedding-004 ($0.025/1M); these are the production workhorses.
The defaults approximate embedding a 10M-token corpus split into 500-token chunks, plus ongoing query embeddings.
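The default workload above can be reproduced with a few lines of arithmetic. This is a minimal sketch of the pricing math (index build + ongoing query embeddings); the rate and query volume are illustrative, so plug in your provider's current prices.

```python
def embedding_cost(corpus_tokens: int, price_per_million: float,
                   queries_per_day: int = 0, query_tokens: int = 0,
                   days: int = 30) -> dict:
    """Rough embedding cost: one-off index build plus ongoing query embeddings."""
    # One-off cost to embed the whole corpus.
    index_cost = corpus_tokens / 1_000_000 * price_per_million
    # Ongoing cost of embedding incoming queries over the billing period.
    query_cost = queries_per_day * query_tokens / 1_000_000 * price_per_million * days
    return {"index": round(index_cost, 4),
            "queries_monthly": round(query_cost, 4),
            "total": round(index_cost + query_cost, 4)}

# 10M-token corpus at text-embedding-3-small's $0.02/1M, plus an assumed
# 10,000 queries/day at ~20 tokens each.
print(embedding_cost(10_000_000, 0.02, queries_per_day=10_000, query_tokens=20))
# → {'index': 0.2, 'queries_monthly': 0.12, 'total': 0.32}
```

Even with a healthy query volume, the one-off index build dominates, and both are cents, not dollars.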
Workload parameters
Costs update live across every model in the table below.
Top 8 cheapest models for this workload
Sorted by total cost per request (input + output, with tokenizer and long-context adjustments applied).
| Model | Provider | Per request | Per day | Monthly (×30) |
|---|---|---|---|---|
| GPT-5 nano | OpenAI | $0.00003 | $0.50 | $15.00 |
| Gemini 2.5 Flash-Lite | Google | $0.00005 | $1.00 | $30.00 |
| GPT-4o mini | OpenAI | $0.00007 | $1.50 | $45.00 |
| GPT-5.4 nano | OpenAI | $0.00010 | $2.00 | $60.00 |
| Gemini 3.1 Flash-Lite (preview) | Google | $0.00013 | $2.50 | $75.00 |
| GPT-5 mini | OpenAI | $0.00013 | $2.50 | $75.00 |
| Gemini 2.5 Flash | Google | $0.00015 | $3.00 | $90.00 |
| GPT-4.1 mini | OpenAI | $0.00020 | $4.00 | $120.00 |
Scaling GPT-5 nano
What the cheapest option costs as your traffic grows.
| Volume | Monthly cost |
|---|---|
| Baseline | $15.00 |
| 2× | $30.00 |
| 5× | $75.00 |
| 10× | $150.00 |
Optimization tips
- Use text-embedding-3-small unless you have a specific quality complaint. It's 6× cheaper than text-embedding-3-large and the retrieval quality difference is usually within noise for most corpora.
- Dedupe before indexing. Hash chunks and skip embedding duplicates — real-world corpora often have 10-30% duplication (repeated headers, footers, boilerplate).
- Batch embed. Most providers accept arrays of up to 2048 inputs per call. Single-item embedding wastes latency budget without helping cost.
- Only reindex on change. Store source hashes alongside vectors; skip re-embedding anything whose source hash is unchanged. Naive "reindex everything nightly" is the #1 embedding cost leak.
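The dedupe, batching, and reindex-on-change tips above combine naturally into one ingest step. A minimal sketch, assuming `indexed_hashes` is the set of hashes you already store alongside your vectors (the vector-store lookup itself is out of scope here):

```python
import hashlib

def chunk_hash(text: str) -> str:
    # Hash the cleaned chunk text; identical chunks get identical hashes.
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()

def chunks_to_embed(chunks: list[str], indexed_hashes: set[str],
                    batch_size: int = 2048):
    """Yield batches of chunks that are new or changed.

    Skips in-corpus duplicates and anything already indexed, and groups
    the survivors into provider-sized batches (2048 is OpenAI's array limit).
    """
    seen = set(indexed_hashes)
    pending: list[str] = []
    for chunk in chunks:
        h = chunk_hash(chunk)
        if h in seen:          # duplicate or unchanged since last index -> skip
            continue
        seen.add(h)
        pending.append(chunk)
        if len(pending) == batch_size:
            yield pending
            pending = []
    if pending:
        yield pending

# Example: "b" is already indexed, the second "a" is an in-corpus duplicate.
chunks = ["a", "b", "a", "c"]
for batch in chunks_to_embed(chunks, {chunk_hash("b")}):
    print(batch)               # only "a" and "c" reach the embedding API
```

The same `seen` set handles both duplicate chunks within the corpus and unchanged chunks from a previous index run, so "reindex everything nightly" collapses to embedding only what actually changed.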
Frequently asked
How much does it cost to embed a corpus?
At OpenAI text-embedding-3-small rates ($0.02/1M tokens), a 10M-token corpus costs about $0.20 to embed once. Even a 1-billion-token corpus is only $20. Embedding cost is essentially never the bottleneck.
Which embedding model is cheapest?
OpenAI text-embedding-3-small is the cheapest of the top-tier options at $0.02/1M tokens. Gemini text-embedding-004 is comparable at $0.025/1M. Voyage-3 is $0.06/1M but often wins on retrieval quality benchmarks.
Should I embed with a frontier model instead?
No. Embedding models are specialized and optimized for retrieval tasks; using a chat model to produce embeddings is dramatically more expensive with no quality benefit. Stick to purpose-built embedding endpoints.
How often should I reindex?
Only when source documents change. Store a hash (SHA-256 of cleaned text) alongside each vector; on ingest, re-embed only if the hash is new or changed. This turns nightly reindex jobs from "every doc" into "1-5% of docs".
Do embedding calls have rate limits?
Yes, but generous — OpenAI's tier 1 is 3,000 RPM for embeddings, rising with usage. Large corpus ingestion typically fits under these limits with a simple concurrency limiter.
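A simple concurrency limiter like the one mentioned above can be a semaphore around the embedding call. A minimal async sketch; `embed_batch` is a stand-in for your real API call, and the cap of 8 in-flight requests is an assumption you should tune to your tier's limits:

```python
import asyncio

async def embed_batch(batch: list[str]) -> list[str]:
    # Stand-in for a real embedding API call.
    await asyncio.sleep(0.01)
    return [f"vector-for-{text}" for text in batch]

async def embed_all(batches: list[list[str]], max_in_flight: int = 8):
    """Cap concurrent embedding calls so a large ingest stays under rate limits."""
    sem = asyncio.Semaphore(max_in_flight)

    async def guarded(batch):
        async with sem:        # at most max_in_flight calls run at once
            return await embed_batch(batch)

    return await asyncio.gather(*(guarded(b) for b in batches))

results = asyncio.run(embed_all([["doc1"], ["doc2"], ["doc3"]]))
```

For strict RPM budgets you'd add a token-bucket delay as well, but for most corpus-sized ingests a concurrency cap alone keeps you comfortably under the limit.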
Need a precise number for your actual prompt?
Paste a real prompt into the estimator and get token-accurate costs — input tokens counted with the provider's own tokenizer, output tokens predicted by our regression model.