
Embedding API cost calculator

Compute the cost of embedding a corpus. One-off index builds, continuous indexing, and query-time embedding are all priced here.

Embeddings are the cheapest part of any RAG pipeline by far — often 2-5% of the total. The bill grows when you reindex frequently, when chunk size is small, or when the corpus is very large (100M+ tokens). This calculator covers all three cases.

Note: embedding models are NOT in the standard PRICING table (they're not chat models), but the math runs over the same chat-model list so you can compare against what feeding the same input to a chat model would cost. For actual embedding-model pricing, see OpenAI text-embedding-3-small ($0.02/1M tokens), Voyage-3 ($0.06/1M), or Gemini text-embedding-004 ($0.025/1M); these are the production workhorses.

The defaults approximate embedding a 10M-token corpus split into 500-token chunks, plus ongoing query embeddings.
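The underlying math is simple enough to sketch directly. A minimal version, using the example per-token rates quoted on this page (the `PRICES_PER_1M` dict and both function names are illustrative, not a real API):

```python
# Sketch of the cost math the calculator runs, using the example
# rates quoted on this page (USD per 1M input tokens).
PRICES_PER_1M = {
    "text-embedding-3-small": 0.02,   # OpenAI
    "text-embedding-004": 0.025,      # Gemini
    "voyage-3": 0.06,                 # Voyage
}

def embed_cost(total_tokens: int, model: str) -> float:
    """One-off cost in USD to embed `total_tokens` with `model`."""
    return total_tokens / 1_000_000 * PRICES_PER_1M[model]

def monthly_query_cost(queries_per_day: int, tokens_per_query: int,
                       model: str, days: int = 30) -> float:
    """Ongoing cost of embedding user queries at serve time."""
    return embed_cost(queries_per_day * tokens_per_query * days, model)

# The page's default workload: a 10M-token corpus in 500-token chunks.
corpus_tokens = 10_000_000
print(embed_cost(corpus_tokens, "text-embedding-3-small"))  # 0.2
```

Note that chunk size affects chunk count (10M / 500 = 20,000 chunks) but not the one-off cost, which depends only on total tokens; smaller chunks raise cost mainly through per-chunk overlap and more frequent re-embedding.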

Workload parameters

Costs update live across every model in the table below.

Top 8 cheapest models for this workload

Sorted by total cost per request (input + output, with tokenizer and long-context adjustments applied).

Model                             Provider  Per request  Per day  Monthly (×30)
GPT-5 nano                        openai    $0.00003     $0.50    $15.00
Gemini 2.5 Flash-Lite             google    $0.00005     $1.00    $30.00
GPT-4o mini                       openai    $0.00007     $1.50    $45.00
GPT-5.4 nano                      openai    $0.00010     $2.00    $60.00
Gemini 3.1 Flash-Lite (preview)   google    $0.00013     $2.50    $75.00
GPT-5 mini                        openai    $0.00013     $2.50    $75.00
Gemini 2.5 Flash                  google    $0.00015     $3.00    $90.00
GPT-4.1 mini                      openai    $0.00020     $4.00    $120.00

Scaling GPT-5 nano

What the cheapest option costs as your traffic grows.

  • baseline: $15.00 per month
  • 2× volume: $30.00 per month
  • 5× volume: $75.00 per month
  • 10× volume: $150.00 per month
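Embedding spend scales linearly with request volume, so projecting growth is a single multiplication (figures below match the GPT-5 nano baseline above):

```python
# Cost at N× traffic is N times the baseline; there are no volume
# discounts in the per-token rates this page assumes.
baseline_monthly = 15.00
for multiplier in (1, 2, 5, 10):
    print(f"{multiplier}x volume: ${baseline_monthly * multiplier:.2f}/month")
```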

Optimization tips

  • Use text-embedding-3-small unless you have a specific quality complaint. It's 6× cheaper than text-embedding-3-large and the retrieval quality difference is usually within noise for most corpora.
  • Dedupe before indexing. Hash chunks and skip embedding duplicates — real-world corpora often have 10-30% duplication (repeated headers, footers, boilerplate).
  • Batch embed. Most providers accept arrays of up to 2048 inputs per call. Single-item embedding wastes latency budget without helping cost.
  • Only reindex on change. Store source hashes alongside vectors; skip re-embedding anything whose source hash is unchanged. Naive "reindex everything nightly" is the #1 embedding cost leak.
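The dedupe tip above can be sketched in a few lines. This assumes exact-duplicate chunks after light normalization; near-duplicate detection would need fuzzier hashing:

```python
import hashlib

def dedupe_chunks(chunks: list[str]) -> list[str]:
    """Drop exact-duplicate chunks (repeated headers, footers,
    boilerplate) before embedding, keyed on a SHA-256 of the
    normalized text. Order of first occurrence is preserved."""
    seen: set[str] = set()
    unique = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk.strip().lower().encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique

chunks = ["Page 1 of 3", "Actual content A", "Page 1 of 3", "Actual content B"]
print(len(dedupe_chunks(chunks)))  # 3
```

At the 10-30% duplication rates typical of real corpora, this one pass pays for itself before the first embedding call.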

Frequently asked

How much does it cost to embed a corpus?

At OpenAI text-embedding-3-small rates ($0.02/1M tokens), a 10M-token corpus costs about $0.20 to embed once. Even a 1-billion-token corpus is only $20. Embedding cost is essentially never the bottleneck.

Which embedding model is cheapest?

OpenAI text-embedding-3-small is the cheapest of the top-tier options at $0.02/1M tokens. Gemini text-embedding-004 is comparable at $0.025/1M. Voyage-3 is $0.06/1M but often wins on retrieval quality benchmarks.

Should I embed with a frontier model instead?

No. Embedding models are specialized and optimized for retrieval tasks; using a chat model to produce embeddings is dramatically more expensive with no quality benefit. Stick to purpose-built embedding endpoints.

How often should I reindex?

Only when source documents change. Store a hash (SHA-256 of cleaned text) alongside each vector; on ingest, re-embed only if the hash is new or changed. This turns nightly reindex jobs from "every doc" into "1-5% of docs".
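A minimal sketch of that hash check, assuming a simple dict stands in for whatever metadata store sits beside your vector index (the function name and store shape are hypothetical):

```python
import hashlib

def needs_reembed(doc_id: str, cleaned_text: str,
                  stored_hashes: dict[str, str]) -> bool:
    """Re-embed only if the SHA-256 of the cleaned text is new or
    changed; records the new hash when a re-embed is needed."""
    digest = hashlib.sha256(cleaned_text.encode()).hexdigest()
    if stored_hashes.get(doc_id) == digest:
        return False          # unchanged: skip the embedding call
    stored_hashes[doc_id] = digest
    return True

store: dict[str, str] = {}
print(needs_reembed("doc-1", "hello world", store))   # True  (first ingest)
print(needs_reembed("doc-1", "hello world", store))   # False (unchanged)
print(needs_reembed("doc-1", "hello world!", store))  # True  (changed)
```

Hashing cleaned text rather than raw bytes keeps cosmetic changes (whitespace, encoding) from triggering re-embeds.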

Do embedding calls have rate limits?

Yes, but generous — OpenAI's tier 1 is 3,000 RPM for embeddings, rising with usage. Large corpus ingestion typically fits under these limits with a simple concurrency limiter.
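A simple concurrency limiter can be sketched with an asyncio semaphore. The `embed_batch` function here is a placeholder for a real provider call (one that accepts an array of inputs, as noted in the tips above); batch size and concurrency cap are illustrative:

```python
import asyncio

async def embed_batch(batch: list[str]) -> list[list[float]]:
    # Placeholder for a real embeddings call; returns one vector per input.
    await asyncio.sleep(0)          # simulate network I/O
    return [[0.0] for _ in batch]

async def embed_corpus(chunks: list[str], batch_size: int = 2048,
                       max_in_flight: int = 8) -> list[list[float]]:
    """Batch chunks (most providers accept arrays of inputs per call)
    and cap concurrent requests with a semaphore to stay under RPM limits."""
    sem = asyncio.Semaphore(max_in_flight)

    async def worker(batch: list[str]) -> list[list[float]]:
        async with sem:
            return await embed_batch(batch)

    batches = [chunks[i:i + batch_size]
               for i in range(0, len(chunks), batch_size)]
    results = await asyncio.gather(*(worker(b) for b in batches))
    return [vec for batch_vecs in results for vec in batch_vecs]

vectors = asyncio.run(embed_corpus([f"chunk {i}" for i in range(5000)]))
print(len(vectors))  # 5000
```

With 2048-input batches, even a 3,000 RPM limit leaves room for millions of chunks per minute, which is why ingestion rarely needs more than this.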

Need a precise number for your actual prompt?

Paste a real prompt into the estimator and get token-accurate costs — input tokens counted with the provider's own tokenizer, output tokens predicted by our regression model.