Embedding API cost calculator
Compute the cost of embedding a corpus. One-off index builds, continuous indexing, and query-time embedding are all priced here.
Embeddings are the cheapest part of any RAG pipeline by far — often 2-5% of the total. The bill grows when you reindex frequently, when chunk size is small, or when the corpus is very large (100M+ tokens). This calculator covers all three cases.
Note: embedding models are NOT in the standard PRICING table (they're not chat models), but the calculator uses the same model list so you can compare against the cost of sending the same input to a chat model as context. For actual embedding-model pricing, see OpenAI text-embedding-3-small ($0.02/1M tokens), Voyage-3 ($0.06/1M), or Gemini text-embedding-004 ($0.025/1M); these are the production workhorses.
The defaults approximate embedding a 10M-token corpus split into 500-token chunks, plus ongoing query embeddings.
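The default workload above can be reproduced with a few lines of arithmetic. This is a minimal sketch of the pricing math (index build + ongoing query embeddings); the rate and query volume are illustrative, so plug in your provider's current prices.

```python
def embedding_cost(corpus_tokens: int, price_per_million: float,
                   queries_per_day: int = 0, query_tokens: int = 0,
                   days: int = 30) -> dict:
    """Rough embedding cost: one-off index build plus ongoing query embeddings."""
    # One-off cost to embed the whole corpus.
    index_cost = corpus_tokens / 1_000_000 * price_per_million
    # Ongoing cost of embedding incoming queries over the billing period.
    query_cost = queries_per_day * query_tokens / 1_000_000 * price_per_million * days
    return {"index": round(index_cost, 4),
            "queries_monthly": round(query_cost, 4),
            "total": round(index_cost + query_cost, 4)}

# 10M-token corpus at text-embedding-3-small's $0.02/1M, plus an assumed
# 10,000 queries/day at ~20 tokens each.
print(embedding_cost(10_000_000, 0.02, queries_per_day=10_000, query_tokens=20))
# → {'index': 0.2, 'queries_monthly': 0.12, 'total': 0.32}
```

Even with a healthy query volume, the one-off index build dominates, and both are cents, not dollars.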
Workload parameters
Costs update live across every model in the table below.
Top 8 cheapest models for this workload
Sorted by total cost per request (input + output, with tokenizer and long-context adjustments applied).
| Model | Provider | Per request | Per day | Monthly (×30) |
|---|---|---|---|---|
| GPT-5 nano | OpenAI | $0.00003 | $0.50 | $15.00 |
| Gemini 2.5 Flash-Lite | Google | $0.00005 | $1.00 | $30.00 |
| GPT-4o mini | OpenAI | $0.00007 | $1.50 | $45.00 |
| GPT-5.4 nano | OpenAI | $0.00010 | $2.00 | $60.00 |
| Gemini 3.1 Flash-Lite (preview) | Google | $0.00013 | $2.50 | $75.00 |
| GPT-5 mini | OpenAI | $0.00013 | $2.50 | $75.00 |
| Gemini 2.5 Flash | Google | $0.00015 | $3.00 | $90.00 |
| GPT-4.1 mini | OpenAI | $0.00020 | $4.00 | $120.00 |
Scaling GPT-5 nano
What the cheapest option costs as your traffic grows.
| Volume | Monthly cost |
|---|---|
| Baseline | $15.00 |
| 2× | $30.00 |
| 5× | $75.00 |
| 10× | $150.00 |
Optimization tips
- Use text-embedding-3-small unless you have a specific quality complaint. It's 6× cheaper than text-embedding-3-large and the retrieval quality difference is usually within noise for most corpora.
- Dedupe before indexing. Hash chunks and skip embedding duplicates — real-world corpora often have 10-30% duplication (repeated headers, footers, boilerplate).
- Batch embed. Most providers accept arrays of up to 2048 inputs per call. Single-item embedding wastes latency budget without helping cost.
- Only reindex on change. Store source hashes alongside vectors; skip re-embedding anything whose source hash is unchanged. Naive "reindex everything nightly" is the #1 embedding cost leak.
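The dedupe, batching, and reindex-on-change tips above combine naturally into one ingest step. A minimal sketch, assuming `indexed_hashes` is the set of hashes you already store alongside your vectors (the vector-store lookup itself is out of scope here):

```python
import hashlib

def chunk_hash(text: str) -> str:
    # Hash the cleaned chunk text; identical chunks get identical hashes.
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()

def chunks_to_embed(chunks: list[str], indexed_hashes: set[str],
                    batch_size: int = 2048):
    """Yield batches of chunks that are new or changed.

    Skips in-corpus duplicates and anything already indexed, and groups
    the survivors into provider-sized batches (2048 is OpenAI's array limit).
    """
    seen = set(indexed_hashes)
    pending: list[str] = []
    for chunk in chunks:
        h = chunk_hash(chunk)
        if h in seen:          # duplicate or unchanged since last index -> skip
            continue
        seen.add(h)
        pending.append(chunk)
        if len(pending) == batch_size:
            yield pending
            pending = []
    if pending:
        yield pending

# Example: "b" is already indexed, the second "a" is an in-corpus duplicate.
chunks = ["a", "b", "a", "c"]
for batch in chunks_to_embed(chunks, {chunk_hash("b")}):
    print(batch)               # only "a" and "c" reach the embedding API
```

The same `seen` set handles both duplicate chunks within the corpus and unchanged chunks from a previous index run, so "reindex everything nightly" collapses to embedding only what actually changed.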
Frequently asked
How much does it cost to embed a corpus?
At OpenAI text-embedding-3-small rates ($0.02/1M tokens), a 10M-token corpus costs about $0.20 to embed once. Even a 1-billion-token corpus is only $20. Embedding cost is essentially never the bottleneck.
Which embedding model is cheapest?
OpenAI text-embedding-3-small is the cheapest of the top-tier options at $0.02/1M tokens. Gemini text-embedding-004 is comparable at $0.025/1M. Voyage-3 is $0.06/1M but often wins on retrieval quality benchmarks.
Should I embed with a frontier model instead?
No. Embedding models are specialized and optimized for retrieval tasks; using a chat model to produce embeddings is dramatically more expensive with no quality benefit. Stick to purpose-built embedding endpoints.
How often should I reindex?
Only when source documents change. Store a hash (SHA-256 of cleaned text) alongside each vector; on ingest, re-embed only if the hash is new or changed. This turns nightly reindex jobs from "every doc" into "1-5% of docs".
Do embedding calls have rate limits?
Yes, but generous — OpenAI's tier 1 is 3,000 RPM for embeddings, rising with usage. Large corpus ingestion typically fits under these limits with a simple concurrency limiter.
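A simple concurrency limiter like the one mentioned above can be a semaphore around the embedding call. A minimal async sketch; `embed_batch` is a stand-in for your real API call, and the cap of 8 in-flight requests is an assumption you should tune to your tier's limits:

```python
import asyncio

async def embed_batch(batch: list[str]) -> list[str]:
    # Stand-in for a real embedding API call.
    await asyncio.sleep(0.01)
    return [f"vector-for-{text}" for text in batch]

async def embed_all(batches: list[list[str]], max_in_flight: int = 8):
    """Cap concurrent embedding calls so a large ingest stays under rate limits."""
    sem = asyncio.Semaphore(max_in_flight)

    async def guarded(batch):
        async with sem:        # at most max_in_flight calls run at once
            return await embed_batch(batch)

    return await asyncio.gather(*(guarded(b) for b in batches))

results = asyncio.run(embed_all([["doc1"], ["doc2"], ["doc3"]]))
```

For strict RPM budgets you'd add a token-bucket delay as well, but for most corpus-sized ingests a concurrency cap alone keeps you comfortably under the limit.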
Need a precise number for your actual prompt?
Paste a real prompt into the estimator and get token-accurate costs — input tokens counted with the provider's own tokenizer, output tokens predicted by our regression model.