Pre-flight cost guardrails for every LLM API call. Predict spend, set budgets, block PRs that blow them. 28 models across OpenAI, Anthropic, and Google.
The problem
Agent loops, runaway batch jobs, and RAG queries that 10x context overnight are the three patterns that break budgets. Calcis catches all three before the call leaves your code.
Observability tools surface the spend after it has already happened. By the time the dashboard turns red, the invoice has already moved. We sit one step earlier in the loop: predict the cost, set the cap, block the PR before the regression ships.
The name is older than the category. Roman merchants counted with calculi, small pebbles, before committing to a deal. The cost was known first. Always first.
Monthly invoice
$847.23
OpenAI API, billed after the fact.
No warning. No preview.
Three jobs the guardrail does.
Predict
Exact tokens for OpenAI and Google models, calibrated approximation for Anthropic until we wire their token-counting API back up. Output prediction with a P10 to P90 confidence band trained on 33,000 real prompt-response pairs. The cost number lands before the call does.
How prediction works
Cap
Per-route, per-model, per-environment caps. Calcis knows what your prompt will cost. You decide what 'too much' is.
See the tiers
Block
GitHub Action fails the PR when predicted spend crosses the line. CLI exits non-zero. VS Code shows the warning inline. Catch it in dev, not in the bill.
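As a rough illustration of the predict, cap, and block steps, the sketch below prices a call from token counts, derives a low and a high cost from a P10 and P90 output estimate, and returns a non-zero exit code when the high estimate crosses the cap. The `Price` type, the per-million-token rates, and the choice to compare the P90 figure against the cap are assumptions made for this example, not Calcis's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Hypothetical per-million-token rates in USD."""
    input_per_mtok: float
    output_per_mtok: float


def call_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call: tokens times rate, summed over input and output."""
    total = input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    return total / 1_000_000


def check_budget(price, input_tokens, output_p10, output_p90, cap_usd):
    """Return (low, high, exit_code); exit_code is 1 when the P90 cost exceeds the cap."""
    low = call_cost(price, input_tokens, output_p10)
    high = call_cost(price, input_tokens, output_p90)
    return low, high, (1 if high > cap_usd else 0)


if __name__ == "__main__":
    # Made-up rates and token counts, for illustration only.
    price = Price(input_per_mtok=3.0, output_per_mtok=15.0)
    low, high, code = check_budget(price, 8_000, 200, 1_200, cap_usd=0.03)
    print(f"predicted ${low:.3f} to ${high:.3f}, exit code {code}")
```

With these made-up numbers the high estimate is about $0.042, above the $0.03 cap, so the check fails. A stricter or looser policy could compare the P10 or a midpoint instead.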
See how it works in CI
Three patterns that break budgets.
The same three failure modes account for almost every surprise invoice we hear about. Calcis sees each one before the call leaves your code.
Your LangGraph agent hits a recursion edge case and runs 50 iterations on what should have been 3. Calcis caps the loop before the bill catches up.
See the pattern →
Your RAG retriever pulls 40 chunks instead of 8 because someone tweaked the threshold. Calcis flags the cost spike on the next PR.
See the pattern →
Someone routed a classification step to Opus. Calcis shows the 200x cost spread vs Haiku before you ship.
See the pattern →
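To make the first two patterns concrete, here is a small sketch of how input tokens scale with loop iterations and with retrieved chunks. It assumes each agent step resends a context that grows by a fixed amount, and that every chunk is the same size. The token counts are invented for illustration.

```python
def agent_loop_input_tokens(iterations: int, base_tokens: int, growth_per_step: int) -> int:
    """Total input tokens when step i (0-based) resends base + i * growth tokens."""
    return sum(base_tokens + i * growth_per_step for i in range(iterations))


def rag_input_tokens(chunks: int, tokens_per_chunk: int, question_tokens: int) -> int:
    """Input tokens for one RAG call: retrieved chunks plus the question."""
    return chunks * tokens_per_chunk + question_tokens


if __name__ == "__main__":
    # Illustrative numbers only: 2,000-token base prompt, 500 tokens added per step.
    expected = agent_loop_input_tokens(3, base_tokens=2_000, growth_per_step=500)
    runaway = agent_loop_input_tokens(50, base_tokens=2_000, growth_per_step=500)
    print(f"agent loop: {expected} vs {runaway} input tokens ({runaway / expected:.0f}x)")

    # Illustrative numbers only: 500-token chunks, 100-token question.
    tuned = rag_input_tokens(8, tokens_per_chunk=500, question_tokens=100)
    drifted = rag_input_tokens(40, tokens_per_chunk=500, question_tokens=100)
    print(f"RAG call: {tuned} vs {drifted} input tokens ({drifted / tuned:.1f}x)")
```

Because the loop resends a growing context, 50 iterations cost far more than 50/3 times the expected run, which is why the loop case tends to be the most expensive surprise.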
Frontier model pricing, live.
Ranked by a 1K in / 500 out token benchmark. Updated last month.
Prices sourced direct from provider docs · Methodology
Tracked, timestamped, verifiable.
Every price on Calcis resolves to a source and a date. If a provider moves a number, the change shows up here first, and you can audit the whole history at any time.
From the Q2 2026 Pricing Report
“At 100K input tokens, 7.4× separates the cheapest flagship (Gemini 3.1 Pro (preview) at $0.22) from the most expensive (Claude Opus 4.1 at $1.65) on the same 2K-output shape. Anyone shipping RAG or document analysis at scale should be benchmarking across providers, not just across models.”
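The figures in the quote are consistent with the usual per-call formula: input tokens times the input rate, plus output tokens times the output rate. The sketch below reproduces them under assumed list prices of $15 / $75 per million tokens for Claude Opus 4.1 and $2 / $12 per million for Gemini 3.1 Pro (preview). These rates are assumptions chosen because they match the quoted totals, so check them against the provider pages before relying on them.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_per_mtok: float, output_per_mtok: float) -> float:
    """USD cost of one call given per-million-token rates."""
    return (input_tokens * input_per_mtok + output_tokens * output_per_mtok) / 1_000_000


# Workload shape from the quote: 100K input tokens, 2K output tokens.
# The rates below are assumptions, not values taken from the report.
opus = call_cost(100_000, 2_000, input_per_mtok=15.0, output_per_mtok=75.0)
gemini = call_cost(100_000, 2_000, input_per_mtok=2.0, output_per_mtok=12.0)

if __name__ == "__main__":
    print(f"Opus 4.1: ${opus:.2f}, Gemini 3.1 Pro (preview): ${gemini:.2f}")
    print(f"spread: {opus / gemini:.1f}x")
```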
149 changes logged across 52 entries. No retroactive edits.
New provider prices land on Calcis within hours of announcement. Every update is timestamped on the public changelog.
View changelog
Every price change is logged with its date, what moved, and a link to the provider announcement that triggered it. No silent edits.
See every change
Each model row carries a direct link to the provider's own pricing page. Anything we publish, you can verify in two clicks.
Browse model sources
The full pricing dataset ships as CSV and JSON-LD alongside a public RSS feed. Build on it, diff it, pipe it into your own tools.
Subscribe to RSS
Pick the shape of the question.
Each calculator ranks every tracked model cheapest-first for that specific workload, with sensible defaults and tuning inputs.
Tagging, labelling, routing, and moderation workloads. Short inputs, tiny outputs, huge volume: exactly the shape the cheapest models were built for.
Open calculator →
Compute the cost of embedding a corpus. One-off index builds, continuous indexing, and query-time embedding are all priced here.
Open calculator →
Agentic coding loops are the most expensive LLM workload there is: many tool calls, big context windows, long outputs. Cost yours honestly.
Open calculator →
Cost out a batch of document summaries across every major LLM. Long input, short output: the inverse of a chatbot.
Open calculator →
Estimate what a production chatbot costs per conversation and per month across every major LLM (GPT-5, Claude, Gemini), with realistic defaults.
Open calculator →
Cost out a retrieval-augmented generation pipeline: embedding queries, retrieving chunks, and paying for the context-heavy chat call on top.
Open calculator →
See all on the workload calculators page.
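The reason a per-workload ranking matters is that the cheapest model can change with the shape of the call. The sketch below ranks two hypothetical models, `model-x` and `model-y`, with invented rates, for an input-heavy and an output-heavy workload. It is an illustration of the idea, not the calculators' actual logic.

```python
from typing import Dict, List, Tuple

# Hypothetical (input, output) USD rates per million tokens.
PRICES: Dict[str, Tuple[float, float]] = {
    "model-x": (0.5, 4.0),  # cheap input, expensive output
    "model-y": (1.0, 2.0),  # pricier input, cheaper output
}


def rank_models(prices: Dict[str, Tuple[float, float]],
                input_tokens: int, output_tokens: int) -> List[Tuple[str, float]]:
    """Return (model, cost_usd) pairs sorted cheapest-first for one call shape."""
    costs = {
        name: (input_tokens * inp + output_tokens * out) / 1_000_000
        for name, (inp, out) in prices.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Long input, short output (summarisation-like shape).
    print(rank_models(PRICES, input_tokens=20_000, output_tokens=300))
    # Short input, long output (chat-like shape).
    print(rank_models(PRICES, input_tokens=200, output_tokens=1_500))
```

With these invented rates, `model-x` ranks first on the input-heavy shape and `model-y` ranks first on the output-heavy one.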
Free for the basics. Pro for the predictor.
Yearly billing on every paid tier. Team and Enterprise on the full pricing page.
Full feature comparison and FAQ on the pricing page.
Pipe Calcis into your stack.
Calcis isn't only a website. Four live surfaces wrap the same pricing dataset so you can estimate cost from CI, your terminal, your own API, or your RSS reader.
Drop into any CI pipeline. Posts cost estimates on every pull request in under 30 seconds.
rc397/calcis-action@v1
Token counts and cost estimates from your terminal. No API key for the free tier.
npx calcis
Key-scoped estimate endpoint. JSON in, JSON out. Same numbers the site renders.
POST /api/v1/estimate
Subscribe to every change by RSS, or pull the dataset as an npm package to diff in your own tools.
@calcis/pricing
One step in your CI pipeline
Add Calcis to your GitHub workflow in 30 seconds.
name: LLM Cost Estimate
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  estimate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: rc397/calcis-action@v1
        with:
          api-key: ${{ secrets.CALCIS_API_KEY }}
          model: claude-sonnet-4-6
calcis/calcis-action once the GitHub org is approved.
Get your API key at calcis.dev/dashboard
Show LLM cost in your README.
Drop a tiny SVG badge into your README, blog post, or docs. It shows the per-call cost for any tracked model and updates the day your provider's price moves. No re-deploy needed.
These are the real badge endpoints: the same SVGs any embed receives, cached for 24 hours at the edge.
Common questions
Input token counts are exact. We use the same tokeniser as each provider. Output predictions are estimates based on prompt patterns; actual output varies by model and response.
Claude Opus 4.7, Sonnet 4.6, Haiku 4.5, GPT-5.4, GPT-5, GPT-4o, Gemini 2.5 Pro, Gemini 2.5 Flash, and 17+ more. Updated when providers change pricing.
The raw text is never written to disk. We store a one-way SHA-512 hash plus token counts for 90 days (see our privacy policy). OpenAI tokenization runs locally; Claude and Gemini tokenization calls the providers' free countTokens endpoints, which sends the prompt to them purely for counting.
Free (Initium) gets unlimited token-exact counts on every supported model, the heuristic and regression cost predictors, the multi-model calculator, the workload calculators, the CLI, and the GitHub Action on public repos. Pro ($15/mo) adds the LLM-assisted predictor (Precise mode), the Bayesian P10 to P90 confidence band, the multi-turn session simulator, the context-file analyser, and a public REST API key. Max ($29/mo) adds prompt compression and higher quotas. Team ($40/seat/mo) adds pooled quotas and admin dashboards. Full breakdown lives on the pricing page.
Calcis is an independent project maintained by a single engineer (rc397 on GitHub). Every price change, source link, and changelog entry passes through one set of eyes before it publishes - no scraped content, no distributed editorial board. The tradeoff is that updates are manual; the benefit is that the dataset has a consistent, auditable hand behind every row.
A mix of provider blog RSS feeds, direct watches on the OpenAI / Anthropic / Google pricing pages, and manual monitoring of their changelog and release channels. Every published change is cross-referenced against at least one official provider URL before it lands in the dataset - you can see that source link on every entry of the public changelog.
You shouldn't - the provider's own page is always the authoritative source. Calcis is an aggregator: it saves you from clicking through three vendor sites, surfaces moves the day they happen, and lets you diff the landscape across providers on an identical workload. Every model row links back to the provider's pricing page so two clicks take you to the ground truth.
OpenAI models (GPT-4o, GPT-5, GPT-5.4) use o200k_base via js-tiktoken, computed locally so no prompt text leaves your browser. Legacy GPT-4 and GPT-3.5 fall back to cl100k_base (same library). Anthropic Claude models call the provider's free messages.countTokens endpoint with a 4s timeout and a heuristic fallback. Google Gemini models use the countTokens method on the Generative Language API with the same timeout and fallback behaviour.
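The timeout-and-fallback behaviour described for the remote counters can be sketched as follows. Here `remote_counter` is a hypothetical stand-in for a provider call, and the four-characters-per-token heuristic is an assumed rule of thumb, not the fallback Calcis actually uses.

```python
import concurrent.futures
import math
import time
from typing import Callable, Tuple


def heuristic_count(text: str) -> int:
    """Rough fallback: about four characters per token (an assumed rule of thumb)."""
    return max(1, math.ceil(len(text) / 4))


def count_tokens(text: str, remote_counter: Callable[[str], int],
                 timeout_s: float = 4.0) -> Tuple[int, str]:
    """Try the remote counter; fall back to the heuristic on timeout or error."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(remote_counter, text)
        return future.result(timeout=timeout_s), "exact"
    except Exception:
        # Covers both a timeout and any error raised by the remote call.
        return heuristic_count(text), "heuristic"
    finally:
        # Do not block on a call that is still in flight.
        pool.shutdown(wait=False)


def slow_counter(text: str) -> int:
    """Stand-in for a provider endpoint that responds too slowly."""
    time.sleep(0.3)
    return 99


if __name__ == "__main__":
    print(count_tokens("hello world", lambda t: 2))
    print(count_tokens("a" * 40, slow_counter, timeout_s=0.05))
```

Returning the source label alongside the count lets a caller show whether a number is exact or approximate.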
Know what every LLM call will cost before you send it. 25+ models, three providers, prices verified and timestamped. No account needed to start.