Chatbot API cost calculator

Estimate what a production chatbot costs per conversation and per month across every major LLM — GPT-5, Claude, Gemini — with realistic defaults.

A customer-facing chatbot is the most common LLM workload, and also the one product teams most often underestimate. The headline $/1M token rate is almost meaningless in isolation — it's the conversation shape (system prompt, message history, output length) that drives the bill.

The calculator below runs every tracked model against your workload. Tune the inputs on the left and watch the table re-sort in real time. Start with the defaults (a 500-token system prompt, 200 tokens of user input, and 300 tokens of assistant reply, a reasonable shape for a mid-sized support bot) and adjust from there.
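To see why conversation shape, not the headline rate, drives the bill, here is a minimal sketch of full-history replay using the default workload above. The per-token rates are hypothetical placeholders, not any provider's actual prices.

```python
# Illustrative only: hypothetical rates in $ per token, not a real rate card.
INPUT_RATE = 0.10 / 1_000_000   # $0.10 per 1M input tokens (assumed)
OUTPUT_RATE = 0.40 / 1_000_000  # $0.40 per 1M output tokens (assumed)

SYSTEM, USER, REPLY = 500, 200, 300  # the calculator's default shape

def turn_cost(turn: int) -> float:
    """Cost of the Nth turn when the full history is replayed each request."""
    history = (turn - 1) * (USER + REPLY)  # every prior turn is re-sent
    input_tokens = SYSTEM + history + USER
    return input_tokens * INPUT_RATE + REPLY * OUTPUT_RATE

for n in (1, 5, 10):
    print(f"turn {n}: ${turn_cost(n):.6f}")
```

By turn 10 the replayed history dwarfs the system prompt and the user's new message, which is why trimming history (see the optimization tips below) matters more than the per-token rate.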

Workload parameters

Costs update live across every model in the table below.

Top 8 cheapest models for this workload

Sorted by total cost per request (input + output, with tokenizer and long-context adjustments applied).

Model | Provider | Per request | Per day | Monthly (×30)
GPT-5 nano | OpenAI | $0.00015 | $0.15 | $4.65
Gemini 2.5 Flash-Lite | Google | $0.00019 | $0.19 | $5.70
GPT-4o mini | OpenAI | $0.00028 | $0.28 | $8.55
GPT-5.4 nano | OpenAI | $0.00051 | $0.51 | $15.45
Gemini 3.1 Flash-Lite (preview) | Google | $0.00063 | $0.63 | $18.75
GPT-4.1 mini | OpenAI | $0.00076 | $0.76 | $22.80
GPT-5 mini | OpenAI | $0.00077 | $0.78 | $23.25
Gemini 2.5 Flash | Google | $0.00096 | $0.96 | $28.80

Scaling GPT-5 nano

What the cheapest option costs as your traffic grows.

Volume | Monthly cost
Baseline | $4.65
2× volume | $9.30
5× volume | $23.25
10× volume | $46.50

Optimization tips

  • Use prompt caching (Anthropic or OpenAI) for the static part of your system prompt — cuts 50-90% off input costs for repeat traffic.
  • Trim conversation history to the last 4-6 turns. Full-history replay is the single biggest cost line on most chatbots.
  • Route simple intents (greetings, small talk) to a cheaper model like GPT-5-nano or Haiku. Reserve Opus or GPT-5 for complex reasoning.
  • Cap max output tokens. Most "runaway" conversations are models that rambled past a natural stopping point.
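The history-trimming tip can be sketched in a few lines. The role/content message shape follows the common chat-API convention; field names and the exact trimming policy will vary by provider and framework.

```python
# Minimal sketch of "trim conversation history to the last N turns".
# Assumes the common {"role": ..., "content": ...} message convention.
def trim_history(messages, keep_turns=5):
    """Keep the system prompt plus the last `keep_turns` user/assistant pairs."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-2 * keep_turns:]  # one turn = user + assistant
```

A 20-turn conversation shrinks to the system prompt plus the last five pairs, so input tokens stop growing without bound.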

Frequently asked

How do I estimate chatbot cost?

Multiply average input tokens per message (system prompt + history + user turn) by the input rate, add average output tokens times the output rate, then multiply by messages per day and 30 for a monthly bill. The calculator above does this for every model at once.
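That arithmetic fits in one small function. The rates below are placeholders in $ per 1M tokens (the unit providers quote), not real prices.

```python
def monthly_cost(input_tokens, output_tokens, messages_per_day,
                 in_rate_per_m, out_rate_per_m, days=30):
    """Monthly bill: (in tokens * in rate + out tokens * out rate) * volume."""
    per_request = (input_tokens * in_rate_per_m
                   + output_tokens * out_rate_per_m) / 1_000_000
    return per_request * messages_per_day * days

# Default workload (700 in / 300 out) at 1,000 messages/day,
# with assumed rates of $0.10/1M in and $0.40/1M out:
print(f"${monthly_cost(700, 300, 1000, 0.10, 0.40):.2f}")
```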

What's the cheapest model for a chatbot?

For typical support chat workloads (~700 in / 300 out), GPT-5-nano and Gemini 2.5 Flash-Lite are usually the cheapest, followed by Claude Haiku 4.5. The ranking flips if you enable prompt caching, because Anthropic's 10× cache read discount is aggressive.

Does the calculator include prompt caching discounts?

No — it shows base input/output rates so the numbers match a worst-case read of the rate card. Caching can cut input cost 50-90% in practice; apply that as a separate mental discount once you've identified your shortlist.
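That "separate mental discount" can be made concrete. The sketch below assumes the static system prompt is a cache hit on repeat traffic and uses a 90% cache-read discount as an example figure; check your provider's actual caching terms and pricing.

```python
def cached_input_cost(system_tokens, dynamic_tokens, rate_per_m,
                      cache_read_discount=0.90):
    """Per-request input cost when the system prompt hits the cache.

    The 90% read discount is an illustrative assumption, not a quoted rate.
    """
    cached = system_tokens * rate_per_m * (1 - cache_read_discount)
    uncached = dynamic_tokens * rate_per_m  # history + user turn, full price
    return (cached + uncached) / 1_000_000
```

With a 500-token system prompt and 200 dynamic tokens at an assumed $0.10/1M, caching drops the input cost from $0.00007 to $0.000025 per request.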

How accurate are the output token estimates?

The calculator uses the number you enter as-is. If you don't know your real output length, sample 50 actual conversations and average. For quick math, the average customer-support reply is 150-400 tokens; a RAG-style answer with citations is usually 300-600.

Should I factor in latency or rate limits?

Not for cost modelling; those affect UX, not the bill. That said, a model with a two-second tail latency will lose conversations, which lowers volume (and, incidentally, cost) for the wrong reasons. Once you've picked a shortlist, use the estimator for a single-prompt deep dive.

Need a precise number for your actual prompt?

Paste a real prompt into the estimator and get token-accurate costs — input tokens counted with the provider's own tokenizer, output tokens predicted by our regression model.