Coding agent cost calculator

Agentic coding loops are the most expensive LLM workload there is — many tool calls, big context windows, long outputs. Cost yours honestly.

Coding agents — Claude Code, Cursor, Devin, a home-grown loop — blow past typical chatbot costs because a single user turn can fan out into 5-30 model calls as the agent reads files, runs tools, and iterates. Under-modelling that fan-out is the single biggest source of "why is our bill so high" surprise.

The defaults below encode a realistic mid-size coding task: 15 tool calls per user turn, 8,000 tokens of average context (file reads + previous messages), and 600 tokens of output per call. Tune the tool-call multiplier first — it's usually the dominant number.
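Under those defaults, per-request cost is plain arithmetic over the fan-out. A minimal sketch of that calculation (the per-token rates here are illustrative placeholders, not the pricing used in the table below):

```python
def cost_per_request(tool_calls=15, input_tokens=8_000, output_tokens=600,
                     input_rate=0.05, output_rate=0.40):
    """Estimate the cost of one user turn, in USD.

    Rates are $ per million tokens. The default rates are placeholders
    for illustration only, not quotes for any specific model.
    """
    input_cost = tool_calls * input_tokens * input_rate / 1_000_000
    output_cost = tool_calls * output_tokens * output_rate / 1_000_000
    return input_cost + output_cost

# 15 calls x 8,000 tokens = 120k input tokens billed per user turn,
# which is why the tool-call multiplier dominates the total.
print(round(cost_per_request(), 4))  # 0.0096 at the placeholder rates
```

Doubling the tool-call count doubles both terms, while doubling output length only moves the (much smaller) output term; that asymmetry is why the tool-call multiplier is the knob to tune first.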

Workload parameters

Costs update live across every model in the table below.

Top 8 cheapest models for this workload

Sorted by total cost per request (input + output, with tokenizer and long-context adjustments applied).

| Model | Provider | Per request | Per day | Monthly (×30) |
|---|---|---|---|---|
| GPT-5 nano | OpenAI | $0.00624 | $0.62 | $18.72 |
| Gemini 2.5 Flash-Lite | Google | $0.01224 | $1.22 | $36.72 |
| GPT-4o mini | OpenAI | $0.01836 | $1.84 | $55.08 |
| GPT-5.4 nano | OpenAI | $0.02475 | $2.48 | $74.25 |
| Gemini 3.1 Flash-Lite (preview) | Google | $0.03090 | $3.09 | $92.70 |
| GPT-5 mini | OpenAI | $0.03120 | $3.12 | $93.60 |
| Gemini 2.5 Flash | Google | $0.03750 | $3.75 | $112.50 |
| GPT-4.1 mini | OpenAI | $0.04896 | $4.90 | $146.88 |
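The per-day and monthly columns are straight multiples of the per-request figure; working backwards from the GPT-5 nano row, the table implies roughly 100 requests per day (an inference from the numbers, not a stated parameter). A sketch of that rollup:

```python
def rollup(per_request, requests_per_day=100, days=30):
    """Expand a per-request cost into (daily, monthly) totals.

    The 100 requests/day default is inferred from the table rows above.
    Monthly is computed from the unrounded daily figure, which is why
    a displayed $0.62/day yields $18.72/month rather than $18.60.
    """
    per_day = per_request * requests_per_day
    return per_day, per_day * days

per_day, monthly = rollup(0.00624)  # GPT-5 nano row
print(f"${per_day:.2f}/day, ${monthly:.2f}/month")  # $0.62/day, $18.72/month
```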

Scaling GPT-5 nano

What the cheapest option costs as your traffic grows.

  • Baseline: $18.72 / month
  • 2× volume: $37.44 / month
  • 5× volume: $93.60 / month
  • 10× volume: $187.20 / month

Optimization tips

  • Prompt caching is non-negotiable. Agent loops re-read the same files, tool schemas, and system prompts — Anthropic's 5-minute cache catches most of this at 10% of the input rate.
  • Watch the Opus 4.7 tokenizer. Its 1.15× multiplier quietly rebills most coding workloads at 15% higher cost than Opus 4.6 for the same transcript; budget accordingly.
  • Cheaper models for mechanical steps. Route file reads, grep, and bash through Haiku or GPT-5-mini; save Opus/GPT-5 for the planning and diff-writing steps.
  • Cap iteration count. An agent that hasn't solved a task in 10 turns usually won't in 50 — it'll just spend more. Hard limits save money and ship better UX.
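The caching tip is worth quantifying. Assuming cache reads bill at 10% of the normal input rate (the figure cited in the tip above; the hit rate and token counts below are illustrative):

```python
def input_cost_with_cache(tokens, rate_per_m, hit_rate,
                          cached_fraction_of_rate=0.10):
    """Effective input cost when a share of tokens hit the prompt cache.

    hit_rate: fraction of input tokens served from cache.
    cached_fraction_of_rate: cache reads billed at this fraction of the
    normal input rate (0.10 matches the 10% figure in the tip above).
    """
    full_price = tokens * rate_per_m / 1_000_000
    return full_price * ((1 - hit_rate) + hit_rate * cached_fraction_of_rate)

# 120k input tokens per turn at $3/M, with 80% of them cache hits:
base = input_cost_with_cache(120_000, 3.0, hit_rate=0.0)
cached = input_cost_with_cache(120_000, 3.0, hit_rate=0.8)
print(f"{1 - cached / base:.0%} saved")  # 72% saved
```

An 80% hit rate cuts input spend by 72% here, squarely inside the 40-85% savings range the FAQ below cites; since input tokens dominate agent bills, this usually moves the total more than any model swap.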

Frequently asked

Why are coding agents so expensive?

Every user request fans out into many LLM calls, and each call carries heavy context (file contents, tool schemas, previous messages). A single "fix this bug" request typically costs 10-50× what a chatbot reply of the same apparent user-facing size would.

Which model is cheapest for agentic coding?

Claude Haiku 4.5 and Gemini 2.5 Flash are dramatically cheaper per token, but they're also noticeably worse at multi-step planning. For production agents, Sonnet 4.6 or GPT-5-mini usually win on cost-per-successful-task; Opus 4.7 and GPT-5 win when task completion matters more than per-request cost.

How much does prompt caching save on a coding agent?

40-85%, depending on how often the same files and tool schemas appear across calls in the cache window. Anthropic's 5-minute TTL is usually long enough for a user's active session; OpenAI's caching is automatic but only 25-50% off.

Does the calculator include extended thinking / reasoning tokens?

No — it treats "output" as the final answer. If you enable extended thinking (Claude) or reasoning tokens (OpenAI o-series), add an extra 500-3,000 output tokens per call depending on the task; those bill at the output rate.

Should I self-host a model for cost reasons?

Only above roughly $3k/month in API spend, and only if you have GPU ops expertise. For most teams, aggressive prompt optimization and caching cuts more cost than switching to self-hosted open-weight models, with zero quality hit.

Need a precise number for your actual prompt?

Paste a real prompt into the estimator and get token-accurate costs — input tokens counted with the provider's own tokenizer, output tokens predicted by our regression model.