Converter

Tokens to API cost calculator

Fine-grained API cost per request. Pick a model, enter tokens, see input cost, output cost, and total — with tokenizer multipliers and long-context tiers applied.

Pick a model, enter token counts, see the dollar cost

Input cost

$0.00300

Output cost

$0.00750

Total

$0.01050

Input: $3.00 / 1M tokens · Output: $15.00 / 1M tokens

How the conversion works

Same math as the tokens-to-dollars converter but with a three-way breakdown (input cost / output cost / total) so you can see where the bill actually lands. Useful when deciding whether to trim the input (reduce context, truncate history) or trim the output (max_tokens cap).
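The breakdown is simple arithmetic. A minimal sketch (the function name `api_cost` is illustrative, not part of the tool), using the $3.00 / $15.00 per-1M rates shown above:

```python
def api_cost(tokens_in: int, tokens_out: int,
             rate_in: float, rate_out: float) -> dict:
    """Per-request cost breakdown; rates are dollars per 1M tokens."""
    input_cost = tokens_in / 1_000_000 * rate_in
    output_cost = tokens_out / 1_000_000 * rate_out
    return {
        "input": input_cost,
        "output": output_cost,
        "total": input_cost + output_cost,
    }

# 1,000 input and 500 output tokens at $3.00 / $15.00 per 1M
# reproduces the figures above: input $0.00300, output $0.00750, total $0.01050
breakdown = api_cost(1_000, 500, 3.00, 15.00)
```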

All rates come from the live Calcis pricing table, which is verified against each provider's published rate card and updated when providers change their pricing.

Where does the cost land?

At the $3.00 / $15.00 per-1M rates shown above:

Input-heavy workload (10K in / 500 out): ~80% of cost is input
Balanced workload (1K in / 1K out): ~83% of cost is output
Output-heavy workload (500 in / 2K out): ~95% of cost is output
Chat default (700 in / 300 out): ~70% of cost is output
Summarization (10K in / 500 out): ~80% of cost is input
RAG (4K context + 200 query / 400 out): ~68% of cost is input

Frequently asked

Is this different from the tokens-to-dollars converter?

Same math, different UI. This one breaks the total into input cost / output cost / total so you can see which side of the ledger dominates. Use /convert/tokens-to-dollars for a simpler view.

When should I optimize input vs output?

If your workload is input-heavy (summarization, long context), optimize context size (trim history, rerank retrieval, smaller chunks, caching). If output-heavy (creative generation, long replies), optimize max_tokens and model choice.
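To see which lever matters for a given workload, compare the two directly. A hedged sketch (the 40% reductions are illustrative, and the $3.00 / $15.00 per-1M rates are assumed from the page above):

```python
RATE_IN, RATE_OUT = 3.0, 15.0  # dollars per 1M tokens

def cost(tokens_in: int, tokens_out: int) -> float:
    return (tokens_in * RATE_IN + tokens_out * RATE_OUT) / 1_000_000

baseline = cost(10_000, 500)         # summarization-style request
trimmed_context = cost(6_000, 500)   # trim 40% of input (rerank, smaller chunks)
capped_output = cost(10_000, 300)    # cut 40% of output (tighter max_tokens)

print(f"baseline:        ${baseline:.4f}")
print(f"trim input 40%:  ${trimmed_context:.4f}")
print(f"cap output 40%:  ${capped_output:.4f}")
```

For this input-heavy example, trimming context saves more than capping output, even at a 5x output rate; the reverse holds for output-heavy workloads.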

Why is output so much more expensive than input?

Generation requires a full forward pass per output token, while input tokens are processed in parallel in a single pass. Output rates are typically 4-10× the input rate across major providers. For a 50/50 token split at a 4× rate ratio, about 80% of the cost is output.
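The output share follows directly from the rate ratio. A small sketch of the relationship (function name is illustrative):

```python
def output_share(rate_ratio: float, frac_out: float = 0.5) -> float:
    """Share of total cost from output tokens, given the output/input
    rate ratio and the output fraction of total tokens."""
    return rate_ratio * frac_out / ((1 - frac_out) + rate_ratio * frac_out)

# 50/50 token split:
output_share(4.0)   # 0.8  -> 80% of cost is output at a 4x rate ratio
output_share(10.0)  # ~0.91 at a 10x rate ratio
```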

What counts as 'input' exactly?

Everything sent to the model: system prompt + user message + prior messages + tool schemas + any structured-output JSON schema. OpenAI and Anthropic chat formats also add a small per-message token overhead, which this converter doesn't model (it's usually <5% of input).

Does this include fine-tuning costs?

No. Fine-tuning has a one-time training cost plus a usually higher per-token inference cost. This converter only handles base-model inference. For fine-tuning, see each provider's pricing page directly.

Ready to estimate a real prompt?

Paste your actual text into the estimator for exact token counts and dollar costs across every model.