Free tool
OpenAI token counter
Exact token counts for GPT-5, GPT-4o, GPT-4.1, o3, and o4-mini, using the official o200k_base tokenizer. Runs in your browser — your text is never uploaded.
What is a token?
OpenAI's current frontier models (GPT-4o, GPT-4.1, GPT-5 families, o3, o4-mini) all use the o200k_base tokenizer, an open-source BPE vocabulary of roughly 200,000 subwords. Older models (GPT-4, GPT-4-turbo, GPT-3.5-turbo) use cl100k_base, which is about half the size.
This tool uses o200k_base by default because it covers every model OpenAI currently recommends for new work. Counts here match the API's own tokenization of your text exactly, with no approximation.
Token counts scale with content type: English prose runs ~4 chars/token, code and JSON ~3 chars/token, Chinese/Japanese/Korean ~1.5 chars/token, and emoji 1-4 tokens each.
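The rough ratios above can be turned into a quick estimator. This is a heuristic sketch, not tokenizer output; the category names and ratios below are taken from the paragraph above, and a real tokenizer should be used for exact counts.

```python
# Heuristic chars-per-token ratios by content type (approximate,
# from observed averages; not derived from the tokenizer itself).
CHARS_PER_TOKEN = {
    "english": 4.0,  # English prose
    "code": 3.0,     # code and JSON
    "cjk": 1.5,      # Chinese/Japanese/Korean
}

def estimate_tokens(text: str, kind: str = "english") -> int:
    """Approximate token count from character length alone."""
    return max(1, round(len(text) / CHARS_PER_TOKEN[kind]))

estimate_tokens("The quick brown fox jumps over the lazy dog.")  # ~11
```

The 44-character sample sentence estimates to 11 tokens, which lines up with the table below; shorter or unusual strings (emoji, rare words) will deviate more.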
How tokens relate to characters
| Text | Characters | ~Tokens |
|---|---|---|
| Hello | 5 | 1 |
| Hello, world! | 13 | 4 |
| The quick brown fox jumps over the lazy dog. | 44 | 11 |
| function add(a, b) { return a + b; } | 36 | 13 |
| pneumonoultramicroscopicsilicovolcanoconiosis | 45 | 11 |
| 🎉🚀✨ emoji counts are surprising | 30 | 12 |
Frequently asked
Which tokenizer does this use?
o200k_base, the encoding used by GPT-4o, GPT-4.1, GPT-5, o3, and o4-mini. Legacy models (GPT-4, GPT-3.5-turbo) use cl100k_base instead, so their counts typically run 10-15% higher.
Does the count include system messages and tool schemas?
This tool counts whatever text you paste. In a real chat API call, you pay for the full serialized payload (system + messages + tools + response), so for production cost estimation, paste your complete formatted request, not just the user message.
Is this count exact?
Yes — js-tiktoken is a pure-JS port of OpenAI's own tiktoken library and produces byte-identical results. The official OpenAI Python tiktoken package would give you the same number.
Why is my count different from the OpenAI API response?
Most likely you're comparing against a model that uses cl100k_base instead of o200k_base (legacy GPT-4 / 3.5), or you're not including the system prompt, tool schemas, or message formatting overhead. The API usage field counts everything in the request.
How do I estimate cost from this?
Multiply tokens by the model's rate from /pricing. For example, at a GPT-5 input rate of $2.50/1M tokens, 5,000 tokens cost $0.0125. For output tokens, use the output rate, which is usually 4-10× the input rate.
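The arithmetic is a one-liner. The $2.50/1M rate below is the example figure from this page, not a guaranteed current price; check /pricing before relying on it.

```python
def estimate_cost(tokens: int, rate_per_million: float) -> float:
    """Dollar cost = tokens × (rate per 1M tokens) / 1,000,000."""
    return tokens * rate_per_million / 1_000_000

estimate_cost(5_000, 2.50)  # 0.0125, matching the example above
```

Run it once with the input rate for prompt tokens and once with the output rate for completion tokens, then sum the two.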
Know the tokens? Get the cost.
Once you've got a token count, the estimator turns it into an exact dollar forecast across every model.