Free tool
OpenAI token counter
Exact token counts for GPT-5, GPT-4o, GPT-4.1, o3, and o4-mini, using the official o200k_base tokenizer. Runs in your browser — your text is never uploaded.
What is a token?
OpenAI's current frontier models (GPT-4o, GPT-4.1, GPT-5 families, o3, o4-mini) all use the o200k_base tokenizer, an open-source BPE vocabulary of roughly 200,000 subwords. Older models (GPT-4, GPT-4-turbo, GPT-3.5-turbo) use cl100k_base, which is about half the size.
This tool uses o200k_base by default because it covers every model OpenAI currently recommends for new work. Counts here match the API's own tokenization of your text exactly, with no approximation.
Token counts scale with content type: English prose runs ~4 chars/token, code and JSON ~3 chars/token, Chinese/Japanese/Korean ~1.5 chars/token, and emoji 1-4 tokens each.
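The rough ratios above can be turned into a quick estimator. This is a heuristic sketch, not tokenizer output; the category names and ratios below are taken from the paragraph above, and a real tokenizer should be used for exact counts.

```python
# Heuristic chars-per-token ratios by content type (approximate,
# from observed averages; not derived from the tokenizer itself).
CHARS_PER_TOKEN = {
    "english": 4.0,  # English prose
    "code": 3.0,     # code and JSON
    "cjk": 1.5,      # Chinese/Japanese/Korean
}

def estimate_tokens(text: str, kind: str = "english") -> int:
    """Approximate token count from character length alone."""
    return max(1, round(len(text) / CHARS_PER_TOKEN[kind]))

estimate_tokens("The quick brown fox jumps over the lazy dog.")  # ~11
```

The 44-character sample sentence estimates to 11 tokens, which lines up with the table below; shorter or unusual strings (emoji, rare words) will deviate more.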
How tokens relate to characters
| Text | Characters | ~Tokens |
|---|---|---|
| Hello | 5 | 1 |
| Hello, world! | 13 | 4 |
| The quick brown fox jumps over the lazy dog. | 44 | 11 |
| function add(a, b) { return a + b; } | 36 | 13 |
| pneumonoultramicroscopicsilicovolcanoconiosis | 45 | 11 |
| 🎉🚀✨ emoji counts are surprising | 30 | 12 |
Frequently asked
Which tokenizer does this use?
o200k_base, the encoding used by GPT-4o, GPT-4.1, GPT-5, o3, and o4-mini. Legacy models (GPT-4, GPT-3.5-turbo) use cl100k_base instead, so their counts typically run 10-15% higher.
Does the count include system messages and tool schemas?
This tool counts whatever text you paste. In a real chat API call, you pay for the full serialized payload (system + messages + tools + response), so for production cost estimation, paste your complete formatted request, not just the user message.
Is this count exact?
Yes — js-tiktoken is a pure-JS port of OpenAI's own tiktoken library and produces byte-identical results. The official OpenAI Python tiktoken package would give you the same number.
Why is my count different from the OpenAI API response?
Most likely you're comparing against a model that uses cl100k_base instead of o200k_base (legacy GPT-4 / 3.5), or you're not including the system prompt, tool schemas, or message formatting overhead. The API usage field counts everything in the request.
How do I estimate cost from this?
Multiply tokens by the model's rate from /pricing. For example, at a GPT-5 input rate of $2.50/1M tokens, 5,000 tokens cost $0.0125. For output tokens, use the output rate, which is usually 4-10× the input rate.
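The arithmetic is a one-liner. The $2.50/1M rate below is the example figure from this page, not a guaranteed current price; check /pricing before relying on it.

```python
def estimate_cost(tokens: int, rate_per_million: float) -> float:
    """Dollar cost = tokens × (rate per 1M tokens) / 1,000,000."""
    return tokens * rate_per_million / 1_000_000

estimate_cost(5_000, 2.50)  # 0.0125, matching the example above
```

Run it once with the input rate for prompt tokens and once with the output rate for completion tokens, then sum the two.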
Know the tokens? Get the cost.
Once you've got a token count, the estimator turns it into an exact dollar forecast across every model.