Free tool

OpenAI token counter

Exact token counts for GPT-5, GPT-4o, GPT-4.1, o3, and o4-mini, using the official o200k_base tokenizer. Runs in your browser — your text is never uploaded.

What is a token?

OpenAI's current frontier models (GPT-4o, GPT-4.1, GPT-5 families, o3, o4-mini) all use the o200k_base tokenizer, an open-source BPE vocabulary of roughly 200,000 subwords. Older models (GPT-4, GPT-4-turbo, GPT-3.5-turbo) use cl100k_base, which is about half the size.

This tool uses o200k_base by default because it covers every model OpenAI currently recommends for new work. For the raw text you paste, counts here match what the API's tokenizer produces; note that chat requests also add a few tokens of per-message formatting overhead (see the FAQ below).

Token counts scale with content type: English prose runs ~4 chars/token, code and JSON ~3 chars/token, Chinese/Japanese/Korean ~1.5 chars/token, and emoji 1-4 tokens each.
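Those ratios are handy for back-of-envelope estimates before a real tokenizer is available. A minimal sketch (a rough heuristic only; the ratio constants are the approximations quoted above, and only a real tokenizer such as tiktoken or js-tiktoken gives exact counts):

```python
# Rough token estimate from character count, using approximate
# chars-per-token ratios for o200k_base. Ballpark only, not a billing number.
CHARS_PER_TOKEN = {
    "english": 4.0,  # typical English prose
    "code": 3.0,     # code and JSON
    "cjk": 1.5,      # Chinese/Japanese/Korean
}

def estimate_tokens(text: str, kind: str = "english") -> int:
    """Round len(text) / ratio; returns 0 for empty input."""
    if not text:
        return 0
    return max(1, round(len(text) / CHARS_PER_TOKEN[kind]))

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # 11
```

For the pangram above, 44 characters / 4.0 chars-per-token lands on 11, which happens to match the exact count in the table below; real text will often be off by 10-20% in either direction.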

How tokens relate to characters

Text                                           Characters   ~Tokens
Hello                                                   5         1
Hello, world!                                          13         4
The quick brown fox jumps over the lazy dog.           44        11
function add(a, b) { return a + b; }                   36        13
pneumonoultramicroscopicsilicovolcanoconiosis          45        11
🎉🚀✨ emoji counts are surprising                      30        12

Frequently asked

Which tokenizer does this use?

o200k_base, the encoding used by GPT-4o, GPT-4.1, GPT-5, o3, and o4-mini. Legacy models (GPT-4, GPT-3.5-turbo) use cl100k_base, whose counts for the same text typically run 10-15% higher.

Does the count include system messages and tool schemas?

This tool counts whatever text you paste. In a real chat API call, you pay for the full serialized payload (system + messages + tools + response), so for production cost estimation, paste your complete formatted request, not just the user message.
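The per-message overhead can be sketched explicitly. The constants below (3 tokens of framing per message, plus 3 to prime the assistant's reply) follow the counting recipe in OpenAI's cookbook for recent chat models; they are an assumption here and may change for future models, so treat this as an estimate, not the billing formula:

```python
# Sketch of whole-request token accounting for a chat API call.
# `count` is any str -> int token counter (e.g. tiktoken's encode + len).
# Overhead constants follow OpenAI's cookbook recipe and may vary by model.
def chat_request_tokens(messages, count):
    """messages: list of {"role": ..., "content": ...} dicts."""
    total = 0
    for m in messages:
        total += 3                    # per-message framing overhead
        total += count(m["role"])     # role string is tokenized too
        total += count(m["content"])
    total += 3                        # primes the assistant's reply
    return total
```

The point is the same as the paragraph above: the user message alone undercounts, because every message (system prompt included) carries its own framing tokens on top of its content.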

Is this count exact?

Yes. js-tiktoken is a pure-JavaScript port of OpenAI's own tiktoken library and produces identical token counts; the official Python tiktoken package would give you the same number.

Why is my count different from the OpenAI API response?

Most likely you're comparing against a model that uses cl100k_base instead of o200k_base (legacy GPT-4 / 3.5), or you're not including the system prompt, tool schemas, or message formatting overhead. The API usage field counts everything in the request.

How do I estimate cost from this?

Multiply tokens by the model's rate from /pricing. E.g. GPT-5 input is $2.50/1M so 5,000 tokens = $0.0125. For output tokens, use the output rate (usually 4-10× the input rate).
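That multiplication, written out (the rates here are the illustrative figures from the example above, per 1M tokens; check /pricing for current numbers):

```python
# Token count -> dollar estimate. Rates are per 1M tokens and illustrative;
# input and output tokens are billed at different rates.
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate_per_m: float, output_rate_per_m: float) -> float:
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000

# The example above: 5,000 input tokens at $2.50/1M, no output yet.
print(f"${estimate_cost(5_000, 0, 2.50, 0):.4f}")  # $0.0125
```

For a full request, plug in both sides, e.g. 5,000 input plus 1,000 expected output tokens at the model's two rates.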

Know the tokens? Get the cost.

Once you've got a token count, the estimator turns it into a dollar forecast across every model.