Every time you send a prompt to an AI model — whether it's GPT-5.5, Claude Opus 4.7, or Gemini 3.1 Pro — you're charged by the token. But what exactly is a token? It's not a word, not a character, and not a byte. Understanding tokens is the first step to controlling your AI API costs, and it's simpler than most developers think.
## What Is a Token?
A token is the smallest unit of text that an AI model processes. Before your text reaches the model, it passes through a tokenizer — a program that splits your input into chunks. These chunks can be whole words, parts of words, punctuation marks, or even spaces.
For example, the sentence `Hello, how are you?` might be split into six tokens: `Hello`, `,`, ` how`, ` are`, ` you`, `?`. Notice that the space before "how" is part of its token — tokenizers treat spaces as meaningful characters.
A token is roughly 4 characters in English, or about ¾ of a word. But this is just an approximation — the actual count depends on the specific text and the tokenizer used.
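The rough ratio above can be turned into a quick estimator. This is a heuristic sketch only — an exact count requires the provider's actual tokenizer (e.g., OpenAI's tiktoken library):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token in English text."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("Hello, how are you?"))  # 19 characters -> 5
```

Good enough for budget planning; for billing-accurate counts, always use the tokenizer that matches your model.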
The key insight is that different AI providers use different tokenizers. The same text will produce different token counts across OpenAI, Anthropic, and Google. This means the same prompt can cost different amounts depending on which model you use.
## How Different Providers Count Tokens
Each major AI provider has its own tokenization method. Here's how they compare:
| Provider | Tokenizer | ~Tokens per English Word | Tokens per Chinese Char |
|---|---|---|---|
| OpenAI (GPT-5.5, GPT-5.4) | tiktoken (o200k_base) | ~1.3 | 2–3 |
| Anthropic (Claude Opus/Sonnet) | Proprietary | ~1.3 | 1.5–2.5 |
| Google (Gemini) | SentencePiece | ~1.2 | 1–2 |
| DeepSeek | tiktoken-compatible | ~1.3 | 2–3 |
1,000 English words ≈ 1,300 tokens. 500 Chinese characters ≈ 1,000–1,500 tokens. Use these ratios for quick estimates before calculating exact costs.
## How AI API Pricing Works
AI providers charge per million tokens, but not all tokens are priced equally. There are typically three pricing tiers:
### Input Tokens
These are the tokens you send to the model — your prompt, system instructions, conversation history, and any context you provide. Input tokens are almost always cheaper than output tokens because the model processes them in parallel.
### Output Tokens
These are the tokens the model generates in response. Output tokens cost more because the model generates them one at a time, which requires more computation. Most providers charge 3–6x more for output than input.
### Cache Read Tokens
If you send the same prefix (like a system prompt) repeatedly, providers can cache it. Cached tokens are dramatically cheaper — typically 90% less than regular input tokens. This is one of the most effective ways to reduce costs for applications with repetitive prompts.
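The three tiers combine into a single bill. A minimal sketch of the arithmetic — the prices used in the example call are illustrative placeholders, not any specific provider's rates:

```python
def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int,
                 in_price: float, out_price: float, cache_price: float) -> float:
    """Total request cost in dollars; prices are per million tokens.
    cached_tokens is the portion of the input served from the prompt cache."""
    uncached = input_tokens - cached_tokens
    return (uncached * in_price
            + cached_tokens * cache_price
            + output_tokens * out_price) / 1_000_000

# 5,000 input / 500 output tokens at hypothetical $3 / $15 / $0.30 rates
print(request_cost(5_000, 500, 0, 3.00, 15.00, 0.30))      # no caching
print(request_cost(5_000, 500, 2_000, 3.00, 15.00, 0.30))  # 2,000 tokens cached
```

Note how the cache-read rate only ever applies to the repeated prefix; output tokens are never cached.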
## Real-World Pricing Comparison (May 2026)
Here's what the major providers actually charge, per million tokens:
| Model | Input | Output | Cache Read |
|---|---|---|---|
| GPT-5.5 | $5.00 | $30.00 | $0.50 |
| GPT-5.4 | $2.50 | $15.00 | $0.25 |
| Claude Opus 4.7 | $5.00 | $25.00 | $0.50 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 |
| Gemini 3.1 Pro | $2.00 | $12.00 | $0.50 |
| DeepSeek V4 Pro* | $0.435 | $0.87 | $0.0036 |
| DeepSeek V4 Flash | $0.14 | $0.28 | $0.0028 |
* DeepSeek V4 Pro: 75% off until May 31, 2026. Original price: $1.74 input, $3.48 output.
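The table above is easy to encode as a lookup for quick comparisons. A sketch, with the model keys being ad-hoc identifiers chosen here (not official API model names):

```python
# Prices per million tokens, from the May 2026 table above
PRICING = {
    "gpt-5.5":           {"in": 5.00,  "out": 30.00, "cache": 0.50},
    "gpt-5.4":           {"in": 2.50,  "out": 15.00, "cache": 0.25},
    "claude-opus-4.7":   {"in": 5.00,  "out": 25.00, "cache": 0.50},
    "claude-sonnet-4.6": {"in": 3.00,  "out": 15.00, "cache": 0.30},
    "gemini-3.1-pro":    {"in": 2.00,  "out": 12.00, "cache": 0.50},
    "deepseek-v4-pro":   {"in": 0.435, "out": 0.87,  "cache": 0.0036},
    "deepseek-v4-flash": {"in": 0.14,  "out": 0.28,  "cache": 0.0028},
}

def cost(model: str, input_m: float, output_m: float) -> float:
    """Dollar cost for input_m / output_m million tokens, no caching."""
    p = PRICING[model]
    return input_m * p["in"] + output_m * p["out"]

print(cost("gpt-5.4", 50, 5))  # 50 * 2.50 + 5 * 15.00 = 200.0
```

Swapping the model string is all it takes to compare the same workload across providers.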
## How to Estimate Your Costs
Let's walk through a real example. Suppose you're building a chatbot that:
- Sends a 2,000-token system prompt with each request
- Maintains 3,000 tokens of conversation history
- Generates about 500 tokens of response per message
- Handles 10,000 messages per day
Daily token usage:
- Input: (2,000 + 3,000) × 10,000 = 50M tokens
- Output: 500 × 10,000 = 5M tokens
Using GPT-5.4 ($2.50 input / $15.00 output):
- Input cost: 50 × $2.50 = $125.00
- Output cost: 5 × $15.00 = $75.00
- Daily total: $200.00
With prompt caching (system prompt cached):
- Input cost: (20 × $0.25) + (30 × $2.50) = $5.00 + $75.00 = $80.00
- Output cost: 5 × $15.00 = $75.00
- Daily total: $155.00 (22% savings)
Switching to DeepSeek V4 Flash would bring the daily cost down to just $8.40 — a 96% reduction. The trade-off is model capability, but for many applications, it's more than enough.
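The whole walkthrough fits in a few lines of arithmetic. A sketch reproducing the numbers above:

```python
# GPT-5.4 rates, per million tokens
IN_PRICE, OUT_PRICE, CACHE_PRICE = 2.50, 15.00, 0.25

requests = 10_000
system_prompt, history, response = 2_000, 3_000, 500

input_m = (system_prompt + history) * requests / 1e6  # 50.0 (million tokens)
output_m = response * requests / 1e6                  # 5.0

no_cache = input_m * IN_PRICE + output_m * OUT_PRICE  # $200.00/day

cached_m = system_prompt * requests / 1e6             # 20.0 cached
with_cache = (cached_m * CACHE_PRICE
              + (input_m - cached_m) * IN_PRICE
              + output_m * OUT_PRICE)                  # $155.00/day

# DeepSeek V4 Flash at $0.14 / $0.28, no caching
flash = input_m * 0.14 + output_m * 0.28               # ~$8.40/day
```

Only the system prompt is cacheable here — the conversation history changes with every message, so it bills at the full input rate.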
1M input tokens ≈ 750,000 English words ≈ two full-length novels
1M output tokens ≈ the content of a 300-page book
At GPT-5.4 pricing, that's $2.50 to read two novels, and $15.00 to generate one.
## 5 Ways to Reduce Your Token Costs
- Use prompt caching. If your application sends the same system prompt repeatedly, caching saves up to 90% on those tokens. Most providers support this automatically.
- Shorten your prompts. Every token in your system prompt is repeated with every request. Remove unnecessary instructions, compress verbose examples, and use concise language.
- Choose the right model. Not every task needs GPT-5.5. Use cheaper models like DeepSeek V4 Flash or GPT-5.4 Nano for simple tasks, and reserve premium models for complex reasoning.
- Limit output length. Set a reasonable `max_tokens` parameter. If you only need a short answer, don't let the model generate a novel.
- Count before you send. Use a token counter to measure your input size before making API calls. Unexpected token counts are the #1 cause of budget overruns.
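The last tip can be enforced as a pre-flight check. A minimal sketch using the ~4-characters-per-token heuristic — `check_budget` and its default limit are hypothetical names chosen here, and a production version should use the provider's real tokenizer:

```python
def check_budget(prompt: str, max_input_tokens: int = 4_000) -> int:
    """Estimate input tokens (~4 chars/token) and fail fast if over budget."""
    estimate = max(1, round(len(prompt) / 4))
    if estimate > max_input_tokens:
        raise ValueError(
            f"Prompt is ~{estimate} tokens, over the {max_input_tokens}-token budget"
        )
    return estimate

check_budget("Summarize this paragraph in one sentence.")  # fine, well under budget
```

Failing before the API call costs nothing; failing after it costs you the whole request.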
This guide is updated regularly to reflect current pricing. Last updated April 2026. Prices are sourced directly from each provider's official documentation. For the most accurate and up-to-date rates, always check the provider's pricing page.