Every time you send a prompt to an AI model — whether it's GPT-5.5, Claude Opus 4.7, or Gemini 3.1 Pro — you're charged by the token. But what exactly is a token? It's not a word, not a character, and not a byte. Understanding tokens is the first step to controlling your AI API costs, and it's simpler than most developers think.

What Is a Token?

A token is the smallest unit of text that an AI model processes. Before your text reaches the model, it passes through a tokenizer — a program that splits your input into chunks. These chunks can be whole words, parts of words, punctuation marks, or even spaces.

For example, the sentence "Hello, how are you?" might be split into six tokens: "Hello", ",", " how", " are", " you", "?". Notice that the space before "how" is part of its token: tokenizers treat spaces as meaningful characters.

A token is roughly 4 characters in English, or about ¾ of a word. But this is just an approximation — the actual count depends on the specific text and the tokenizer used.
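The 4-characters rule is easy to turn into a quick estimator. A minimal sketch in Python (a heuristic only; real tokenizers will disagree by 10–20% or more):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text: ~4 characters per token.

    This is a heuristic, not a real tokenizer. Actual counts from a
    provider's tokenizer or API can differ noticeably.
    """
    return max(1, round(len(text) / 4))

sentence = "Hello, how are you?"   # 19 characters
print(estimate_tokens(sentence))   # prints 5, close to the real count of 6
```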

The key insight is that different AI providers use different tokenizers. The same text will produce different token counts across OpenAI, Anthropic, and Google. This means the same prompt can cost different amounts depending on which model you use.

How Different Providers Count Tokens

Each major AI provider has its own tokenization method. Here's how they compare:

Provider | Tokenizer | ~Tokens per Word | Tokens per Chinese Character
OpenAI (GPT-5.5, GPT-5.4) | tiktoken (o200k_base) | ~1.3 | 2–3
Anthropic (Claude Opus/Sonnet) | Proprietary | ~1.3 | 1.5–2.5
Google (Gemini) | SentencePiece | ~1.2 | 1–2
DeepSeek | tiktoken-compatible | ~1.3 | 2–3
Practical Tip

1,000 English words ≈ 1,300 tokens. 500 Chinese characters ≈ 1,000–1,500 tokens. Use these ratios for quick estimates before calculating exact costs.
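These ratios can be sketched as a quick estimator (a heuristic only; the regexes here are simplifications, and real tokenizers handle punctuation and mixed scripts differently):

```python
import re

def estimate_tokens_quick(text: str) -> int:
    """Quick estimate using the rough ratios above:
    ~1.3 tokens per English word, ~2 tokens per CJK character.
    A heuristic, not a tokenizer.
    """
    cjk_chars = len(re.findall(r"[\u4e00-\u9fff]", text))
    words = len(re.findall(r"[A-Za-z0-9']+", text))
    return round(words * 1.3 + cjk_chars * 2)

print(estimate_tokens_quick("Token counting made simple"))  # 4 words -> 5
print(estimate_tokens_quick("你好世界"))                      # 4 chars -> 8
```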

How AI API Pricing Works

AI providers charge per million tokens, but not all tokens are priced equally. There are typically three pricing tiers:

Input Tokens

These are the tokens you send to the model — your prompt, system instructions, conversation history, and any context you provide. Input tokens are almost always cheaper than output tokens because the model processes them in parallel.

Output Tokens

These are the tokens the model generates in response. Output tokens cost more because the model generates them one at a time, which requires more computation. Most providers charge 3–6x more for output than input.

Cache Read Tokens

If you send the same prefix (like a system prompt) repeatedly, providers can cache it. Cached tokens are dramatically cheaper — typically 90% less than regular input tokens. This is one of the most effective ways to reduce costs for applications with repetitive prompts.
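The savings are easy to quantify. A sketch using the GPT-5.4 rates quoted later in this guide ($2.50 per million input tokens, $0.25 per million cached):

```python
def input_cost(cached_tokens: int, fresh_tokens: int,
               input_price: float, cache_price: float) -> float:
    """Input cost in dollars; prices are given per million tokens."""
    return (cached_tokens * cache_price + fresh_tokens * input_price) / 1_000_000

# A 2,000-token system prompt sent 10,000 times a day (20M tokens total):
no_cache = input_cost(0, 20_000_000, 2.50, 0.25)    # $50.00
cached = input_cost(20_000_000, 0, 2.50, 0.25)      # $5.00 (90% less)
print(no_cache, cached)
```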

Real-World Pricing Comparison (May 2026)

Here's what the major providers actually charge, per million tokens:

Model | Input | Output | Cache Read
GPT-5.5 | $5.00 | $30.00 | $0.50
GPT-5.4 | $2.50 | $15.00 | $0.25
Claude Opus 4.7 | $5.00 | $25.00 | $0.50
Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30
Gemini 3.1 Pro | $2.00 | $12.00 | $0.50
DeepSeek V4 Pro* | $0.435 | $0.87 | $0.0036
DeepSeek V4 Flash | $0.14 | $0.28 | $0.0028

* DeepSeek V4 Pro: 75% off until May 31, 2026. Original price: $1.74 input, $3.48 output.
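For quick comparisons in code, the table above can be encoded as a lookup. A sketch (the model keys are made-up identifiers for illustration, not official API model names; prices are copied from the table and subject to change):

```python
# Prices in dollars per million tokens, from the comparison table above.
PRICES = {
    "gpt-5.4":           {"input": 2.50, "output": 15.00, "cache": 0.25},
    "claude-sonnet-4.6": {"input": 3.00, "output": 15.00, "cache": 0.30},
    "deepseek-v4-flash": {"input": 0.14, "output": 0.28,  "cache": 0.0028},
}

def request_cost(model: str, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Dollar cost of one request; cached_tokens is the portion of
    input_tokens served from the prompt cache."""
    p = PRICES[model]
    fresh = input_tokens - cached_tokens
    return (fresh * p["input"] + cached_tokens * p["cache"]
            + output_tokens * p["output"]) / 1_000_000

# One chatbot turn: 5,000 input tokens (2,000 cached), 500 output tokens.
print(round(request_cost("gpt-5.4", 5_000, 500, cached_tokens=2_000), 5))  # 0.0155
```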

How to Estimate Your Costs

Let's walk through a real example. Suppose you're building a chatbot that:

  • Sends a 2,000-token system prompt with each request
  • Maintains 3,000 tokens of conversation history
  • Generates about 500 tokens of response per message
  • Handles 10,000 messages per day

Daily token usage:

  • Input: (2,000 + 3,000) × 10,000 = 50M tokens
  • Output: 500 × 10,000 = 5M tokens

Using GPT-5.4 ($2.50 input / $15.00 output):

  • Input cost: 50M tokens × $2.50/M = $125.00
  • Output cost: 5M tokens × $15.00/M = $75.00
  • Daily total: $200.00

With prompt caching (system prompt cached):

  • Input cost: (20M cached × $0.25/M) + (30M fresh × $2.50/M) = $5.00 + $75.00 = $80.00
  • Output cost: 5M tokens × $15.00/M = $75.00
  • Daily total: $155.00 (22.5% savings)

Switching to DeepSeek V4 Flash would bring the daily cost down to just $8.40 — a 96% reduction. The trade-off is model capability, but for many applications, it's more than enough.
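The whole worked example can be reproduced in a few lines of Python, using the same assumptions (10,000 messages a day, 5,000 input tokens and 500 output tokens per message, 2,000 of the input tokens cacheable):

```python
MESSAGES_PER_DAY = 10_000
INPUT_TOKENS = 5_000    # 2,000 system prompt + 3,000 conversation history
CACHED_TOKENS = 2_000   # the system prompt, when caching is enabled
OUTPUT_TOKENS = 500

def daily_cost(in_price, out_price, cache_price=0.0, cached=0):
    """Daily cost in dollars; prices are per million tokens."""
    fresh = INPUT_TOKENS - cached
    per_msg = fresh * in_price + cached * cache_price + OUTPUT_TOKENS * out_price
    return per_msg * MESSAGES_PER_DAY / 1_000_000

print(round(daily_cost(2.50, 15.00), 2))                       # GPT-5.4, no cache: 200.0
print(round(daily_cost(2.50, 15.00, 0.25, CACHED_TOKENS), 2))  # with caching:      155.0
print(round(daily_cost(0.14, 0.28), 2))                        # DeepSeek V4 Flash:   8.4
```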

Quick Reference

1M input tokens ≈ 750,000 English words ≈ roughly eight typical novels

1M output tokens ≈ the same 750,000 words of generated text

At GPT-5.4 pricing, that's $2.50 to read eight novels' worth of text, and $15.00 to generate it.

5 Ways to Reduce Your Token Costs

  1. Use prompt caching. If your application sends the same system prompt repeatedly, caching saves up to 90% on those tokens. Some providers apply caching automatically; others require you to mark the cacheable prefix in the request.
  2. Shorten your prompts. Every token in your system prompt is repeated with every request. Remove unnecessary instructions, compress verbose examples, and use concise language.
  3. Choose the right model. Not every task needs GPT-5.5. Use cheaper models like DeepSeek V4 Flash or GPT-5.4 Nano for simple tasks, and reserve premium models for complex reasoning.
  4. Limit output length. Set a reasonable max_tokens parameter. If you only need a short answer, don't let the model generate a novel.
  5. Count before you send. Use a token counter to measure your input size before making API calls. Unexpected token counts are a common cause of budget overruns.
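Tip 5 can be enforced with a pre-flight check: estimate the prompt size and refuse to send anything over budget. A minimal sketch using the rough 4-characters-per-token rule (a real application would use the provider's tokenizer or token-counting endpoint for exact numbers):

```python
class TokenBudgetError(Exception):
    """Raised when a prompt's estimated size exceeds the allowed budget."""

def check_budget(prompt: str, max_input_tokens: int) -> int:
    """Estimate tokens (~4 characters each) and raise if over budget.

    Heuristic only; swap in a real tokenizer for exact counts before
    relying on this in production.
    """
    estimate = max(1, len(prompt) // 4)
    if estimate > max_input_tokens:
        raise TokenBudgetError(
            f"~{estimate} tokens exceeds budget of {max_input_tokens}")
    return estimate

check_budget("short prompt", 100)        # fine, returns the estimate
# check_budget("x" * 100_000, 1_000)     # would raise TokenBudgetError
```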


This guide is updated regularly to reflect current pricing. Last updated April 2026. Prices are sourced directly from each provider's official documentation. For the most accurate and up-to-date rates, always check the provider's pricing page.