Choosing an AI provider used to be simple: OpenAI was the default. In 2026, there are four major providers, twenty-plus models, and pricing structures that vary wildly. A prompt that costs $30 on GPT-5.5 might cost $0.28 on DeepSeek V4 Flash. This guide puts every model side by side so you can make an informed decision.
All prices below are per million tokens (MTok) in USD, sourced from official provider documentation as of May 2026.
OpenAI — 6 Models
OpenAI offers the widest model range, from the ultra-cheap Nano to the premium GPT-5.5 Pro. Their Batch API provides 50% off all prices for non-real-time workloads.
| Model | Input | Cached | Output | Context |
|---|---|---|---|---|
| GPT-5.5 | $5.00 | $0.50 | $30.00 | 1M |
| GPT-5.5 Pro | $30.00 | — | $180.00 | 1M |
| GPT-5.4 | $2.50 | $0.25 | $15.00 | 1M |
| GPT-5.4 Mini | $0.75 | $0.075 | $4.50 | 1M |
| GPT-5.4 Nano | $0.20 | $0.02 | $1.25 | 128K |
| GPT-5.4 Pro | $21.00 | — | $168.00 | 1M |
Batch API: 50% off all prices for async workloads (24-hour turnaround). Combine with caching for maximum savings.
GPT-5.5 Pro / GPT-5.4 Pro: Extended thinking models designed for the most complex reasoning tasks. No cache pricing available — these are meant for one-shot difficult problems, not high-volume repeated calls.
GPT-5.4 Nano: The budget option. At $0.20 input, it's 25x cheaper than GPT-5.5. Best for classification, extraction, and simple tasks.
Anthropic (Claude) — 6 Models
Anthropic's lineup spans from Haiku (fast and cheap) to Opus (the most capable). They're the only provider that charges a premium for cache writes — you pay 25% more the first time, then 90% less on every subsequent read.
| Model | Input | Cache Write | Cache Read | Output | Context |
|---|---|---|---|---|---|
| Opus 4.7 | $5.00 | $6.25 | $0.50 | $25.00 | 1M |
| Opus 4.6 | $5.00 | $6.25 | $0.50 | $25.00 | 1M |
| Opus 4.5 | $5.00 | $6.25 | $0.50 | $25.00 | 1M |
| Sonnet 4.6 | $3.00 | $3.75 | $0.30 | $15.00 | 1M |
| Sonnet 4.5 | $3.00 | $3.75 | $0.30 | $15.00 | 1M |
| Haiku 4.5 | $1.00 | $1.25 | $0.10 | $5.00 | 200K |
Cache write premium: Anthropic charges 1.25x the normal input price on the first cache write. This means caching only saves money if you send the same prefix 3+ times. For a prompt sent twice, you break even.
Opus 4.5 vs 4.6 vs 4.7: All three are priced identically. Use the latest (4.7) unless you have a specific reason to use an older version. Opus 4.6 and 4.5 remain available for compatibility.
Sonnet 4.5 vs 4.6: Also identically priced. Sonnet 4.6 is the recommended choice for most applications — it's faster and more capable than 4.5.
Haiku 4.5: Anthropic's budget option at $1/$5. Best for high-volume tasks that need Claude's instruction-following quality without Opus-level reasoning.
Google (Gemini) — 6 Models
Google offers the most aggressive pricing on their older models. Gemini 2.5 Flash-Lite at $0.10 input is the cheapest model from any major provider. Their newer Gemini 3.x models are more capable but priced higher.
| Model | Input | Cached | Output | Context |
|---|---|---|---|---|
| Gemini 3.1 Pro | $2.00 | $0.50 | $12.00 | 1M |
| Gemini 3 Deep Think | $4.00 | $1.00 | $24.00 | 1M |
| Gemini 3 Flash | $0.50 | $0.05 | $3.00 | 1M |
| Gemini 2.5 Pro | $1.25 | $0.125 | $10.00 | 1M |
| Gemini 2.5 Flash | $0.30 | $0.03 | $2.50 | 1M |
| Gemini 2.5 Flash-Lite | $0.10 | $0.01 | $0.40 | 1M |
200K threshold: Gemini 3.1 Pro doubles its pricing for prompts over 200K tokens — from $2.00 to $4.00 input, $12.00 to $18.00 output. If your prompts regularly exceed 200K, factor this into your cost calculations.
Gemini 2.5 Pro: Also has a 200K threshold — $1.25 input becomes $2.50, $10.00 output becomes $15.00. Same pattern across Google's Pro-tier models.
Free tier: Google is the only provider that offers a genuine free tier for API access (rate-limited). Good for prototyping and low-volume personal projects.
Gemini 2.5 Flash-Lite: At $0.10/$0.40, this is the cheapest model from any major provider. It's limited compared to Flash, but for simple extraction and classification tasks, it's unbeatable on cost.
DeepSeek — 2 Models
DeepSeek's strategy is simple: be the cheapest. Their cache read pricing is 98-99% below normal input prices — the most aggressive in the industry. The V4 Pro model's 75% price cut was made permanent in May 2026.
| Model | Input | Cache Hit | Output | Context |
|---|---|---|---|---|
| V4 Flash | $0.14 | $0.0028 | $0.28 | 1M |
| V4 Pro | $0.435 | $0.0036 | $0.87 | 1M |
* V4 Pro: permanently reduced to $0.435 input, $0.87 output (75% below original launch price of $1.74/$3.48). Cache hit price reduced to 1/10 of launch price as of April 26, 2026.
Cache pricing: DeepSeek's cache hit price is 1/50th of the normal input price for V4 Flash, and 1/120th for V4 Pro. For agent applications with repeated context, cached tokens are essentially free.
Permanent pricing: V4 Pro's 75% discount was made permanent on May 22, 2026. The price is now locked at $0.435/$0.87 — no expiration date. This makes it one of the most cost-efficient frontier reasoning models available.
Thinking mode: Both models support thinking mode for complex reasoning. Thinking tokens are billed at output rates.
The Full Picture: All 20 Models Ranked by Input Price
Here's every model from all four providers, ranked cheapest to most expensive on input price:
| # | Model | Provider | Input | Output | Ratio |
|---|---|---|---|---|---|
| 1 | Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 4x | |
| 2 | DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 | 2x |
| 3 | GPT-5.4 Nano | OpenAI | $0.20 | $1.25 | 6.3x |
| 4 | Gemini 2.5 Flash | $0.30 | $2.50 | 8.3x | |
| 5 | DeepSeek V4 Pro | DeepSeek | $0.435 | $0.87 | 2x |
| 6 | Gemini 3 Flash | $0.50 | $3.00 | 6x | |
| 7 | GPT-5.4 Mini | OpenAI | $0.75 | $4.50 | 6x |
| 8 | Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 5x |
| 9 | Gemini 2.5 Pro | $1.25 | $10.00 | 8x | |
| 10 | Gemini 3.1 Pro | $2.00 | $12.00 | 6x | |
| 11 | GPT-5.4 | OpenAI | $2.50 | $15.00 | 6x |
| 12 | Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 5x |
| 13 | Claude Sonnet 4.5 | Anthropic | $3.00 | $15.00 | 5x |
| 14 | Gemini 3 Deep Think | $4.00 | $24.00 | 6x | |
| 15 | GPT-5.5 | OpenAI | $5.00 | $30.00 | 6x |
| 16 | Claude Opus 4.7 | Anthropic | $5.00 | $25.00 | 5x |
| 17 | GPT-5.4 Pro | OpenAI | $21.00 | $168.00 | 8x |
| 18 | GPT-5.5 Pro | OpenAI | $30.00 | $180.00 | 6x |
The Ratio column shows output-to-input price multiplier. DeepSeek has the lowest ratio (2x) — their output tokens are proportionally much cheaper than competitors. This matters for applications that generate long responses.
Head-to-Head: Same Tier, Different Providers
Comparing models at similar capability levels reveals the price gaps:
Frontier Models (Best Capability)
| Model | Input | Output | Monthly Cost* |
|---|---|---|---|
| Gemini 3.1 Pro | $2.00 | $12.00 | $3,900 |
| Claude Opus 4.7 | $5.00 | $25.00 | $6,375 |
| GPT-5.5 | $5.00 | $30.00 | $7,500 |
* Based on 10K requests/day, 5K input tokens, 500 output tokens per request.
Mid-Tier Models (Best Balance)
| Model | Input | Output | Monthly Cost* |
|---|---|---|---|
| Gemini 2.5 Pro | $1.25 | $10.00 | $3,375 |
| GPT-5.4 | $2.50 | $15.00 | $6,000 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $6,375 |
Budget Models (Cheapest)
| Model | Input | Output | Monthly Cost* |
|---|---|---|---|
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | $30 |
| DeepSeek V4 Flash | $0.14 | $0.28 | $71 |
| GPT-5.4 Nano | $0.20 | $1.25 | $75 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $300 |
Cached Token Pricing: Who Saves the Most?
If your application sends the same system prompt or context repeatedly, cache pricing is more important than normal input pricing. Here's how providers compare on cache read rates:
| Provider | Best Cache Rate | Model | Savings vs Normal |
|---|---|---|---|
| DeepSeek | $0.0028 | V4 Flash | 98% |
| DeepSeek | $0.0036 | V4 Pro | 99% |
| $0.01 | 2.5 Flash-Lite | 90% | |
| OpenAI | $0.02 | GPT-5.4 Nano | 90% |
| Anthropic | $0.10 | Haiku 4.5 | 90% |
All providers offer roughly 90% savings on cached tokens — except DeepSeek, which offers 98-99%. For high-volume agent applications, DeepSeek's cache pricing makes repeated tokens essentially free.
Provider Strengths Summary
Choose OpenAI when:
- You need the widest model selection (6 tiers from Nano to Pro)
- Your workload can use the Batch API for 50% savings
- You're already in the OpenAI ecosystem
- You need extended reasoning (GPT-5.5 Pro / GPT-5.4 Pro)
Choose Anthropic when:
- Accuracy and low hallucination are critical
- You're building code generation or developer tools
- You need strong instruction following and agentic capabilities
- You want consistent pricing across model generations
Choose Google when:
- Cost is the primary concern (cheapest models available)
- You need a free tier for prototyping
- You need strong multilingual support
- You want 1M context on budget models (Flash-Lite and Flash)
Choose DeepSeek when:
- You need the absolute lowest cost per token
- Your application has high cache hit rates (agent workflows)
- You're building high-volume, cost-sensitive applications
- You want the lowest output-to-input price ratio (2x vs 5-8x)
Calculate your exact costs for any model
Open Cost CalculatorWhen to Use a Budget Model (and When Not To)
Use a budget model when:
- The task is well-defined and constrained (extract, classify, summarize)
- You need high throughput (10K+ requests/day)
- Output quality has a clear "good enough" threshold
- You're prototyping and want to minimize burn rate
- You're building a pipeline where a cheap model handles 90% of cases and a premium model handles edge cases
Stick with a frontier model when:
- The task requires complex, multi-step reasoning
- Accuracy is critical and errors are costly (legal, medical, financial)
- You need the model to write production-quality code
- The task involves nuanced judgment or creative work
- You're using the model as the core product, not a utility
The smartest architecture uses cheap models for the heavy lifting and expensive models for the hard problems. Route 90% of your traffic to a $0.10/M model and reserve the $5.00/M model for the 10% that actually needs it.
The Bottom Line
AI API pricing has collapsed. A task that cost $100 per day two years ago now costs $1. The models on this page cover the full spectrum — from $0.10/M Flash-Lite to $30/M frontier models — and each has a clear use case. Gemini 2.5 Flash-Lite at $0.10/M is the cheapest option from a major provider. DeepSeek V4 Flash at $0.14/$0.28 offers the best all-around value for budget workloads. And DeepSeek V4 Pro at $0.435/$0.87 delivers frontier capability for pocket change.
The key is matching the model to the task. Don't pay $5/M for GPT-5.5 to classify emails. Don't use Flash-Lite to write complex code. Pick the right tool, use caching aggressively, and your AI API bill drops from a line item to a rounding error. If you want to automate the model selection process, see our guide on building a model router.
Stay Current
AI API pricing changes frequently. Providers drop prices, launch promotions, and release new models every few months. Bookmark this page — we update it regularly as rates change. For the most current pricing, always check the provider's official pricing page:
- OpenAI: openai.com/api/pricing
- Anthropic: anthropic.com/pricing
- Google: ai.google.dev/pricing
- DeepSeek: api-docs.deepseek.com/quick_start/pricing
Pricing sourced from official provider documentation (May 11, 2026). DeepSeek V4 Pro pricing reflects the permanent 75% price reduction confirmed May 22, 2026. Google's Gemini 2.5 Pro and 3.1 Pro have higher pricing for prompts exceeding 200K tokens. All figures reflect standard API pricing without volume discounts.