The AI Cost Index

What AI actually costs, per million tokens.

AI model APIs bill per token, with separate input and output rates quoted per 1 million tokens. As of June 14, 2026, PrePrice tracks 76 models across 10 providers, with output token prices from $0.08 to $75 per 1M. Output tokens almost always cost more than input, which is why your real bill depends on how your app spends tokens, not on the sticker rate. Every price below links to the provider's official pricing page.

Last verified June 14, 2026 · Catalog 2026-06-14.2

LLM API pricing, every model we track

Prices in USD per 1M tokens.
AI model API pricing per 1 million tokens, input and output, sorted by output price.
| Model | Provider | Input | Output |
| --- | --- | --- | --- |
| Llama 3.1 8B Instant | Groq | $0.05 | $0.08 |
| LFM2 24B A2B | Together AI | $0.03 | $0.12 |
| Gemma 3n E4B Instruct | Together AI | $0.06 | $0.12 |
| Qwen3.5 9B | Together AI | $0.1 | $0.15 |
| gpt-oss-20B | Together AI | $0.05 | $0.2 |
| deepseek-v4-flash | DeepSeek | $0.14 | $0.28 |
| GPT OSS 20B | Groq | $0.075 | $0.3 |
| GPT OSS Safeguard 20B | Groq | $0.075 | $0.3 |
| Mistral Small 4 | Mistral | $0.1 | $0.3 |
| Llama 4 Scout (17Bx16E) | Groq | $0.11 | $0.34 |
| Gemma 4 31B | Together AI | $0.2 | $0.5 |
| Qwen3 32B | Groq | $0.29 | $0.59 |
| Command R+ 04-2024 | Cohere | $0.15 | $0.6 |
| Command-light | Cohere | $0.3 | $0.6 |
| GPT OSS 120B | Groq | $0.15 | $0.6 |
| Qwen3 235B A22B FP8 Throughput | Together AI | $0.2 | $0.6 |
| gpt-oss-120B | Together AI | $0.15 | $0.6 |
| Llama 3.3 70B Versatile | Groq | $0.59 | $0.79 |
| deepseek-v4-pro | DeepSeek | $0.435 | $0.87 |
| DeepSeek-R1-0528 | Together AI | $0.18 | $0.88 |
| Codestral | Mistral | $0.3 | $0.9 |
| MiniMax M2.7 | Together AI | $0.3 | $1.2 |
| Qwen3-Coder-Next | Together AI | $0.5 | $1.2 |
| MiniMax M2.5 | Together AI | $0.3 | $1.2 |
| Command R 03-2024 | Cohere | $0.5 | $1.5 |
| Aya Expanse | Cohere | $0.5 | $1.5 |
| Gemini 3.1 Flash-Lite | Google Gemini | - | $1.5 |
| Mistral Large 3 | Mistral | $0.5 | $1.5 |
| DeepSeek-V3.1 | Together AI | $0.6 | $1.7 |
| Command | Cohere | $1 | $2 |
| Gemini 2.5 Flash | Google Gemini | - | $2.5 |
| Kimi K2.5 | Together AI | $0.5 | $2.8 |
| Gemini 3 Flash Preview | Google Gemini | - | $3 |
| Kimi K2 | Groq | - | $3 |
| Qwen3.6-Plus | Together AI | $0.5 | $3 |
| GLM-5 | Together AI | $1 | $3.2 |
| Llama 3.3 70B | Together AI | $3.5 | $3.5 |
| Qwen3.5-397B-A17B | Together AI | $0.6 | $3.6 |
| Claude Haiku 3.5 | Anthropic | $0.8 | $4 |
| DeepSeek V4 Pro | Together AI | $0.6 | $4.4 |
| GLM-5.1 | Together AI | $1.4 | $4.4 |
| GPT-5.4 mini | OpenAI | $0.75 | $4.5 |
| Kimi K2.6 | Together AI | $1.2 | $4.5 |
| Claude Haiku 4.5 | Anthropic | $1 | $5 |
| Magistral Medium | Mistral | $2 | $5 |
| Claude Haiku 3 | Anthropic | $0.5 | $6.25 |
| Mistral Medium 3.5 | Mistral | $1.5 | $7.5 |
| Gemini 3.5 Flash | Google Gemini | $1.5 | $9 |
| Command R+ 08-2024 | Cohere | $2.5 | $10 |
| Gemini 2.5 Pro (<=200k) | Google Gemini | $1.25 | $10 |
| Gemini 3.1 Pro Preview (<=200k, incl. thinking) | Google Gemini | - | $12 |
| Claude Sonnet 4.5 | Anthropic | $3 | $15 |
| Claude Sonnet 4.6 | Anthropic | $3 | $15 |
| Claude Sonnet 4 | Anthropic | $3 | $15 |
| Gemini 2.5 Pro (>200k) | Google Gemini | $2.5 | $15 |
| GPT-5.4 | OpenAI | $2.5 | $15 |
| Gemini 3.1 Pro Preview (>200k, incl. thinking) | Google Gemini | - | $18 |
| GPT-Realtime-2 text | OpenAI | $4 | $24 |
| Claude Opus 4.8 | Anthropic | $5 | $25 |
| Claude Opus 4.5 | Anthropic | $5 | $25 |
| Claude Opus 4.6 | Anthropic | $5 | $25 |
| Claude Opus 4.7 | Anthropic | $5 | $25 |
| GPT-5.5 | OpenAI | $5 | $30 |
| GPT-Image-2 image | OpenAI | $8 | $30 |
| Claude Mythos 5 | Anthropic | $10 | $50 |
| Claude Fable 5 | Anthropic | $10 | $50 |
| GPT-Realtime-2 audio | OpenAI | $32 | $64 |
| Claude Opus 4.1 | Anthropic | $15 | $75 |
| Claude Opus 4 | Anthropic | $15 | $75 |
| Gemini 3.1 Flash-Lite (text/image/video) | Google Gemini | $0.25 | - |
| Gemini 2.5 Flash (text/image/video) | Google Gemini | $0.3 | - |
| Gemini 3 Flash Preview (text/image/video) | Google Gemini | $0.5 | - |
| Gemini 3.1 Pro Preview (<=200k) | Google Gemini | $2 | - |
| Gemini 3.1 Pro Preview (>200k) | Google Gemini | $4 | - |
| GPT-Realtime-2 image | OpenAI | $5 | - |
| GPT-Image-2 text | OpenAI | $5 | - |

Cached-input reads are typically ~10% of the input rate and are the biggest lever on a high-volume bill. Rates that could not be verified are withheld rather than shown. Always confirm the live price on the provider's page before you commit.

Sticker price is not your price.

These are list rates per token. What your app actually costs depends on how many tokens each user burns, what you cache, and which models you call where. PrePrice scans your code and tells you your real cost per user and what to charge.

The four numbers that decide your AI bill

The per-token price in the table is one input of four. Your monthly AI cost is, roughly, tokens per request × requests per user × active users × price per token. The sticker rate is the smallest and most quoted of the four, and the one you control least. The other three are decisions in your code: how long your prompts are, how often you call the model, and whether you cache.
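The four-factor formula above can be sketched in a few lines of Python. This is a rough illustration, not PrePrice's method: the function name `monthly_cost_usd` and every number in the example are hypothetical, and a single blended price stands in for separate input and output rates.

```python
def monthly_cost_usd(tokens_per_request, requests_per_user, active_users,
                     price_per_million_usd):
    """Rough monthly bill: tokens x requests x users x price per token."""
    total_tokens = tokens_per_request * requests_per_user * active_users
    return total_tokens * price_per_million_usd / 1_000_000


# Hypothetical app: all four numbers are made up for illustration.
baseline = monthly_cost_usd(
    tokens_per_request=2_000,
    requests_per_user=50,        # per user, per month
    active_users=1_000,
    price_per_million_usd=3.0,   # blended input/output rate, a simplification
)

# Same app after trimming prompts to half the length; the price is unchanged.
trimmed = monthly_cost_usd(1_000, 50, 1_000, 3.0)

print(f"baseline: ${baseline:,.2f}")  # 100M tokens at $3 per 1M -> $300.00
print(f"trimmed:  ${trimmed:,.2f}")   # 50M tokens at $3 per 1M -> $150.00
```

Halving prompt length halves the bill without touching the sticker rate, which is the sense in which the other three factors are decisions in your code.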

Input vs output, and why output is the one to watch

Across the providers above, output tokens usually cost more than input, often several times more, and reasoning models bill their hidden thinking tokens as output. An app that returns long answers, writes code, or runs a multi-step agent spends most of its budget on output. An app that classifies or extracts spends most on input. Two apps on the identical model can have wildly different bills.
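To make the input/output split concrete, here is a small sketch comparing two workloads on one model. It assumes $3 input and $15 output per 1M tokens, the rates the table lists for Claude Sonnet 4.6; the token counts and the helper names `request_cost_usd` and `output_share` are hypothetical.

```python
def request_cost_usd(input_tokens, output_tokens, input_price, output_price):
    """Cost of one request; prices are USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def output_share(input_tokens, output_tokens, input_price, output_price):
    """Fraction of the request cost that comes from output tokens."""
    output_cost = output_tokens * output_price
    return output_cost / (input_tokens * input_price + output_cost)


INPUT_PRICE, OUTPUT_PRICE = 3.0, 15.0  # USD per 1M tokens

# (input tokens, output tokens) per request; made-up numbers for illustration.
workloads = {
    "classifier": (2_000, 20),     # long document in, short label out
    "code writer": (500, 3_000),   # short instruction in, long answer out
}

for name, (tokens_in, tokens_out) in workloads.items():
    cost = request_cost_usd(tokens_in, tokens_out, INPUT_PRICE, OUTPUT_PRICE)
    share = output_share(tokens_in, tokens_out, INPUT_PRICE, OUTPUT_PRICE)
    print(f"{name}: ${cost:.4f} per request, {share:.0%} from output")
```

With these numbers the classifier spends about 5% of each request on output and the code writer about 97%, at a per-request cost roughly seven times higher, on the same model and the same price list.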

Caching is the cheapest 90% you are probably not taking

Cached input reads are typically about 10% of the standard input rate. If your system prompt, tools, or retrieved context repeat across requests, prompt caching can cut the cost of that repeated input by roughly 90%. Most teams leave it off because the savings are invisible until someone models the per-request token flow, which is exactly what a scan does.
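A minimal sketch of that per-request token flow, under stated assumptions: cached reads cost 10% of the input rate (the "about 10%" figure above, which varies by provider), cache-write surcharges are ignored, and the prompt splits cleanly into a repeated prefix and a fresh part. The function name and all token counts are hypothetical.

```python
CACHED_READ_FRACTION = 0.10  # assumption taken from the "about 10%" figure


def input_cost_usd(prefix_tokens, fresh_tokens, input_price, cache_hit_rate=0.0):
    """Average input-side cost of one request; input_price is USD per 1M tokens.

    prefix_tokens: repeated content (system prompt, tools, shared context)
    fresh_tokens:  content that differs on every request
    cache_hit_rate: share of requests whose prefix is served from cache
    """
    cached = prefix_tokens * cache_hit_rate
    uncached = prefix_tokens * (1 - cache_hit_rate) + fresh_tokens
    total = cached * input_price * CACHED_READ_FRACTION + uncached * input_price
    return total / 1_000_000


# Hypothetical request: 4,000-token repeated prefix, 500 fresh tokens, $3 per 1M.
for hit_rate in (0.0, 0.9, 1.0):
    cost = input_cost_usd(4_000, 500, 3.0, cache_hit_rate=hit_rate)
    print(f"hit rate {hit_rate:.0%}: ${cost:.5f} input cost per request")
```

In this example a perfect hit rate takes input cost from $0.0135 to $0.0027 per request, an 80% saving rather than 90%, because the 500 fresh tokens are still billed at the full rate. The larger the repeated share of the prompt, the closer the saving gets to the cached-read discount.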

The model price is not your stack price

A real AI app also pays for hosting, a vector database, embeddings, auth, payments, and observability. PrePrice tracks 156+ services so the model is priced next to everything around it. The fastest way to see your whole bill, and what to charge so it clears margin, is to point a scan at your code.

AI cost questions, answered

How much does the OpenAI / Claude / Gemini API cost?

LLM APIs bill per token, split into input (the prompt) and output (the response), quoted per million tokens. Across the 10 providers tracked here, output tokens currently range from about $0.08 to $75 per 1M, and output almost always costs several times more than input. Pick a specific model in the table above for its exact input, output, and cached-input rates.

Why is my real AI bill higher than the per-token price suggests?

Because the sticker rate is per token, but your bill is tokens-per-request times requests-per-user times users, plus retries, system prompts, tool calls, and reasoning tokens you don't see. The same model can cost 5-10x more in a chatty agent than in a one-shot classifier. Per-token price tells you almost nothing about your monthly bill until you model your actual usage.
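As a rough illustration of how those hidden terms stack up, the sketch below compares a naive estimate with one that adds a system prompt, reasoning tokens, multiple model calls per user request, and retries. Every parameter name and number is an assumption chosen for the example, and the model is deliberately simple: it treats each call as resending the same prompt and applies retries as a flat multiplier.

```python
def billed_request_cost_usd(user_tokens, answer_tokens, input_price, output_price,
                            system_tokens=0, reasoning_tokens=0,
                            calls_per_request=1, retry_rate=0.0):
    """Estimated cost of one user request; prices are USD per 1M tokens."""
    # The system prompt is billed as input on every call.
    input_tokens = (system_tokens + user_tokens) * calls_per_request
    # Reasoning tokens are billed as output even though the user never sees them.
    output_tokens = (answer_tokens + reasoning_tokens) * calls_per_request
    base = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    # retry_rate is the average fraction of extra, repeated work.
    return base * (1 + retry_rate)


# Naive estimate: only the visible prompt and answer, at $3 in / $15 out.
naive = billed_request_cost_usd(300, 200, 3.0, 15.0)

# Same request with hypothetical overheads added.
realistic = billed_request_cost_usd(
    300, 200, 3.0, 15.0,
    system_tokens=1_000,
    reasoning_tokens=400,
    calls_per_request=2,   # e.g. one tool call plus a final answer
    retry_rate=0.10,
)

print(f"naive:     ${naive:.5f}")
print(f"realistic: ${realistic:.5f}")
print(f"ratio:     {realistic / naive:.1f}x")
```

With these made-up inputs the realistic figure comes out at roughly seven times the naive one, with no change in the per-token price.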

What is the cheapest LLM API?

By output-token price, Llama 3.1 8B Instant (Groq) is currently the cheapest in this index at $0.08 per 1M output tokens. Cheapest is not the same as best value: a weaker model that needs more retries or longer prompts can cost more in practice than a pricier model that gets it right in one pass.
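One way to reason about cheapest versus best value is cost per successful task rather than cost per token. The sketch below assumes attempts are independent and are retried until one succeeds, so the expected number of attempts is 1 / success rate. The two models, their token counts, and their success rates are hypothetical; only the general shape of the comparison matters.

```python
def cost_per_success_usd(input_tokens, output_tokens, input_price, output_price,
                         success_rate):
    """Expected spend per successful task; prices are USD per 1M tokens.

    Assumes independent attempts retried until success, so the expected
    number of attempts is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    attempt = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return attempt / success_rate


# Hypothetical cheaper model: needs a long few-shot prompt and often fails.
cheap = cost_per_success_usd(6_000, 500, 0.15, 0.60, success_rate=0.40)

# Hypothetical pricier model: short prompt, usually right the first time.
pricier = cost_per_success_usd(1_000, 500, 0.50, 1.50, success_rate=0.95)

print(f"cheaper model: ${cheap:.5f} per successful task")
print(f"pricier model: ${pricier:.5f} per successful task")
```

Under these assumptions the model with the lower sticker price costs more than twice as much per successful task. Real success rates have to be measured on your own workload.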

How often are these prices updated?

This index is generated from the PrePrice pricing catalog (version 2026-06-14.2), last verified June 14, 2026. Every model links to the provider's official pricing page so you can confirm the live rate before you commit.

Does this include hosting, vector databases, and the rest of the stack?

Yes. Beyond LLM APIs, PrePrice tracks 156+ services across hosting, vector databases, auth, payments, voice, search, and analytics. An AI app's bill is rarely just the model. See any service's cost page or run a scan to get the whole stack priced together.

Now price your own app against all of it.

Point PrePrice at your project. We detect every paid service it calls, compute your real cost per user, and tell you what to charge. Most scans finish in 2 to 4 minutes. Free to run.
