Groq bills per token, with separate rates for input (your prompt) and output (the model's reply), each priced per million tokens. Across the 8 models tracked here, output prices range from $0.08 per 1M tokens for Llama 3.1 8B Instant to $3 for Kimi K2. Input is cheaper than output, and cached input is cheaper still. The table below lists each model's rates, verified June 14, 2026.

| Model | Input (USD / 1M tokens) | Output (USD / 1M tokens) |
|---|---|---|
| GPT OSS 20B | $0.075 | $0.30 |
| GPT OSS Safeguard 20B | $0.075 | $0.30 |
| GPT OSS 120B | $0.15 | $0.60 |
| Llama 4 Scout (17Bx16E) | $0.11 | $0.34 |
| Qwen3 32B | $0.29 | $0.59 |
| Llama 3.3 70B Versatile | $0.59 | $0.79 |
| Llama 3.1 8B Instant | $0.05 | $0.08 |
| Kimi K2 | — | $3 |

Groq runs inference on its LPU hardware and charges a flat per-token rate for each model. There is a generous free tier, subject to rate limits. Prices are quoted per million tokens for LLMs, per million characters for TTS, and per hour of audio transcribed for ASR. The Batch API costs 50% less, and prompt caching is available on select models at no extra fee.
One model rate (marked with a dash in the table) is being re-verified and is withheld until confirmed.
Source: groq.com · Catalog 2026-06-14.2. Confirm the live rate before you commit.
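To make the per-million arithmetic concrete, here is a minimal sketch that turns the listed rates into a cost per request. The `RATES` dictionary, the `request_cost` helper, and the token counts are illustrative names and values, not part of any Groq SDK. The sketch also assumes the 50% batch discount applies equally to input and output tokens, which the notes above do not spell out.

```python
# USD per 1M tokens as (input, output), copied from the table above
# (June 14, 2026 snapshot). Kimi K2 is omitted: its input rate is withheld.
RATES = {
    "GPT OSS 20B": (0.075, 0.30),
    "GPT OSS Safeguard 20B": (0.075, 0.30),
    "GPT OSS 120B": (0.15, 0.60),
    "Llama 4 Scout (17Bx16E)": (0.11, 0.34),
    "Qwen3 32B": (0.29, 0.59),
    "Llama 3.3 70B Versatile": (0.59, 0.79),
    "Llama 3.1 8B Instant": (0.05, 0.08),
}

# Assumption: the 50% batch discount applies to input and output alike.
BATCH_DISCOUNT = 0.5


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 batch: bool = False) -> float:
    """Return the USD cost of one request at the listed rates."""
    input_rate, output_rate = RATES[model]
    cost = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return cost * (1 - BATCH_DISCOUNT) if batch else cost


if __name__ == "__main__":
    # Hypothetical request: 2,000 prompt tokens in, 500 reply tokens out.
    print(f"${request_cost('Llama 3.3 70B Versatile', 2_000, 500):.6f}")
```

At these rates a single request costs a fraction of a cent, which is why the volume of calls matters more to the bill than the headline rate.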
The rates above are per unit. Your bill is those rates times how hard your code leans on Groq, plus everything around it. PrePrice scans your project, finds where you call Groq, and computes your real cost per user and what to charge.
By output-token price, Llama 3.1 8B Instant is currently the cheapest Groq model at $0.08 per 1M output tokens and $0.05 per 1M input. The cheapest model per token is not always the cheapest per finished task, since a weaker model can need more retries or longer prompts.
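The per-token versus per-task distinction can be sketched with a little arithmetic. If each attempt succeeds independently with probability p and you retry until one succeeds, the expected number of attempts is 1/p, so the expected cost per finished task is the cost per attempt divided by p. The success rates and prompt sizes below are hypothetical numbers chosen only to illustrate the effect; they are not measurements of either model.

```python
def cost_per_finished_task(input_rate: float, output_rate: float,
                           input_tokens: int, output_tokens: int,
                           success_rate: float) -> float:
    """Expected USD cost of one successful task.

    Rates are USD per 1M tokens. Assumes attempts are independent and are
    retried until one succeeds, so expected attempts = 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    per_attempt = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return per_attempt / success_rate


# Hypothetical scenario: the smaller model needs a longer prompt and
# succeeds on only half of its attempts.
small = cost_per_finished_task(0.05, 0.08, input_tokens=3_000,
                               output_tokens=500, success_rate=0.50)
# Hypothetical scenario: the larger model gets by with a shorter prompt
# and succeeds on 95% of its attempts.
large = cost_per_finished_task(0.075, 0.30, input_tokens=1_500,
                               output_tokens=500, success_rate=0.95)

if __name__ == "__main__":
    print(f"Llama 3.1 8B Instant: ${small:.6f} per finished task")
    print(f"GPT OSS 20B:          ${large:.6f} per finished task")
```

Under these made-up assumptions the model with the higher listed rate ends up cheaper per finished task. With different success rates or prompt lengths the ranking can flip back, so the comparison is only meaningful with figures measured on your own workload.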
Groq is pay-as-you-go and billed per token: you pay for input tokens (your prompt and context) and output tokens (the response) separately, priced per million tokens. There is no monthly base fee for the API itself.
The per-unit rate is only one of the four numbers that set your bill. The others are how much your app uses Groq, how often, and across how many users. Long prompts, retries, multi-step agents, and uncached repeated context all multiply the rate. PrePrice models that usage from your code so the number is real, not a guess.
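As a rough sketch of how those four numbers combine, the monthly bill can be modeled as the cost per call, times calls per user, times users, scaled by a factor for retries and repeated context. The function name, the `overhead_factor` parameter, and the usage figures are illustrative assumptions, not PrePrice's actual model.

```python
def monthly_bill(input_rate: float, output_rate: float,
                 input_tokens: int, output_tokens: int,
                 calls_per_user: int, users: int,
                 overhead_factor: float = 1.0) -> float:
    """Estimated monthly USD cost.

    Rates are USD per 1M tokens; token counts are per call.
    overhead_factor is a hypothetical multiplier for retries, agent steps,
    and uncached repeated context (1.0 means none of these occur).
    """
    per_call = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return per_call * calls_per_user * users * overhead_factor


if __name__ == "__main__":
    # Hypothetical app on Llama 3.3 70B Versatile ($0.59 in, $0.79 out):
    # 2,000 prompt tokens and 500 reply tokens per call,
    # 100 calls per user per month, 1,000 users, 20% overhead from retries.
    total = monthly_bill(0.59, 0.79, 2_000, 500,
                         calls_per_user=100, users=1_000, overhead_factor=1.2)
    print(f"Monthly bill: ${total:,.2f}")
    print(f"Cost per user: ${total / 1_000:,.4f}")
```

Dividing the total by the number of users gives a cost per user, which is the figure to compare against what you plan to charge.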
From Groq's official pricing page (groq.com), verified June 14, 2026 and re-checked on a schedule. Pricing changes often, so confirm the live rate before you commit.
See the full AI Cost Index or estimate your monthly bill.