Sample AI cost audit

Code wiped 5.0s after report·never persisted to our database·SHA-256 a3f291…6f8

VerdictPer user / month

✓

Healthy

$1.89/ user / month

Worst case: $9.45 / month (top 10% of users will use this much or more)

Margin at your price91%

How we picked this verdict

We compare your cost to your stated price. The cost figure includes every paid service in your stack: hosting, database, auth, monitoring, email, search, payments, and (if present) any AI, LLM, embedding, vector database, or voice spend. We total it at the audience size you have selected on the slider.

Green. Typical cost is 30 percent or less of your price (margin 70 percent or better). Pricing is healthy.
Yellow. Typical cost is 30 to 60 percent of your price (margin 40 to 70 percent). Workable, but worth tightening.
Red. Typical cost is over 60 percent of your price, or your worst-case heavy user costs more than they pay. You are losing money on power users.

The "worst case" figure is the P95: the top 10 percent of users (the heaviest ones) cost at least this much. Heavy users dominate margin on freemium and chat apps especially, so we always show both.

We look at the right unit for the app. Chat and freemium apps get judged per user per month. Agentic apps that run a clear job (research agent, code generator, video render) get judged per run, because that is how they get billed.

Cost projection

Drag to scale by audience size

Users20

AI / API spend$38

Infra (hosting + DB + …)$0

Total monthly bill$38

Total = AI bill + infra at current scale ($0 across 5 services). Worst-case AI: $189 / mo · Annual run-rate: $454.

Detected stack · what you're running on

5 services across 4 categories · $0 / mo · 1 verified

Drag the cost projection slider above to watch each service cross tier boundaries. Verified services use multi-dimensional pricing. Others fall back to Sonnet's best estimate.

Hosting· Compute · 1 service

Free

Vercel

View pricingFree

HobbyPrice wrong?

Show math

Assumed audience: 20 MAU (drag the slider above to override). Per-line quantities are scaled from this baseline via per-MAU industry defaults; override an individual service with explicit values to recalculate.

Plan: Hobby

Total$0.00 / mo

Auth· Connectors · 1 service

Free

Next Auth

View pricingFree

Open source (Auth.js) · est.

Price wrong?

Database· Data · 2 services

Free

PostgreSQL (Drizzle ORM)

View pricingFree

Free tier (Vercel Postgres 256MB or Neon 0.5GB) · est.

Price wrong?next: $19.00 past 10k

Redis (Upstash or Vercel KV)

View pricingFree

Upstash Free (10k commands/day) · est.

Price wrong?

Monitoring· Observability · 1 service

Free

OpenTelemetry + Vercel Observability

View pricingBundled

Bundled with Vercel Pro (basic observability) · est.

Price wrong?

About what you built

This isn't right

Multi-model AI chat with code execution, spreadsheet editing, and writing suggestions powered by Vercel AI SDK and Next.js.

What it doesAI Chat With Artifacts

Who it's forProsumer creators and developers

Where it runsWeb

Features we found · 7

What we found · 4 issues

All 4 fixes shipped:~$729.00 / month saved at 1k users

Uncached System Prompt In Chat Streaming

High confidence

app/(chat)/api/chat/route.ts:232~$540.00/mo saved at 1k users

What's wrong

System prompt is reconstructed on every chat call without prompt caching; assuming 2000+ token system prompt re-sent at full input cost every turn.

How to fix it

Add cache_control: {type: 'ephemeral'} to the system message if using Anthropic models (Claude Haiku 4.5 or Sonnet 4.5 support it).

Finding looks wrong?

Tool-Use Agent Loop Amplification

Medium confidence

app/(chat)/api/chat/route.ts:237~$162.00/mo saved at 1k users

What's wrong

Five tools available per invocation; no explicit call-count cap beyond stepCountIs(5). Each tool round-trip adds 1500-2000 tokens of context exchange; assuming 2 tool calls per chat turn on average.

How to fix it

Audit tool_choice parameter; consider tool_choice: {type: 'required', toolName: '...'} to prevent multi-tool cascades, or add max_tool_calls if your SDK exposes it.

Finding looks wrong?

Uncached Suggestions System Prompt

Medium confidence

lib/ai/tools/request-suggestions.ts:45~$27.00/mo saved at 1k users

What's wrong

streamText call with fixed 50-word system prompt lacks caching; repeated calls with same document content re-process identical instructions at full input cost.

How to fix it

Add cache_control to system prompt if using Anthropic; evaluate OpenAI's prompt caching if model supports it.

Finding looks wrong?

Unknown Model Pricing Defaulted

Low confidence

lib/ai/models.ts:26~$0/mo saved at 1k users

What's wrong

Multiple models in the model_registry (deepseek-v3.2, codestral, kimi-k2.5, gpt-oss, grok-4.1) lack pricing data in the pricing_data catalog; cost estimates default to Claude Haiku 4.5 rates, which may underestimate by 2-5x if user selects a more expensive model.

How to fix it

Add explicit pricing for each model in the registry or gate expensive models behind paid tiers; flag unknown models in the UI with a cost warning.

Finding looks wrong?

Things to watch · 9 flagged

High risk

Unknown Model Pricing

Models deepseek-v3.2, codestral, mistral-small, kimi-k2.5, gpt-oss, and grok-4.1 detected in lib/ai/models.ts:26 but not found in pricing_data.models. Cost estimates default to Claude Haiku 4.5 ($1/M input, $5/M output); actual cost may be 2-5× higher if user selects an o1-class or GPT-5-class reasoning model.

How to fix: Add explicit pricing for each model in pricing_data or gate high-cost models behind paid tiers. Surface a cost-per-call estimate in the model selector UI.

High risk

Output Cap Missing

No max_tokens or maxTokens parameter visible in app/(chat)/api/chat/route.ts:232 streamText call or hooks/use-active-chat.tsx:89 useChat config. Output can theoretically reach model's native limit (4096-8192 tokens for most models), causing unpredictable cost spikes on long-form responses.

How to fix: Add max_tokens: 1500 (or category-appropriate cap) to streamText options and useChat transport config. This prevents runaway output and caps worst-case cost per turn.

Medium risk

Retry Amplification

No retry backoff or rate-limit middleware detected in API routes (app/(chat)/api/chat/route.ts, app/(chat)/api/suggestions/route.ts). If upstream LLM returns 429 or 5xx, client-side retries via useChat's automatic retry mechanism could amplify cost by 2-3× during transient outages.

How to fix: Add exponential backoff middleware at the transport layer; use Vercel AI Gateway's built-in retry with jitter if available. Cap client-side retries to 1-2 attempts.

High risk

Free Tier Without Per-User Cap

User rate limiting present (app/(chat)/api/chat/route.ts:85 checks getMessageCountByUserId against entitlementsByUserType.maxMessagesPerHour) but no cost-per-user budget guard. If a guest user maxes out their 3 messages/hour quota with tool-heavy or long-output requests, cost can reach $0.10-0.30/user—eating 50-150% of free tier allowance if you offer unlimited guests.

How to fix: Add per-user monthly cost accumulator; pause service for guest users who exceed $0.05/mo COGS until they upgrade. Track actual cost-per-user in Redis or DB.

Medium risk

Reasoning Model Flag Ignored

app/(chat)/api/chat/route.ts:237 checks isReasoningModel to disable tools, but no cost multiplier applied. Reasoning models (o1, o3, Claude Opus thinking, GPT-5) incur 2-5× hidden token overhead for chain-of-thought that isn't visible in prompt or completion. Current cost estimate may be 50-400% low if user selects a reasoning model.

How to fix: Apply a 3× multiplier to output cost when isReasoningModel is true; flag reasoning models in the UI as 'High Cost' and require explicit opt-in.

Medium risk

Tool-Use Cost Uncapped

experimental_activeTools array in app/(chat)/api/chat/route.ts:237 includes 5 tools with stepCountIs(5) as the only limit. Each tool call adds 1500-2000 tokens of round-trip context; worst case is 5 steps × 2 tools/step × 1800 tokens = 18k input tokens = $0.018/turn using Claude Haiku 4.5 uncached. This is 3× the base chat cost.

How to fix: Add max_tool_calls: 3 to streamText options or use tool_choice: 'required' to force single-tool responses. Monitor tool call frequency in production and gate tool-heavy users behind paid tier.

Medium risk

Vercel Pro Tier Assumption Unverified

Hosting cost set to $20/mo (Vercel Pro) based on detecting multiple environments, Vercel Functions, and production-grade setup (detected via @vercel/functions, @vercel/blob, @vercel/otel in package.json). However, no crons block found in vercel.json and no explicit production domain or org config detected—user might still be on Hobby tier ($0/mo) if deploying personal projects.

How to fix: Verify current Vercel tier via usage dashboard. If MAU < 10k and no team members, stay on Hobby. If crons or advanced functions are required, Pro is mandatory.

Low risk

Stripe Fee Assumption

Payment processing fees computed at 2.9% + $0.30 (Stripe US Standard) per transaction. If user base is European or Latin American, actual fee may be 1.5%-3.6% + currency-local fixed fee. Cost breakdown assumes US-based users; margin may be 0.5-1.5% higher or lower depending on geography.

How to fix: Segment users by billing country in your DB; apply region-specific Stripe fees when computing margin. For EU users, fee is 1.5% + €0.25.

Low risk

Database Cost Missing From Breakdown

Drizzle ORM detected (package.json:43, lib/db/migrate.ts:10) but no database service detected in pre_computed_tech_stack. POSTGRES_URL and REDIS_URL env vars present but no corresponding Neon/Supabase/PlanetScale/Upstash entries. Database cost likely $0 if using Vercel Postgres free tier (256MB) or Supabase free (500MB), but this caps at 10k MAU.

How to fix: Audit POSTGRES_URL provider; if Vercel Postgres, note that free tier is 256MB and pauses after 7 days inactivity—upgrade to Neon Launch ($19/mo) or Supabase Pro ($25/mo) when hitting 10k MAU or requiring always-on availability.

What if things change

Cost-per-user delta if assumptions shift

10x Heavy Users

10% of users generate 100 actions/day (10× median); assume lognormal tail with P95 multiplier already capturing some of this. If top decile hits 100 actions/day, their COGS is $18.90/mo—eating 95% of $19.99 revenue. Blended cost at 1k users: +$1.69/user/mo.

+$1.69 / user / mo

Reasoning Model Upgrade

User switches from Claude Haiku 4.5 to o1-preview or Claude Opus thinking mode; reasoning token overhead 3× on output. Output cost goes from $0.006/turn to $0.018/turn; assuming 50% of actions use reasoning model, +$0.0036/action → +$1.08/mo/user at 10 actions/day.

+$1.08 / user / mo

Free Tier Abuse Doubles Actions

Guest users exploit rate limit (3 messages/hour = 72/day theoretical max) by rotating IPs or creating burner accounts; actual free tier usage doubles to 10 actions/day from assumed 5. Free tier COGS goes from $0.95/mo to $1.89/mo—still under $3 threshold, but conversion pressure increases.

+$0.95 / user / mo

Pricing Model Price Increase 25%

Anthropic raises Claude Haiku 4.5 input cost from $1/M to $1.25/M tokens (25% hike, mirroring GPT-4o mini → GPT-4.5 pricing shift). Per-action cost increases $0.00146 → +$1.31/mo at 10 actions/day × 30.

+$1.31 / user / mo

Output Token Drift 2x

System prompt changes or user behavior shift causes output length to double from 1200 → 2400 tokens/turn. Output cost per turn goes from $0.006 to $0.012; +$1.80/mo at 10 actions/day.

+$1.80 / user / mo

Tool-Use Frequency Doubles

Tool-use adoption increases from 20% of chats to 40% (users discover createDocument/editDocument features); tool round-trips add 0.0006/action → +$0.54/mo.

+$0.54 / user / mo

What this report can't tell you

—Static analysis cannot estimate the relative frequency of chat vs. tool-use actions—assumed 80/20 split for cost allocation.
—System prompt size from runtime config (systemPrompt({ requestHints, supportsTools })) is unknown; assumed 2000 tokens based on typical chat app prompts.
—Reasoning token cost is approximated at 30% output overhead; actual cost depends on model and may be 2-5× if using o1/o3-class models.
—Per-feature usage split is inferred from UI signals; actual user behavior (e.g., 90% chat, 5% code, 5% suggestions) may differ significantly.
—Cache hit rate for prompt caching (if implemented) is unverified—assumed 0% hit rate (worst case) for cost estimates.
—Tool call frequency per chat turn is assumed at 20%; actual rate depends on user behavior and tool discoverability in UI.
—Database and Redis costs assumed free tier ($0/mo); upgrade costs ($19-25/mo for DB, $10/mo for Redis) not included in headline COGS but flagged in risk section.

Lifecycle

Code received2026-05-11 00:00 UTCt+0.0s
git clone --depth 1, into ephemeral worker tmpdir
Code extracted2026-05-11 00:00 UTCt+3.0s
Secrets purged · 47 files / 1.2 MB kept for analysis
Report generated2026-05-11 00:02 UTCt+2m 25s
audit_report.json written to scans table (no source code)
Code wiped2026-05-11 00:02 UTCt+2m 30s
shutil.rmtree(workdir) in finally block · 0 files remaining

Total time your source existed on our infrastructure: 2 minutes 30 seconds

Input digest

Source	github.com/preprice/sample @ HEAD a3f291c
SHA-256	`a3f291c5e8b2d4f7a1c9e6b8d3f5a2e7c1b4d6f8`
File count	47 files analyzed
Byte count	1.2 MB (after secrets purge, before chunking)
Storage region	us-east-1

Verify yourself

Run this against your local copy and confirm the value matches our SHA-256 above:

git rev-parse HEAD

What we kept · What we wiped

Kept (in our database)

audit_report.json

Findings, costs, fix prompts. No source code. No code snippets.

Wiped from our infrastructure

Repository contents	47 files / 1.2 MB. Worker tmpdir `rmtree`'d in finally block at 2026-05-11 00:02 UTC.
Env / credential files	.env, .env.local, .env.production, credentials.json, secrets.yml, service-account.json, firebase-adminsdk.json — all purged BEFORE analysis ran.
Private keys + auth tokens	id_rsa, id_ed25519, all SSH keys, .npmrc, .yarnrc — purged BEFORE analysis ran. Stack inferred from package.json, imports, and hosting config.
Per-driver code snippets	Stripped from audit_report before persistence (server.py:443-446). The file:line reference survives so the report can render a 'See the code' placeholder; the literal snippet does not.

Verification

This scan never wrote source code to our database. Enforced by RLS policy scans_select_own on table scans (migration 007).

What that means: even an authenticated user with a stolen anon JWT can only read their own scans rows. There is no row in any table that contains your source. Code snippets that the synth step might have embedded are stripped before persistence (server.py:443-446) so the JSON we keep is structurally incapable of holding your code.

How we handle your data

Scan ID	`sample`
Report generated	2026-05-11
Used to train AI?	Never. Anthropic (our AI provider) operates under a zero data retention and no-training contract for API customers.
What we logged	File paths and line numbers only. Never the code itself.
Who can see this report	Only you, when signed in. PrePrice staff do not review your scan unless you email support with the scan ID and ask us to.
Verifiable cost numbers	Every dollar figure in this report links back to the vendor's public pricing page (Anthropic, Vercel, Stripe, and so on). Check our math.
Download receipt
Delete everything	Delete my account and every scan I've run →

Features we found · 7

Chat Streaming$1.35/ user / mo

Tool-Use Agentic Loop$0.18/ user / mo

Chat Title Generation$0.03/ user / mo

Writing Suggestions$0.19/ user / mo

Code Artifact Execution$0/ user / mo

Text Artifact Editor$0/ user / mo

Sheet Artifact$0/ user / mo

Uncached System Prompt In Chat Streaming

Tool-Use Agent Loop Amplification

Uncached Suggestions System Prompt

Unknown Model Pricing Defaulted