
How much does Claude Code cost per month? Real production numbers

April 5, 2026 · 9 min read

The short answer: it depends. Claude Code pricing is token-based, so your monthly cost is a function of how many sessions you run, how long your context windows get, and which model you use. That makes it hard to estimate before you start.

The longer answer requires real data. We have been running Claude Code on production business workflows for 30 days straight: scheduled tasks, blog writing, code generation, email processing, and API integrations. Here is what the actual bill looks like.

The pricing structure

Claude Code offers two paths: pay-per-token via the API, or a flat monthly subscription via Claude Max.

  • API pricing: You pay per input and output token. Opus (the most capable model) runs about $15 per million input tokens and $75 per million output tokens. Sonnet is cheaper at $3 input and $15 output per million tokens, and Haiku is cheapest at roughly $0.25 input and $1.25 output.
  • Claude Max: A flat subscription ($100-200/month depending on tier) that includes generous usage limits. There is no per-token billing, but you can hit rate limits under heavy use.

Most people starting with Claude Code for business use the API. It feels cheaper at first. Whether that stays true depends on volume.

What a real production month looks like

Our setup runs Claude Code on a heartbeat schedule. It checks tasks every few minutes, processes emails twice a day, writes and publishes blog posts, manages a sales pipeline, monitors ad campaigns, and handles customer support. That is heavy use by any measure.

Here is the breakdown from a typical 30-day window:

  • Total sessions: 400+
  • Total tokens processed: ~80 million input, ~15 million output
  • Model split: 60% Opus (complex reasoning), 30% Sonnet (routine tasks), 10% Haiku (simple lookups)
  • Estimated API cost at list rates: $1,800-2,200
  • Actual cost on Claude Max: $200/month flat

The difference is dramatic. For light use (a few sessions per day, short context), API pricing wins. For production workloads, Max pays for itself within the first week.
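
The list-rate estimate can be roughly reproduced from the rates quoted earlier. The sketch below is a simplification, not our billing logic: it assumes the 60/30/10 model split applies evenly to tokens (the split may really be per session) and it ignores prompt caching. Under those assumptions the total comes out around $1,540, somewhat below the $1,800-2,200 range above, so treat the range as an estimate whose exact inputs are not all shown here.

```python
# List rates in USD per million tokens (input, output), as quoted in this post.
RATES = {
    "opus": (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku": (0.25, 1.25),
}

# Assumption: the 60/30/10 model split applies to tokens, not only to sessions.
MODEL_SHARE = {"opus": 0.60, "sonnet": 0.30, "haiku": 0.10}


def estimate_api_cost(input_millions, output_millions, share=MODEL_SHARE, rates=RATES):
    """Return the uncached list-price cost in USD for the given token volume."""
    total = 0.0
    for model, fraction in share.items():
        input_rate, output_rate = rates[model]
        total += fraction * input_millions * input_rate
        total += fraction * output_millions * output_rate
    return total


# Monthly volume from the breakdown above: ~80M input, ~15M output tokens.
monthly = estimate_api_cost(input_millions=80, output_millions=15)
print(f"Estimated list-price cost: ${monthly:,.0f}")
```

Changing `MODEL_SHARE` to match your own mix is the quickest way to see how sensitive the bill is to Opus usage.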

Where the tokens actually go

Most people assume output tokens (the text Claude generates) are the expensive part. In practice, input tokens dominate the bill. Every time Claude Code reads a file, loads context, or processes a long conversation, it consumes input tokens.

The three biggest token sinks we found:

1. Context loading on every session

Claude Code reads your CLAUDE.md, project rules, and relevant files at the start of every session. If your context files are large, you pay for those tokens every single time. A 5,000-token context file loaded across 400 sessions is 2 million input tokens just for setup.
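
The setup overhead is simple multiplication, sketched here for anyone who wants to plug in their own numbers. The function name is made up for illustration, and the dollar figure assumes every load is billed at the uncached Opus input rate quoted earlier, which overstates the cost whenever the cache is hit.

```python
def setup_overhead(context_tokens, sessions, input_rate_per_million):
    """Return (total input tokens, USD cost) spent just loading context."""
    total_tokens = context_tokens * sessions
    cost = total_tokens / 1_000_000 * input_rate_per_million
    return total_tokens, cost


# 5,000-token context file, 400 sessions, $15 per million input tokens (Opus).
tokens, cost = setup_overhead(5_000, 400, 15.00)
print(f"{tokens:,} input tokens, about ${cost:.2f} at the uncached rate")
```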

2. Large file reads

Reading an entire 2,000-line source file costs roughly 40 times as many input tokens as reading the 50 lines you actually need. Claude Code can read specific line ranges, but many prompts trigger full file reads. This adds up fast in codebases with large files.

3. Long conversations without breaks

Every message in a conversation includes the full history as context. A conversation with 20 back-and-forth exchanges means message 20 includes all 19 previous messages as input. Starting fresh sessions for distinct tasks is cheaper than one long conversation.
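
A small sketch makes the growth visible. It assumes, purely for illustration, that every message is the same size (500 tokens here) and that each turn resends all earlier messages; it also ignores the setup context that a fresh session has to reload, so the real saving is smaller than the raw numbers suggest.

```python
def cumulative_input_tokens(turns, tokens_per_message):
    """Total input tokens when every turn resends all previous messages."""
    # Turn k carries the k-1 earlier messages plus the new one: k messages.
    return sum(k * tokens_per_message for k in range(1, turns + 1))


# One 20-turn conversation versus the same work split into four 5-turn sessions.
one_long = cumulative_input_tokens(20, 500)
four_short = 4 * cumulative_input_tokens(5, 500)
print(f"one long conversation: {one_long:,} input tokens")
print(f"four short sessions:   {four_short:,} input tokens")
```

Under these assumptions the single conversation bills 105,000 input tokens and the four short sessions bill 30,000, because input cost grows with the square of the conversation length.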

The hidden cost: cache misses

Anthropic offers prompt caching, which reduces costs for repeated context. When your context hits the cache, you pay a fraction of the full input price. When it misses, you pay full rate.

Cache misses happen when your context changes between requests, when sessions are spaced too far apart, or when your context is too small to qualify for caching. In our production setup, the cache hit rate averages around 70%. The other 30% pays full price.

If your cache hit rate drops below 50%, most of your input tokens are billed at the full rate, so your effective cost per session ends up far above what the cached price alone would suggest.
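
To see how the hit rate moves the bill, here is a minimal sketch of a blended input rate. It assumes cached reads are billed at about 10% of the normal input price and ignores the surcharge for writing to the cache, so treat both the multiplier and the function itself as illustrative assumptions and check the current pricing before relying on the output.

```python
def blended_input_rate(base_rate, hit_rate, cached_fraction_of_base=0.10):
    """Average USD per million input tokens at a given cache hit rate.

    cached_fraction_of_base is an assumption: cached reads cost ~10% of base.
    """
    cached_rate = base_rate * cached_fraction_of_base
    return hit_rate * cached_rate + (1 - hit_rate) * base_rate


# Opus input at $15 per million tokens, at three different hit rates.
for hit_rate in (0.9, 0.7, 0.5):
    rate = blended_input_rate(15.00, hit_rate)
    print(f"hit rate {hit_rate:.0%}: about ${rate:.2f} per million input tokens")
```

With these assumptions, a 70% hit rate gives a blended rate of about $5.55 per million input tokens, and a 50% hit rate gives about $8.25, roughly one and a half times as much.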

Check your own numbers

Claude Code stores detailed usage logs locally in JSONL format. The data is there. The problem is that there is no built-in dashboard to read it.

We built a free cost analyzer that reads your local Claude Code usage files and breaks down your spending by session, model, and project. It runs entirely in your browser. No data leaves your machine. You upload the JSONL files, and it shows you exactly where your tokens are going.

What it shows you:

  • Total cost estimate based on current API pricing
  • Model breakdown so you can see how much Opus vs Sonnet vs Haiku you are using
  • Session-level detail so you can identify which workflows cost the most
  • Cache hit rate so you know if you are getting the caching discount
  • Waste detection that flags sessions with unusually high token counts

Most people who run the analyzer discover that 2-3 specific workflows account for 60-80% of their total spend. Fixing those workflows often cuts the bill in half.
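
If you prefer a script to the browser tool, the same kind of per-model summary can be sketched in a few lines. This is not the analyzer's code. The field names (`message`, `usage`, `model`, and the four token counters) are assumptions about the log format that this post does not document, so inspect one of your own JSONL files and adjust them before trusting the totals.

```python
import json
from collections import defaultdict

# Assumed token counters inside each record's message.usage object.
FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def summarize_usage(lines):
    """Sum token counters per model from an iterable of JSONL lines."""
    totals = defaultdict(lambda: defaultdict(int))
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or malformed lines
        message = record.get("message") if isinstance(record, dict) else None
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue  # not a record that carries usage data
        model = message.get("model", "unknown")
        for field in FIELDS:
            totals[model][field] += usage.get(field) or 0
    return {model: dict(counts) for model, counts in totals.items()}


def cache_hit_rate(counts):
    """Share of input tokens that were served from the cache."""
    cached = counts.get("cache_read_input_tokens", 0)
    uncached = counts.get("input_tokens", 0) + counts.get("cache_creation_input_tokens", 0)
    total = cached + uncached
    return cached / total if total else 0.0


# Made-up sample lines; in practice, pass an open log file instead.
sample = [
    '{"message": {"model": "model-a", "usage": {"input_tokens": 100, '
    '"output_tokens": 50, "cache_read_input_tokens": 900}}}',
    '{"type": "summary"}',
    "not json",
]
summary = summarize_usage(sample)
print(summary)
print(f"cache hit rate: {cache_hit_rate(summary['model-a']):.0%}")
```

To run it on a real file, replace `sample` with something like `open(path, encoding="utf-8")`, where `path` points at one of your own log files.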

Five ways to reduce your Claude Code costs

1. Route simple tasks to cheaper models

Not every task needs Opus. File lookups, simple formatting, status checks, and template generation work fine on Sonnet or Haiku at a fraction of the cost. If you are using an AI OS or skill system, you can set the model per task in configuration rather than defaulting everything to Opus.
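
What per-task routing looks like depends on your tooling, so the following is only a hypothetical sketch. The task names, the `ROUTES` table, and the `pick_model` helper are all invented for illustration, and the model labels are just the tier names used in this post rather than exact model identifiers.

```python
# Hypothetical routing table: task type -> model tier.
ROUTES = {
    "file_lookup": "haiku",
    "status_check": "haiku",
    "formatting": "sonnet",
    "template_generation": "sonnet",
}

# Anything not listed falls back to the most capable (and most expensive) tier.
DEFAULT_MODEL = "opus"


def pick_model(task_type):
    """Return the model tier configured for a task type."""
    return ROUTES.get(task_type, DEFAULT_MODEL)


for task in ("status_check", "formatting", "architecture_review"):
    print(f"{task}: {pick_model(task)}")
```

Keeping the table in configuration rather than in prompts means a routing change is a one-line edit, and the expensive default only applies to tasks nobody has classified yet.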

2. Keep context files lean

Your CLAUDE.md and project rules load on every session. If they contain 10,000 tokens of instructions, that is 10,000 tokens billed every time. Move rarely-used context into separate files that load on demand. Use tiered loading: core rules always, domain knowledge only when relevant.

3. Use targeted file reads

Instead of reading entire files, specify line ranges when you know which section you need. A request like "read file.py lines 50-80" costs a fraction of reading all 500 lines. This is especially important for files you read frequently.

4. Break long conversations into sessions

Each message replay costs input tokens. A 30-message conversation means message 30 includes ~29 messages of context. If you are switching topics or starting a new task, start a fresh session instead of continuing the same one.

5. Monitor and iterate

Run the cost analyzer weekly. Look for sessions that cost 5-10x the average. Those are your optimization targets. Often it is a single misconfigured workflow or an unexpectedly large file read that drives the spike.
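
The "5-10x the average" check is easy to automate once you have a cost per session. The sketch below assumes you already have those costs in a dictionary; the session names and dollar amounts are invented sample data, and the function is an illustration rather than part of the analyzer.

```python
from statistics import mean


def flag_expensive_sessions(costs, multiple=5.0):
    """Return session ids whose cost is at least `multiple` times the average."""
    if not costs:
        return []
    threshold = multiple * mean(costs.values())
    flagged = [sid for sid, cost in costs.items() if cost >= threshold]
    # Most expensive first, so the biggest optimization target leads the list.
    return sorted(flagged, key=lambda sid: costs[sid], reverse=True)


# Invented sample: nine ordinary sessions and one runaway workflow.
session_costs = {f"session-{i}": 1.0 for i in range(1, 10)}
session_costs["nightly-report"] = 20.0

print(flag_expensive_sessions(session_costs))
```

One caveat on the design: a very large outlier pulls the average up with it, so with few sessions you may want to compare against the median instead.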

When to switch to Claude Max

The crossover point depends on your usage, but as a rough guide: if your estimated API bill exceeds $150-200 per month, Claude Max is probably cheaper. For production workloads with scheduled tasks, the crossover usually happens within the first two weeks.

The trade-off is rate limits. Max does not bill per token, but it caps how many requests you can make per time window. For batch processing or high-frequency scheduled tasks, you may hit those limits. For most business automation use cases, the limits are generous enough.
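
The crossover itself can be estimated with a rough break-even calculation. This sketch assumes API spend is spread evenly across a 30-day month, which scheduled workloads approximate better than bursty ones, and the function name is hypothetical.

```python
import math


def days_to_break_even(monthly_api_estimate, subscription_price, days_in_month=30):
    """Day of the month on which cumulative API spend reaches the flat fee.

    Returns None if the API bill never reaches the subscription price.
    """
    daily_spend = monthly_api_estimate / days_in_month
    if daily_spend <= 0:
        return None
    days = math.ceil(subscription_price / daily_spend)
    return days if days <= days_in_month else None


# Low end of our list-rate estimate against the $200 tier, then a light user.
print(days_to_break_even(1800, 200))
print(days_to_break_even(150, 200))
```

At $1,800 per month the flat fee is covered on day 4, while a $150 monthly API bill never reaches it, which is why light users are better off staying on per-token pricing.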

The real cost of not tracking

The most expensive Claude Code setup is one you never audit. Token costs feel small per request, so they do not trigger the same scrutiny as a $500 SaaS subscription. But a poorly optimized setup can easily cost $500-1,000 per month on API pricing without any obvious signal that something is wrong.

Run the free analyzer. It takes five minutes. At minimum, you will know what you are spending. At best, you will find the two or three changes that cut your bill in half.

If you want ongoing monitoring with automated alerts when spending spikes, the Pro version of CostPilot is in development. You can join the waitlist to get early access.
