
How to track and reduce your Claude Code costs (the complete guide)

April 5, 2026 · 10 min read

Claude Code does not have a built-in cost dashboard. There is no monthly summary email, no spending alert, no bar chart showing which sessions burned the most tokens. You find out what you spent when the bill arrives.

For light users this is fine. For anyone running Claude Code on real workflows, with scheduled tasks, large codebases, and long context windows, the bill can surprise you. This guide covers exactly how to find your usage data, what you are actually paying for, and five concrete changes that reduce costs without breaking the workflows you have built.

Why Claude Code costs spiral

Token costs seem small per request. The problem is volume and context size, not any single interaction.

Token drain from large contexts

Every time Claude reads a file, that file's contents go into the context window as input tokens. If you have a CLAUDE.md with 500 lines, a memory file, a skill definition, and two or three reference files, you might be sending 8,000 tokens of context before you even ask your question. Multiply that by 50 sessions per day on a scheduled workflow and the input token count compounds fast.
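The compounding is easy to see with a back-of-envelope calculation, using the figures above and Sonnet's $3.00 per million input token price from the pricing section below:

```python
# Rough monthly cost of context overhead alone, before any actual work.
CONTEXT_TOKENS = 8_000      # CLAUDE.md + memory + skill + reference files
SESSIONS_PER_DAY = 50       # scheduled workflow runs
INPUT_PRICE_PER_M = 3.00    # dollars per million input tokens (Sonnet)

monthly_tokens = CONTEXT_TOKENS * SESSIONS_PER_DAY * 30
monthly_cost = monthly_tokens / 1_000_000 * INPUT_PRICE_PER_M
print(f"{monthly_tokens:,} tokens -> ${monthly_cost:.2f}/month just for context")
```

Twelve million tokens a month, $36 of spend, and not one of those tokens did any work.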

Cache misses

Claude's prompt caching can dramatically reduce costs on repeated context, but only when the cached portion is stable. If your context files change frequently, or if your sessions are spaced far enough apart that the cache expires between runs, you pay full input token prices on every call. Cache reads cost roughly a tenth of what standard input tokens do. A workflow that should cost $0.10 can cost $1.00 if you are constantly missing cache.

Output token inflation

Output tokens cost more than input tokens on most Claude models. If your prompts are open-ended ("write me a thorough analysis of..."), Claude will write thoroughly. Specific prompts with defined output constraints produce shorter, more useful responses and cost less. The quality often goes up too, because the model is not padding.

Agentic loops

When Claude runs multi-step tasks, each step in the loop is a new API call with a new context. A workflow that calls Claude ten times with 10,000 tokens each costs ten times more than one that calls it twice. Poorly structured skills that ask Claude to plan, then execute, then review, then rewrite add a full call's cost at each step without proportional value.

How to check your actual usage

There are three places to look, depending on how you are running Claude Code.

The JSONL files on your machine

Claude Code logs every conversation to local JSONL files. On macOS and Linux they live at ~/.claude/projects/. Each project gets its own directory, and each conversation is a separate file.

Each entry in the JSONL includes the model used, the number of input tokens, output tokens, and cache read and write tokens. You can parse these files with any script. A simple Python script that sums input_tokens, output_tokens, cache_creation_input_tokens, and cache_read_input_tokens across all files gives you the full picture.
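Here is one way that script might look. It assumes the usage fields sit where recent Claude Code versions put them (at the top level or under the message object), so treat it as a starting point rather than a spec:

```python
import json
from collections import Counter
from pathlib import Path

TOKEN_KEYS = ("input_tokens", "output_tokens",
              "cache_creation_input_tokens", "cache_read_input_tokens")

def sum_usage(lines):
    """Sum the four token counters across an iterable of JSONL lines."""
    totals = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        # Usage may sit at the top level or under the message object,
        # depending on the Claude Code version -- check both.
        usage = entry.get("usage") or entry.get("message", {}).get("usage", {})
        for key in TOKEN_KEYS:
            totals[key] += usage.get(key, 0)
    return totals

if __name__ == "__main__":
    totals = Counter()
    for path in Path.home().glob(".claude/projects/**/*.jsonl"):
        totals += sum_usage(path.read_text().splitlines())
    for key in TOKEN_KEYS:
        print(f"{key}: {totals[key]:,}")
```

Multiply each total by the matching per-million price from the pricing section below and you have your spend.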

If you want to skip writing the script, the free CostPilot analyzer reads your local JSONL files and breaks down your spending by session, model, and project. You point it at your ~/.claude/projects/ folder and it does the rest.

The Anthropic Console

If you are on a paid API plan (not Claude Max), you can see usage in the Anthropic Console under Usage. It shows daily token counts by model but does not break down by individual prompt or project. Useful for month-over-month comparisons, not for debugging a specific expensive workflow.

The Admin API

If you are running Claude Code on behalf of a team using the API, Anthropic's Admin API has a usage endpoint. You can query it programmatically to pull daily usage by workspace or API key. The endpoint returns token counts broken down by type, which is useful for building your own internal dashboards.
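A sketch of what that query can look like. The endpoint path, parameter names, and version header below are my best reading of Anthropic's usage report docs at the time of writing; double-check them against the current API reference before relying on this:

```python
import json
import os
import urllib.parse
import urllib.request

# Daily usage by model from the Admin API's usage report endpoint.
# Requires an admin key, not a regular API key.
BASE = "https://api.anthropic.com/v1/organizations/usage_report/messages"

def build_request(admin_key, starting_at, group_by=("model",)):
    """Build (but do not send) the GET request for a daily usage report."""
    params = urllib.parse.urlencode(
        [("starting_at", starting_at), ("bucket_width", "1d")]
        + [("group_by[]", g) for g in group_by]
    )
    return urllib.request.Request(
        f"{BASE}?{params}",
        headers={"x-api-key": admin_key, "anthropic-version": "2023-06-01"},
    )

if __name__ == "__main__":
    key = os.environ.get("ANTHROPIC_ADMIN_KEY")
    if key:
        req = build_request(key, "2026-03-01T00:00:00Z")
        with urllib.request.urlopen(req) as resp:
            print(json.dumps(json.load(resp), indent=2))
```

The response buckets token counts by day and by whatever you grouped on, which is exactly the shape you want for an internal dashboard.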

The pricing model, explained clearly

Claude Code uses Anthropic's API pricing. The model you are using matters a lot, and so does cache behavior.

As of early 2026, Claude Sonnet 3.7 is the default model in Claude Code. Pricing is per million tokens:

  • Input tokens: $3.00 per million
  • Output tokens: $15.00 per million
  • Cache write tokens: $3.75 per million (25% more than standard input)
  • Cache read tokens: $0.30 per million (ten times cheaper than standard input)

The ratio between output and input costs is the most important thing to internalize. Output tokens cost five times more than input tokens. A response that is twice as long costs twice as much in output, but the input cost is the same. This is why output verbosity is such a significant cost lever.

Cache read tokens at $0.30/M are almost free compared to everything else. A workflow that successfully caches a 10,000-token context and reads it 100 times pays $0.30 total for those reads. Without caching, the same million tokens would cost $3.00 at standard input prices, ten times more. Building for cache efficiency is the single highest-leverage cost reduction strategy.
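You can verify that arithmetic with the prices above (this sketch also counts the one-time cache write on the first call):

```python
# Per-million-token prices from the table above (Sonnet, early 2026).
PRICES = {"input": 3.00, "output": 15.00, "cache_write": 3.75, "cache_read": 0.30}

def cost(tokens, kind):
    """Dollar cost of `tokens` tokens billed at the `kind` rate."""
    return tokens / 1_000_000 * PRICES[kind]

context = 10_000  # tokens in the cached context
reads = 100       # times the workflow sends it

# First call writes the cache; the other 99 read it.
cached = cost(context, "cache_write") + cost(context * (reads - 1), "cache_read")
uncached = cost(context * reads, "input")
print(f"cached: ${cached:.2f}  uncached: ${uncached:.2f}")
```

About $0.33 with caching versus $3.00 without, and the gap widens the more often the context is reused.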

Claude Max subscribers pay a flat monthly fee instead of per-token. If you are spending more than roughly $100/month on API usage, Claude Max at $100/month likely saves you money. Track your token spend for a week and extrapolate to a full month.

5 practical ways to reduce your Claude Code costs

1. Use tiered context loading instead of loading everything

The most common cost mistake: loading every context file at the start of every session. A voice guide, business context, ICP (ideal customer profile), memory file, and strategy doc might total 15,000 tokens. If only one task in ten actually needs all of that, you are wasting 90% of your context loading spend.

Instead, maintain an index file that describes what each context file contains and when it is relevant. Load only what the current task needs. A blog post task needs the voice guide. A pricing task needs the business context. A code task needs neither. One index read versus five full file reads saves thousands of tokens per session.
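A minimal sketch of what tiered loading can look like, with hypothetical file names standing in for your own context files:

```python
from pathlib import Path

# Hypothetical index mapping task types to the context files they need.
# File names are illustrative -- use whatever your own setup contains.
CONTEXT_INDEX = {
    "blog_post": ["voice-guide.md"],
    "pricing":   ["business-context.md"],
    "code":      [],  # code tasks need no business context at all
}

def load_context(task_type, base=Path("context")):
    """Read only the files the current task actually needs."""
    parts = []
    for name in CONTEXT_INDEX.get(task_type, []):
        path = base / name
        if path.exists():
            parts.append(path.read_text())
    return "\n\n".join(parts)
```

The index itself stays tiny, so reading it on every session costs almost nothing compared to loading five full files.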

2. Keep system prompts and context stable to hit cache

Prompt caching only works when the beginning of the context is identical across requests. If your CLAUDE.md changes every day, or if your memory file gets appended to after every task (making it longer each time), you will miss cache on the portions that changed.

Separate stable content from dynamic content. Your business description, voice guide, and guardrails rarely change. Put those at the top of your context so they can cache. Your daily notes, recent decisions, and task-specific context change frequently. Load those separately at the end. The stable portions cache. The dynamic portions do not, but they are smaller.
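If you are calling the Messages API directly (Claude Code manages its own prompt assembly), the split can look like this, using the API's cache_control marker on the stable block; everything up to and including the marked block is eligible for caching:

```python
# Assemble the system prompt so the stable portion can cache. Only the
# block carrying cache_control (and the prefix before it) gets cached;
# the dynamic block after it is re-billed at standard input rates.
def build_system(stable_text, dynamic_text):
    return [
        {
            "type": "text",
            "text": stable_text,   # voice guide, guardrails: rarely changes
            "cache_control": {"type": "ephemeral"},
        },
        {
            "type": "text",
            "text": dynamic_text,  # daily notes: changes every run, stays small
        },
    ]
```

Pass the returned list as the `system` parameter of a Messages API call. The key property is ordering: stable first, dynamic last, so a change to the daily notes never invalidates the cached prefix.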

3. Set explicit output constraints

Tell Claude what format and length the output should be. "Write a 200-word summary" costs a fraction of "write a thorough summary." Both can produce equally useful results. The difference is that one lets Claude decide how much to write, and it will often default to more.

For skills that run on a schedule, embed output constraints directly in the skill definition. A daily report skill should specify the exact sections and target length. A draft email skill should cap at 150 words. These constraints have the side effect of producing tighter, more usable outputs anyway.

4. Route tasks to cheaper models

Not every task needs Sonnet 3.7. Classification, formatting, extraction, and simple transformations work well on Claude Haiku at a fraction of the cost. Haiku input tokens cost $0.80/M versus $3.00/M for Sonnet. For high-volume routine tasks, that is a 3-4x cost reduction.

In your skill definitions, add a model preference. Simple tasks (categorize this email, format this data, extract these fields) get Haiku. Complex reasoning tasks (write a proposal, analyze this situation, debug this architecture) get Sonnet. The quality difference for simple tasks is negligible. The cost difference is not.
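The routing itself can be as simple as a lookup. Task names and model IDs here are illustrative, not real identifiers:

```python
# Hypothetical router: send each task to the cheapest model that
# handles it well. Substitute your own task names and current model IDs.
SIMPLE_TASKS = {"classify_email", "format_data", "extract_fields"}

def pick_model(task_type):
    if task_type in SIMPLE_TASKS:
        return "claude-haiku"   # $0.80/M input: classification, extraction
    return "claude-sonnet"      # $3.00/M input: reasoning, writing, debugging
```

Defaulting to the expensive model and whitelisting the cheap one (rather than the reverse) means a new, unclassified task degrades toward quality, not toward cost.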

5. Audit your agentic workflows for unnecessary steps

Look at any workflow that calls Claude more than twice. Ask whether each step is adding value proportional to its cost. Common culprits:

  • Separate planning and execution steps that could be combined into one call
  • Review steps that repeat the full context when only the output needs reviewing
  • Confirmation steps that ask Claude to confirm what it just did, in full detail
  • Logging steps that use a full model call when a simple string append would do

A four-step workflow that becomes two steps cuts your call count and context repetition in half. The savings add up quickly on workflows that run daily.

What a realistic monthly spend looks like

For context: Nova Labs runs Claude Code for roughly 8-10 hours of active and scheduled work per day. Blog post writing, email triage, social media, daily reports, nightly learning tasks. Monthly API spend before optimization was running around $80-120. After applying the context loading and model routing changes above, it dropped to $30-50. The work output did not change.

Most of the savings came from two things: not loading all context files on every call, and routing the high-volume daily report and email classification tasks to Haiku. The cache efficiency improvements helped on blog writing tasks where the voice guide is large and stable.

Track it before you optimize it

You cannot cut costs you cannot see. Before trying any of the above, spend a week tracking your actual usage. Parse the JSONL files, look at which workflows are the most expensive, and identify whether the cost is coming from input volume, output verbosity, or cache misses.

The free CostPilot analyzer handles the JSONL parsing and gives you a breakdown by project and session. Point it at your Claude projects folder and you will have a clear picture of where the money is going within a few minutes.

If you want ongoing monitoring with usage alerts and per-workflow cost tracking, the Pro version is in development. You can join the Pro waitlist to get early access and be notified when it ships.

For everything else about running Claude Code as a real business tool, the free chapter of the AI OS Blueprint covers the full architecture: context engineering, skill design, scheduling, and the system structure that makes Claude Code reliable instead of expensive.

Want to build your own AI OS?

The AI OS Blueprint gives you the complete system: 53-page playbook, working skills, and a clonable repo. Starting at $47.

30-day money-back guarantee. No subscription.