Here is something most Claude Code users do not realize: even though output tokens cost more per token, input tokens dominate your total bill by sheer volume. In a typical session, input tokens outnumber output tokens 5-10x. And the biggest source of input tokens is not the code you ask Claude to read. It is the context window filling up with things you did not explicitly ask for.
In a typical 20-message Claude Code session, over 70% of your total token volume is input. That includes your CLAUDE.md loaded on every message, the growing conversation history, file contents from previous reads still in context, and tool definitions you never use.
The good news: most of this waste is fixable. Here are seven specific changes that reduce context window costs, based on real data from 30 days of continuous Claude Code use.
1. Trim your CLAUDE.md to under 500 words
Your CLAUDE.md loads on every single message in every session. If it is 2,000 words (roughly 3,000 tokens), and you send 15 messages in a session across 10 sessions per day, that is 450,000 tokens per day just for your project instructions.
Most CLAUDE.md files grow over time. They accumulate conventions, debugging notes, architectural decisions, and workflow instructions that seemed important once. Audit yours. Move reference material to separate files. Keep the CLAUDE.md to critical instructions only.
A lean CLAUDE.md of 300-500 words (450-750 tokens) cuts that per-message loading cost by 75-85% compared to a 2,000-word version.
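If you want to plug in your own numbers, the arithmetic above fits in a few lines. This is a minimal sketch that assumes the same 1.5 tokens-per-word rule of thumb used in this article; real tokenizers vary, and the function names are made up for illustration.

```python
TOKENS_PER_WORD = 1.5  # rule of thumb from this article, not an exact tokenizer count


def daily_instruction_tokens(words: int,
                             messages_per_session: int = 15,
                             sessions_per_day: int = 10) -> int:
    """Tokens spent per day re-sending an instruction file of `words` words."""
    tokens_per_load = words * TOKENS_PER_WORD
    return round(tokens_per_load * messages_per_session * sessions_per_day)


def savings_fraction(old_words: int, new_words: int) -> float:
    """Fraction of the per-message loading cost removed by trimming the file."""
    return 1 - new_words / old_words


if __name__ == "__main__":
    print(daily_instruction_tokens(2000))   # 450000, the figure quoted above
    print(daily_instruction_tokens(500))    # same usage pattern, 500-word file
    print(savings_fraction(2000, 500))      # fraction saved by the trim
```

Swap in your own word count, messages per session, and sessions per day to see what your CLAUDE.md costs you.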
2. Use tiered context loading
Instead of dumping everything into CLAUDE.md, create an index file that points to detailed reference files. Tell Claude to load them only when relevant:
# CLAUDE.md (lean version)
## Architecture
See `docs/ARCHITECTURE.md` for full details.
## Conventions
See `docs/CONVENTIONS.md` for coding standards.
## API Reference
See `docs/API.md` when working on API endpoints.
Claude reads the 50-word CLAUDE.md on every message. It only reads ARCHITECTURE.md when it needs to understand the architecture. The savings compound across every message.
3. Keep sessions under 15 messages
This is the single highest-impact change you can make. Here is why:
Every message in a Claude Code session includes the full conversation history. Message 1 carries only the initial context. Message 10 carries messages 1-9 plus their responses. Message 20 carries everything from the entire session.
The token cost per message grows linearly, which means total session cost grows quadratically. A 20-message session does not cost twice as much as a 10-message session. It costs roughly four times as much.
When you finish a logical unit of work (one bug fixed, one feature added, one test passing), start a new session. The context reset is free. The productivity hit is minimal because you start the new session with a clear, specific task.
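The quadratic growth is easy to check with a toy model. The sketch below assumes every exchange adds a fixed 1,000 tokens of history and that each message resends all of it; both numbers are illustrative assumptions, not measurements.

```python
def session_input_tokens(messages: int,
                         tokens_per_exchange: int = 1000,
                         base_context: int = 0) -> int:
    """Total input tokens when every message resends all prior exchanges.

    base_context is the fixed per-message overhead (CLAUDE.md, rules, tools).
    """
    total = 0
    for i in range(1, messages + 1):
        history = (i - 1) * tokens_per_exchange  # everything said before message i
        total += base_context + history
    return total


if __name__ == "__main__":
    short = session_input_tokens(10)   # 45000
    long = session_input_tokens(20)    # 190000
    print(short, long, long / short)
```

With no fixed overhead the 20-message session comes out a little over four times the 10-message one. Adding a fixed base context pulls the ratio down somewhat, which is why "roughly four times" is the right way to say it.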
4. Use /compact before sessions get long
If you need a longer session, use Claude Code's /compact command at message 10-12. This summarizes the conversation so far into a shorter representation, reducing the context that gets carried into subsequent messages.
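In practice, /compact reduces the carried context by 50-70%. A session that would have cost 200K tokens for messages 12-20 drops to 80-120K tokens; the overall saving is somewhat smaller than the summary's own reduction because new messages and file reads start adding to the context again right after compaction. The trade-off is that some nuance from earlier messages gets lost in the summary, but for most coding tasks the summary retains what matters.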
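You can model the effect of a mid-session summary the same way. This is a toy sketch, not a description of how /compact works internally: it assumes 1,000 tokens of history per exchange and a summary that shrinks the carried history by a fixed fraction at one message.

```python
from typing import Optional


def tokens_for_range(first: int, last: int,
                     tokens_per_exchange: int = 1000,
                     compact_at: Optional[int] = None,
                     reduction: float = 0.6) -> int:
    """History tokens carried by messages first..last (inclusive).

    If compact_at is set, the history is shrunk by `reduction` just before
    that message is sent, then keeps growing as usual.
    """
    history = 0.0
    total = 0.0
    for i in range(1, last + 1):
        if compact_at is not None and i == compact_at:
            history *= (1 - reduction)   # summary replaces the full history
        if i >= first:
            total += history
        history += tokens_per_exchange   # this exchange joins the history
    return round(total)


if __name__ == "__main__":
    plain = tokens_for_range(12, 20)
    compacted = tokens_for_range(12, 20, compact_at=12, reduction=0.6)
    print(plain, compacted, 1 - compacted / plain)
```

Under these assumptions the saving over messages 12-20 comes out below the reduction applied to the summary itself, because history keeps growing after the compaction point.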
5. Be specific about which files to read
When you say "fix the bug in the auth module," Claude might read every file in the auth directory looking for the problem. If that directory has 15 files averaging 300 lines each, you just loaded 45,000 tokens of context.
Instead: "fix the bug in src/auth/middleware.ts, line 47 throws when the session cookie is missing." Claude reads one file, jumps to the relevant section, and fixes it. Total context loaded: 3,000 tokens.
Being specific about file paths and line numbers is not just faster. It is 10-15x cheaper in tokens.
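The comparison above is simple multiplication. Here it is as a sketch, assuming roughly 10 tokens per line of code (the ratio implied by the 45,000-token figure); your files will differ.

```python
TOKENS_PER_LINE = 10  # assumption implied by 45,000 tokens for 15 x 300 lines


def read_cost(files: int, lines_per_file: int,
              tokens_per_line: int = TOKENS_PER_LINE) -> int:
    """Estimated tokens loaded when Claude reads `files` files in full."""
    return files * lines_per_file * tokens_per_line


if __name__ == "__main__":
    broad = read_cost(15, 300)     # "fix the bug in the auth module"
    targeted = read_cost(1, 300)   # "fix the bug in src/auth/middleware.ts"
    print(broad, targeted, broad / targeted)  # 45000 3000 15.0
```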
6. Route simple tasks to cheaper models
Depending on your plan and settings, Claude Code may default to Opus for reasoning-heavy tasks. Opus is more expensive per token than Sonnet or Haiku. For tasks that do not require complex reasoning (formatting, simple edits, file lookups, running commands), routing to a cheaper model does not reduce the token count, but it lowers the cost of each token and saves rate limit quota.
If you use automated scripts or structured workflows with Claude Code, you can specify a cheaper model for mechanical tasks. The CLI accepts a --model flag, and the /model command switches models inside an interactive session, so simple jobs can go to Sonnet or Haiku. This is especially effective for automated workflows that run without you watching.
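For scripted workflows, routing can be as simple as picking the model before you build the command. The sketch below is an assumption-heavy illustration: it assumes your installed `claude` CLI accepts `-p` (non-interactive prompt) and `--model` with the alias `sonnet` (check `claude --help`), and the task labels are hypothetical names invented for this example.

```python
import subprocess
from typing import List, Optional

# Hypothetical task labels; define whatever categories fit your workflow.
MECHANICAL_TASKS = {"format", "rename", "lookup", "run-command"}


def pick_model(task_kind: str) -> Optional[str]:
    """Return a cheaper model alias for mechanical work, else None (keep default)."""
    return "sonnet" if task_kind in MECHANICAL_TASKS else None


def build_command(prompt: str, task_kind: str) -> List[str]:
    """Build the CLI invocation; assumes `claude -p` and `--model` are available."""
    cmd = ["claude", "-p", prompt]
    model = pick_model(task_kind)
    if model is not None:
        cmd += ["--model", model]
    return cmd


def run_task(prompt: str, task_kind: str) -> str:
    """Run the command and return its stdout; raises if the CLI exits non-zero."""
    result = subprocess.run(build_command(prompt, task_kind),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Keeping the routing decision in one small function makes it easy to audit which jobs go to which model.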
7. Clean up your rules directory
Claude Code loads .claude/rules/ files into context alongside CLAUDE.md. Every rule file adds to the per-message token cost. If you have six rule files averaging 200 words each, that is 1,200 words (1,800 tokens) loaded on every message.
Audit your rules. Merge related ones. Delete rules for behavior Claude already follows by default. Move conditional rules (only relevant for specific tasks) into skill-specific files that load on demand, not globally.
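A quick audit script makes the per-message overhead visible. This is a minimal sketch that assumes your rules are Markdown files under `.claude/rules/` and uses the same 1.5 tokens-per-word estimate as the rest of this article.

```python
from pathlib import Path
from typing import List, Tuple

TOKENS_PER_WORD = 1.5  # rough estimate, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate tokens from a whitespace word count."""
    return round(len(text.split()) * TOKENS_PER_WORD)


def audit_rules(rules_dir: str = ".claude/rules") -> List[Tuple[str, int]]:
    """Return (path, estimated tokens) for each rule file, largest first."""
    root = Path(rules_dir)
    if not root.is_dir():
        return []
    rows = [(str(p), estimate_tokens(p.read_text(encoding="utf-8")))
            for p in root.rglob("*.md")]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    rows = audit_rules()
    for path, tokens in rows:
        print(f"{tokens:>6}  {path}")
    print(f"{sum(t for _, t in rows):>6}  total per message (estimated)")
```

The files at the top of the list are the first candidates for merging, trimming, or moving to on-demand loading.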
Measuring the impact
The challenge with all of these optimizations is knowing whether they worked. Claude Code does not show you a live token counter or a before-and-after comparison.
What it does give you is JSONL session logs. Every session writes a detailed log file that includes token counts per message, model used, and tool calls. The data is there. You just need a way to read it.
Our free cost analyzer parses those logs and shows you exactly where tokens go. Per-session breakdowns, input vs output splits, cache hit rates, and waste detection. Run it before you optimize and after. The numbers tell you what worked.
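For a rough do-it-yourself look at a session log, you can sum the usage fields yourself. The sketch below assumes each JSONL record may carry a `message.usage` object with fields named like the Anthropic API's usage block (`input_tokens`, `output_tokens`, and the two cache fields). That layout is an assumption, so inspect one of your own log lines first; the file path in the usage example is a placeholder.

```python
import json
from collections import Counter
from typing import Iterable

# Assumed field names, modeled on the Anthropic API usage object.
USAGE_KEYS = ("input_tokens", "output_tokens",
              "cache_creation_input_tokens", "cache_read_input_tokens")


def sum_usage(lines: Iterable[str]) -> Counter:
    """Sum token usage across JSONL lines, skipping anything unparseable."""
    totals: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate partial or corrupt lines
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue  # records without usage data are ignored
        for key in USAGE_KEYS:
            value = usage.get(key)
            if isinstance(value, int):
                totals[key] += value
    return totals


if __name__ == "__main__":
    import sys
    # Usage: python sum_usage.py path/to/session.jsonl  (placeholder path)
    with open(sys.argv[1], encoding="utf-8") as f:
        for key, value in sum_usage(f).items():
            print(f"{key}: {value}")
```

Run it on a session from before and after a change and compare the totals. Treat the result as an approximation, since records may be repeated or structured differently in your version.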
The compound effect
None of these changes is dramatic on its own. Trimming CLAUDE.md saves 15%. Shorter sessions save 20%. Specific file reads save 10% per instance. But they compound. Apply all seven and you can realistically cut your token usage by 40-60%.
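To see how individual savings stack, multiply what remains after each one. This sketch makes a simplifying assumption: each saving is independent and applies to whatever is left after the previous ones. It also treats the per-instance figure for file reads as an overall 10%.

```python
from typing import Iterable


def combined_saving(savings: Iterable[float]) -> float:
    """Overall fraction saved when each saving applies to what the others leave."""
    remaining = 1.0
    for s in savings:
        remaining *= (1 - s)
    return 1 - remaining


if __name__ == "__main__":
    # CLAUDE.md trim, shorter sessions, specific file reads
    print(combined_saving([0.15, 0.20, 0.10]))
```

Those three alone land just under 40% in this model, which is why adding the remaining changes can plausibly push the total into the 40-60% range.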
That might mean your current plan has plenty of headroom when you thought you needed an upgrade. Or it might mean your Max 5x plan feels like Max 10x. Either way, you are getting more work done per dollar.
Start by analyzing your current usage. You cannot optimize what you cannot see.