How to analyze your Claude Code usage logs and find where tokens are wasted
Claude Code generates detailed logs of every session. Token counts, model selections, tool calls, timestamps, response lengths. The data is there. The problem is that nobody tells you where to find it or how to read it.
If you are trying to understand why your costs are high or why you keep hitting rate limits, the logs have the answer. Here is how to get to it.
Where Claude Code stores session logs
Claude Code writes session data to your local filesystem. The exact location depends on your operating system:
- macOS: ~/.claude/projects/ contains per-project session data
- Linux: same path, ~/.claude/projects/
- Windows (WSL): inside your WSL home directory at the same path
Inside each project directory, you will find JSONL files (JSON Lines format) that contain the raw session data. Each line is a JSON object representing one message, tool call, or system event.
A single active day of Claude Code use can generate 10-50MB of JSONL data. If you have been using Claude Code for a month, you might be sitting on gigabytes of usage data that you have never looked at.
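To see how much you have accumulated, a quick check with standard Unix tools (nothing Claude-specific):

```bash
# Total size of all Claude Code session logs
du -sh ~/.claude/projects

# Size per project, largest first
du -sh ~/.claude/projects/* | sort -rh | head
```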
What the logs contain
Each JSONL entry typically includes:
- Message role: Whether this was a user message, assistant response, or system prompt
- Token counts: Input tokens and output tokens for each exchange
- Model used: Which Claude model handled the request (Opus, Sonnet, Haiku)
- Tool calls: Every file read, grep, bash command, and edit, with the content that was sent and returned
- Timestamps: When each message was sent and how long the response took
- Cache information: Whether cached content was used and cache hit/miss ratios
This is enough data to reconstruct exactly where every token went. The challenge is that raw JSONL is not human-readable at scale. A single session can be thousands of lines.
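Before writing any analysis, it helps to see which fields your entries actually contain, since the exact layout varies between Claude Code versions. A minimal sketch that prints the top-level keys of the first few entries, so you can inspect rather than assume:

```python
import json

# Print the top-level keys of the first few log entries to see
# which fields (usage, model, timestamps, ...) your version emits
with open("session.jsonl") as f:
    for i, line in enumerate(f):
        if i >= 5:
            break
        try:
            obj = json.loads(line)
            print(sorted(obj.keys()))
        except json.JSONDecodeError:
            print("(unparseable line)")
```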
What to look for in your usage data
Not all token usage is equal. Some is necessary, some is waste. Here are the patterns that matter:
Context loading ratio
Compare how many tokens go to loading context files (CLAUDE.md, rules, project files) versus actual work (code generation, analysis, edits). If context loading exceeds 30% of your total input tokens, your configuration files are too heavy.
Fix: move reference material out of CLAUDE.md into separate files. Use an index file that points to detailed docs so Claude only loads what the current task needs.
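One rough way to estimate the ratio from a single session file: treat the input tokens of the first exchange as context loading (the system prompt, CLAUDE.md, and rules arrive up front) and everything after it as work. This is a heuristic sketch, not an exact measurement, and it assumes the same top-level usage field as the counting script later in this post:

```python
import json

# Heuristic: the first exchange's input tokens approximate context
# loading; later input tokens approximate actual work. Assumes a
# top-level 'usage' field; verify against your own log entries.
first_in, rest_in = None, 0
with open("session.jsonl") as f:
    for line in f:
        try:
            usage = json.loads(line).get("usage", {})
        except json.JSONDecodeError:
            continue
        tokens = usage.get("input_tokens", 0)
        if tokens and first_in is None:
            first_in = tokens
        elif tokens:
            rest_in += tokens

if first_in:
    ratio = first_in / (first_in + rest_in)
    print(f"Context loading share: {ratio:.0%}")
```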
Session length distribution
Plot how many tokens each session uses. You will likely find that a small number of long sessions account for a disproportionate share of total usage. In our data, the longest 10% of sessions consumed 45% of all tokens.
Fix: break long sessions into shorter ones. After 10-15 exchanges, start fresh. The context reload cost is almost always less than the compounding cost of a long conversation.
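A sketch for checking the distribution across your own logs: total the tokens in each session file, then see what share the largest 10% account for. It assumes one JSONL file per session and the same top-level usage field as the other snippets here:

```python
import json
from pathlib import Path

def session_tokens(path):
    # Total input + output tokens recorded in one session file
    total = 0
    for line in path.open():
        try:
            usage = json.loads(line).get("usage", {})
        except json.JSONDecodeError:
            continue
        total += usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
    return total

files = list(Path.home().glob(".claude/projects/**/*.jsonl"))
totals = sorted((session_tokens(p) for p in files), reverse=True)
grand = sum(totals)
if grand:
    top = totals[: max(1, len(totals) // 10)]  # the longest 10% of sessions
    print(f"Longest 10% of sessions: {sum(top) / grand:.0%} of all tokens")
```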
Cache hit rates
Claude Code uses prompt caching to avoid resending unchanged content. If your cache hit rate is below 50%, you are paying full price for content that has not changed between messages. Common causes: frequently modified context files, or sessions that restart too often without cache warmup.
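You can estimate your hit rate per session, assuming your entries carry the cache fields the Anthropic API reports (cache_read_input_tokens and cache_creation_input_tokens); check your own logs first:

```python
import json

# Cache hit rate for one session: tokens served from cache versus
# tokens paid at full (or cache-write) price. Field names assume the
# Anthropic API usage format; adjust if your entries differ.
read, missed = 0, 0
with open("session.jsonl") as f:
    for line in f:
        try:
            usage = json.loads(line).get("usage", {})
        except json.JSONDecodeError:
            continue
        read += usage.get("cache_read_input_tokens", 0)
        missed += usage.get("input_tokens", 0) + usage.get("cache_creation_input_tokens", 0)

if read + missed:
    print(f"Cache hit rate: {read / (read + missed):.0%}")
```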
Model selection patterns
Check which model handles which types of requests. If Opus is processing simple file reads or formatting tasks, that is expensive overkill. On API billing, an Opus token can cost 20x or more than a Haiku token for the same task.
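To see the split in your own data, group token counts by the model field (assuming each entry records a model name alongside its usage block, as in the API response format; verify against your logs):

```python
import json
from collections import Counter

# Output tokens grouped by model. Assumes a top-level 'model' field
# next to 'usage'; check the inspection step above if this prints nothing.
by_model = Counter()
with open("session.jsonl") as f:
    for line in f:
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        usage = obj.get("usage", {})
        by_model[obj.get("model", "unknown")] += usage.get("output_tokens", 0)

for model, tokens in by_model.most_common():
    print(f"{model}: {tokens:,} output tokens")
```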
Failed or retried requests
Look for sequences where the same request appears multiple times. Retries burn tokens without producing value. Common causes: rate limit errors that trigger automatic retries, or tool calls that fail and get re-attempted with slightly different parameters.
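A sketch for flagging candidates, assuming entries expose role and content fields (adjust to whatever your inspection step showed): hash each user message and report any that repeat within a session.

```python
import json, hashlib

# Flag duplicate user messages within a session: the same text sent
# more than once often marks a retry loop. The 'role' and 'content'
# field names are assumptions; verify against your own log format.
seen = {}
with open("session.jsonl") as f:
    for n, line in enumerate(f, 1):
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        if obj.get("role") != "user":
            continue
        payload = json.dumps(obj.get("content", ""), sort_keys=True).encode()
        digest = hashlib.sha256(payload).hexdigest()
        if digest in seen:
            print(f"Line {n} repeats line {seen[digest]}")
        else:
            seen[digest] = n
```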
Analyzing logs manually (the hard way)
If you want to parse the data yourself, here is a minimal approach:
```bash
# Find session files modified in the last 7 days
find ~/.claude/projects -name "*.jsonl" -mtime -7

# Count tokens per session (rough)
cat session.jsonl | python3 -c "
import sys, json

total_in, total_out = 0, 0
for line in sys.stdin:
    try:
        obj = json.loads(line)
        usage = obj.get('usage', {})
        total_in += usage.get('input_tokens', 0)
        total_out += usage.get('output_tokens', 0)
    except json.JSONDecodeError:
        pass
print(f'Input: {total_in:,} tokens')
print(f'Output: {total_out:,} tokens')
"
```

This gives you a rough total but misses the nuance. You need per-message breakdowns, model attribution, cache analysis, and session-over-session trends to make real optimization decisions.
Analyzing logs automatically (the easy way)
We built CostPilot specifically for this problem. Drop your JSONL file into the browser and get:
- Token breakdown per session and per conversation
- Model usage split (how much goes to Opus vs Sonnet vs Haiku)
- Cache hit/miss analysis
- Waste detection (oversized contexts, retry loops, redundant reads)
- Cost estimation at current API rates
It runs 100% in your browser. The JSONL file never leaves your machine. No account needed, no data collection. We built it because we needed it ourselves and figured others would too.
What to do with the insights
Once you can see your usage patterns, the optimization path is usually obvious:
- Trim context files if context loading exceeds 30% of input tokens
- Shorten sessions if your longest sessions dominate total usage
- Route to cheaper models if Opus is handling simple tasks
- Fix cache misses if your hit rate is below 50%
- Investigate retries if you see repeated identical requests
Most users find one or two changes that cut their usage by 30-50%. The data tells you which changes matter for your specific workflow.
The logs are already on your machine. The only question is whether you look at them.