Claude Code session management: how to keep long sessions productive
There is a point in a long Claude Code session where you can feel things getting worse. Responses slow down slightly. Claude starts hedging on things it was confident about earlier. It references something from the beginning of the session as if reminding itself. Then you notice the token counter and realize you have spent $4 on a session that produced three working functions.
This is not a model problem. It is a session management problem. Claude Code is a stateful tool and the state grows with every message. If you do not manage it actively, your sessions become progressively less efficient and more expensive as they run. The fix is not to use Claude less. It is to structure how you use it.
This guide covers the mechanics of how sessions work, the signs that a session is getting stale, and the specific changes that keep long-running work productive without burning through your quota.
How the context window actually works in practice
Every message you send to Claude Code includes the full conversation history from the current session. Message 1 carries your initial context. Message 15 carries messages 1 through 14, all their responses, every file Claude read, every tool call and result, and your CLAUDE.md loaded fresh on top.
This is why session cost grows non-linearly. The first few messages are cheap. By message 20, each message is carrying the weight of everything that came before it. A session that runs to 25 messages does not cost 25x the first message. It costs significantly more because each message is larger than the last.
The context window also has a hard limit. When you hit it, the oldest content gets dropped to make room for new content. For most sessions this means older parts of the conversation disappear. But in a long session with a large CLAUDE.md, parts of your project instructions can fall out of the window too. This is why Claude sometimes stops following rules it was following earlier in the same session. The rules are still in your file. They are just no longer in the active context.
Two numbers matter most: the size of your CLAUDE.md (loaded on every message) and the length of your conversation history (grows with every exchange). Managing both is the foundation of session hygiene.
Signs your session is getting stale
These are the concrete signals that a session has gotten too long and is working against you:
- Claude forgets a convention it was following earlier. You added a rule to CLAUDE.md, it worked for the first several messages, then Claude started doing the old thing again. Context overflow, not a model regression.
- Responses reference the wrong state. Claude mentions a file path or variable name that was true 15 messages ago but has since changed. It is working from stale history, not the current reality.
- Increasing hedging on straightforward questions. Claude starts giving qualified answers to questions it answered directly at the start of the session. This often means the earlier, confident context has been compressed or dropped.
- Noticeably slower response times. Large context windows take longer to process. If responses were fast at the start of a session and have gotten slower, the session is getting heavy.
- The same file gets re-read multiple times. Claude reads a file, you do some work, then a few messages later it reads the same file again. It no longer has the earlier read in reliable context.
Any one of these is a signal to act. More than two is a signal to restart.
When to restart vs when to use /compact
These are two different tools for two different situations.
Use /compact when you are mid-task and cannot afford to lose the current thread. Maybe you are debugging a complex problem and Claude has been accumulating context across several diagnostic steps. Starting fresh would mean reconstructing that context. Running /compact at this point summarizes the conversation into a shorter representation, reducing what gets carried into each subsequent message by 50-70%, while keeping the key facts from the session alive.
The right time to run /compact is before the session gets bloated, not after it has already degraded. Message 10-12 is a good target for most sessions. By that point there is enough history to summarize meaningfully, but not so much that the summary loses important detail.
/compact Restart the session when you have completed a logical unit of work. One bug fixed, one feature merged, one refactor done. A fresh session reloads your CLAUDE.md from the top, starts with no conversation history, and costs you almost nothing per message for the first several exchanges. The productivity hit of a session restart is close to zero if you have tracked your next steps clearly.
The key distinction: /compact extends a session efficiently when you are in the middle of something. A restart gives you a clean start when you are moving to the next thing. Do not use /compact as a substitute for restarting when a logical stopping point exists.
The CLAUDE.md size and session cost tradeoff
Your CLAUDE.md is the single highest fixed cost in every session. It loads on every message regardless of whether the message needs those instructions. A 2,000-word CLAUDE.md costs roughly 3,000 tokens per message. In a 15-message session, that is 45,000 tokens before Claude touches a single file.
Most CLAUDE.md files grow in one direction: bigger. Every debugging session adds a new note. Every code review adds a new convention. Every architectural decision adds a new section. Within six months, a CLAUDE.md that started at 200 words is often sitting at 1,500-2,000 words, most of which Claude did not need today.
The fix is tiered loading. Keep CLAUDE.md to critical runtime instructions only. Move reference material to separate files and tell Claude to load them on demand:
# CLAUDE.md (trimmed)
## Architecture
Monorepo. See docs/ARCHITECTURE.md before working on cross-package concerns.
## Conventions
See docs/CONVENTIONS.md for coding standards and naming rules.
## Testing
Run: npx vitest
See docs/TESTING.md for framework conventions and folder structure.
## Guardrails
- Never delete files. Move to .trash/ using mv.
- Never push to main directly. Always branch.
This version loads in under 200 tokens per message. Claude reads ARCHITECTURE.md when it is working on something architectural. It reads CONVENTIONS.md when writing new code. Most messages do not need either, so most messages do not pay for them.
If you want a quick read on which sections of your CLAUDE.md are essential and which are just adding token weight, the ContextKit Analyzer scores your file and flags sections that are bloating it without meaningfully improving Claude's behavior. Paste your file and look at what is in there that Claude would follow by default anyway.
Structuring work across multiple sessions
The biggest productivity loss in multi-session work is the cost of reconstructing context at the start of each session. If you spend the first 3-4 messages of every session reminding Claude what you were doing, what state the code is in, and what the next step is, you are burning 20-30% of your session budget on orientation.
The solution is a session handoff file. At the end of each session, write a brief state dump that the next session can load in a single message:
# session-state.md
Last updated: 2026-04-12
## Current branch
feature/auth-refactor
## Last completed
Extracted token validation into src/lib/auth/tokens.ts.
All tests passing on this branch.
## Next steps
1. Add refresh token rotation to tokens.ts
2. Update AuthContext to use new token helpers
3. Write integration test for token refresh flow
## Open questions
- Should refresh tokens be stored in httpOnly cookies or localStorage?
(Leaning cookies, need to check mobile client constraints)
## Files in play
- src/lib/auth/tokens.ts (main work file)
- src/contexts/AuthContext.tsx (update after tokens.ts is stable)
- src/lib/auth/tokens.test.ts (keep in sync)
Start the next session with: "Read session-state.md and pick up from next steps." One message, full context. Claude knows exactly where it is and what comes next.
You can also use the TodoWrite tool to manage the task list directly within Claude Code rather than maintaining a separate markdown file. The output persists across sessions as long as you reference it at the start of each new one.
Session cost awareness
Claude Code does not show you a live cost meter while you work. You see the bill after. This disconnect makes it easy to run expensive sessions without realizing it until you check your usage.
The main levers that drive session cost up unexpectedly:
- Reading large files repeatedly. Every file read pulls the full file content into context. If Claude reads a 500-line file at message 3, that 500 lines is part of the history carried in messages 4 through 20. If it reads the file again at message 15, you have paid for it twice and it is now in context twice.
- Running subagents in a session that is already large. Subagents inherit context from the parent session. A subagent spawned at message 20 of a bloated session starts with all that weight. Either spawn subagents early or start fresh before spawning.
- Exploratory conversations with no clear end state. "Let me show you the codebase and you tell me what to do" sessions run long and produce little. They are expensive for what they deliver. Come in with a specific task.
- Leaving a session running "just in case." A session left open with accumulated history costs money on every message, including follow-up questions that could have been a fresh session for a fraction of the price.
Knowing which of your sessions are actually expensive requires reading the session logs. Claude Code writes detailed JSONL logs for every session, including token counts per message and model used. The CostPilot analyzer parses those logs and shows you your most expensive sessions, where tokens went within each session, and how your costs trend over time. If you are on Max and wondering where your quota goes, the logs will tell you.
A practical session workflow
Put these pieces together and you get a session pattern that keeps costs predictable and productivity high:
# Start of session
"Read session-state.md. The next task is [specific task from next steps]."
# During session
- One logical task per session where possible
- Run /compact at message 10-12 if the session needs to continue
- Be specific about files: "edit src/lib/tokens.ts" not "edit the auth module"
# End of session
"Update session-state.md: mark [task] complete, set next steps to [X, Y, Z],
note any open questions."
The whole pattern adds less than two messages per session. The overhead is minimal. What it prevents is the expensive, directionless session that costs four times as much as it should.
The one thing that moves the needle most
If you do nothing else from this guide, do this: treat session restarts as a feature, not a concession. The instinct is to keep a session open as long as possible because starting fresh feels like losing progress. It is not. Context is not progress. Code committed to your branch is progress.
A focused 10-message session on a specific task with a clear starting state is faster, cheaper, and produces better output than a 30-message exploratory session. The math on this is straightforward once you have looked at your actual session logs.
If you have not looked at those logs yet, start there. Run the CostPilot analyzer against your recent sessions and see which ones were actually expensive and why. The pattern becomes obvious quickly, and the fix takes less than an hour to put in place.
You might also like
Want to build your own AI OS?
The AI OS Blueprint gives you the complete system: 53-page playbook, working skills, and a clonable repo. Starting at $47.
30-day money-back guarantee. No subscription.