Managing Context in Claude Code
How Claude Code's context window works, when to compact vs clear, and practical strategies for keeping sessions lean and productive.
This lesson is original educational writing based on this video by Anthropic (published May 18, 2026). All credit for the original content goes to the creators.
1. The context window is Claude’s working memory
Every interaction you have with Claude Code — every file it reads, every shell command it runs, every message you type — consumes space in the context window. Think of it as a finite whiteboard: the more you write on it, the less room you have to keep going. When the whiteboard fills up, something has to be erased.
This is not a bug or a limitation to route around. It is the fundamental architecture of how large language models work. Understanding it lets you make deliberate choices about what stays on the whiteboard and what gets cleared.
What fills the context window?
The context window fills with several categories of content, roughly in order of how fast they grow:
- Tool call results: when Claude searches for files, reads their contents, or runs commands, the full output lands in context. A single grep result can be a few lines; a recursive file read of a large module can run to thousands.
- Conversation history: every message you send and every reply Claude generates is retained in full, going all the way back to the start of the session.
- System prompt and memory files: CLAUDE.md, loaded MCP tool schemas, and any injected instructions all occupy context from the moment the session starts.
You can inspect this at any time by running /context inside a session. It shows the total
context used, a breakdown by category, and a visual indicator of how close you are to the limit.
The category breakdown is the useful part: it tells you where the space is going, which points
directly at where you should optimize.
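To make the category breakdown concrete, here is a minimal TypeScript sketch of the kind of tally /context reports. It is a simplified model, not Claude Code's implementation. The three category names mirror the list above. The ContextEntry shape, the token counts, and the 200,000-token limit in the example are assumptions chosen for illustration.

```typescript
// Hypothetical model of a context-window breakdown; not Claude Code's real data structures.
type Category = "toolResults" | "conversation" | "systemAndMemory";

interface ContextEntry {
  category: Category;
  tokens: number; // token count for one item in context (supplied by the caller)
}

interface CategoryUsage {
  category: Category;
  tokens: number;
  percentOfLimit: number;
}

// Sum tokens per category and sort largest first, so the biggest consumer is listed first.
function breakdown(entries: ContextEntry[], limit: number): CategoryUsage[] {
  const totals: { [category: string]: number } = {};
  for (const entry of entries) {
    totals[entry.category] = (totals[entry.category] || 0) + entry.tokens;
  }
  return Object.keys(totals)
    .map((category) => ({
      category: category as Category,
      tokens: totals[category],
      percentOfLimit: (totals[category] / limit) * 100,
    }))
    .sort((a, b) => b.tokens - a.tokens);
}

// Total usage across all categories, as a fraction of the limit.
function usedFraction(entries: ContextEntry[], limit: number): number {
  return entries.reduce((sum, entry) => sum + entry.tokens, 0) / limit;
}

// Illustrative input: the token counts and the limit are made up.
const example = breakdown(
  [
    { category: "systemAndMemory", tokens: 8000 },
    { category: "conversation", tokens: 3000 },
    { category: "toolResults", tokens: 20000 },
    { category: "toolResults", tokens: 5000 },
  ],
  200000,
);
// example[0] is the toolResults entry: 25000 tokens, 12.5% of the limit.
console.log(example[0]);
```

Sorting by size is what makes the breakdown actionable. The top entry is where trimming will pay off first.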
2. Compact vs clear: the decision that matters most
When the context window gets full, Claude Code will automatically compact the session — it summarises what has happened and discards raw tool call output to free space. This happens silently in the background, but you can also trigger it intentionally. That intentional choice is where most of the practical value lies.
You have two commands:
| Command | What it does | When to reach for it |
|---|---|---|
| /compact | Summarises the session history and compresses tool results, but retains the important decisions and conclusions | You are nearing the context limit partway through a feature and want to keep going with memory of what was built |
| /clear | Wipes the session entirely: a fresh start with no history | You finished one feature and are starting something completely unrelated |
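One way to picture the two commands is as operations on the session's message list. The TypeScript sketch below is a deliberately simplified model, not how Claude Code implements them. The SessionMessage shape is hypothetical. The summarize callback stands in for the model writing its own summary; here it just keeps user messages marked as decisions.

```typescript
// Simplified, hypothetical model of what /compact and /clear do to a session.
interface SessionMessage {
  role: "user" | "assistant" | "tool";
  text: string;
}

type Summarizer = (history: SessionMessage[]) => string;

// /compact: replace the full history (including raw tool output) with one summary message.
function compactSession(history: SessionMessage[], summarize: Summarizer): SessionMessage[] {
  if (history.length === 0) return [];
  return [{ role: "assistant", text: "Summary of earlier work: " + summarize(history) }];
}

// /clear: start over with no history at all.
function clearSession(_history: SessionMessage[]): SessionMessage[] {
  return [];
}

// Illustrative summarizer: keep only user messages that start with "Decision:".
const keepDecisions: Summarizer = (history) =>
  history
    .filter((m) => m.role === "user" && m.text.indexOf("Decision:") === 0)
    .map((m) => m.text)
    .join(" ");
```

The asymmetry is the point. Compacting keeps a small, lossy record of earlier work. Clearing keeps nothing, so nothing from the old task can carry over.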
The bias risk of not clearing
This is the subtler point, and it is worth sitting with. When you /compact, Claude retains a
summary of what you were doing. That is exactly what you want when you are continuing the same
work. But if you are now working on a different feature, that summary becomes noise — and
worse, it can bias Claude’s decisions.
Imagine you spent the last hour refactoring a caching layer. The compacted summary mentions the caching module repeatedly and describes a particular architectural pattern you used there. You then ask Claude to add a new API endpoint. Claude has no malicious intent, but it has seen the caching layer dozens of times in the past hour and may subtly steer the new endpoint toward patterns it has been reinforcing — even when those patterns do not fit.
The rule is simple: same feature, nearing the limit → /compact; new feature or fresh task → /clear.
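The rule is small enough to write down as a function. This TypeScript sketch just encodes the heuristic. The SessionState fields and the 0.8 "nearly full" threshold are hypothetical inputs that you judge yourself, for example by reading /context. They are not values Claude Code exposes. The "continue" outcome is an addition for the case where neither command is needed yet.

```typescript
// Hypothetical encoding of the compact-vs-clear rule of thumb.
type SessionAction = "compact" | "clear" | "continue";

interface SessionState {
  usedFraction: number; // share of the context window in use, 0..1 (e.g. read off /context)
  sameFeature: boolean; // is the next task a continuation of the current work?
}

// Assumed threshold for "nearly full"; tune to taste.
const NEAR_LIMIT = 0.8;

function nextAction(state: SessionState): SessionAction {
  // An unrelated task gets a clean slate however full the window is,
  // so a summary of the previous work cannot bias it.
  if (!state.sameFeature) return "clear";
  // Same feature: keep going, compacting only once space is running out.
  return state.usedFraction >= NEAR_LIMIT ? "compact" : "continue";
}
```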
Check your understanding
1. You have been building a new payment flow for 45 minutes. The session is nearing the context limit. You want to keep adding error handling to the same flow. What should you do?
2. You just finished implementing and testing a new authentication system. You now want to work on a completely unrelated CSV export feature. What is the best approach?
3. What does running /context tell you?
3. Writing prompts that don’t waste context
Here is a counterintuitive fact: a shorter, vaguer prompt often uses more context than a longer, specific one.
When you write a vague prompt like “fix the login bug”, Claude has to figure out what you mean. It does that by exploring: searching for files that might be relevant, reading several of them, running commands to understand the current state, and asking clarifying questions. Every one of those exploratory tool calls writes its output into the context window.
A specific two-sentence prompt that says which file is involved, what the expected behaviour is, and what you have already tried lets Claude skip the exploration entirely and go straight to acting. The prompt itself is a few dozen tokens longer; the tool calls it avoids would have cost thousands of tokens.
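The trade-off is plain addition. The context a request costs is the prompt plus the output of every tool call it triggers. The TypeScript sketch below makes that explicit. The token counts are invented purely to show the shape of the comparison and are not measurements. Real numbers depend on your codebase and tools.

```typescript
// Illustrative only: every token count below is made up, not measured.
interface PromptRun {
  promptTokens: number;
  toolOutputTokens: number[]; // one entry per tool call the prompt triggers
}

// Context cost = prompt + the output of every tool call it caused.
function contextCost(run: PromptRun): number {
  return run.toolOutputTokens.reduce((sum, tokens) => sum + tokens, run.promptTokens);
}

// Vague prompt: tiny, but it triggers a search, five file reads, and a test run.
const vague: PromptRun = {
  promptTokens: 5,
  toolOutputTokens: [400, 1500, 1200, 900, 1100, 1300, 2000],
};

// Specific prompt: longer, but it needs only a file read, an edit, and one test run.
const specific: PromptRun = {
  promptTokens: 60,
  toolOutputTokens: [1200, 150, 300],
};

console.log(contextCost(vague), contextCost(specific)); // 8405 1710
```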
What a vague vs specific prompt looks like in practice
Vague (triggers exploration):
Fix the login bug.
Claude’s response: searches the codebase for files containing “login”, reads five candidate files, runs the test suite to understand the current failure mode, reads the error output, then starts making changes. Six or more tool calls, each writing to context.
Specific (skips exploration):
In src/auth/login.ts, the validateToken function returns false even for unexpired tokens because it checks token.expiry > Date.now() but should check token.expiry > Date.now() / 1000 (the expiry is in seconds, not milliseconds). Fix the comparison.
Claude’s response: opens src/auth/login.ts, applies the fix, runs the relevant test. Two or
three tool calls.
The specific prompt is more than ten times longer in characters, yet it uses a fraction of the context.
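For reference, the fix the specific prompt asks for might look like this. It is a hypothetical reconstruction. The prompt names only the file, the validateToken function, and the comparison, so the AuthToken interface is an assumption. The nowMs parameter, which defaults to Date.now(), is added only so the function can be tested with a fixed clock.

```typescript
// Hypothetical sketch of src/auth/login.ts; only validateToken and the comparison come from the prompt.
interface AuthToken {
  expiry: number; // Unix timestamp in seconds, as the prompt states
}

// Buggy: compares a value in seconds (token.expiry) with one in milliseconds (Date.now()).
function validateTokenBuggy(token: AuthToken, nowMs: number = Date.now()): boolean {
  return token.expiry > nowMs;
}

// Fixed: convert the current time to seconds before comparing.
function validateToken(token: AuthToken, nowMs: number = Date.now()): boolean {
  return token.expiry > nowMs / 1000;
}

const oneHourFromNow: AuthToken = { expiry: Math.floor(Date.now() / 1000) + 3600 };
console.log(validateToken(oneHourFromNow)); // true
console.log(validateTokenBuggy(oneHourFromNow)); // false: seconds never exceed milliseconds
```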
The discipline required
Writing specific prompts requires you to do some upfront thinking — understanding the problem well enough to state it precisely. That thinking is not wasted effort. It is the work you would have had to do anyway, just shifted earlier. And it has a side benefit: if you cannot state the problem precisely, that is a signal you do not yet understand it well enough to fix it.
4. Structural strategies: MCP servers and sub-agents
The previous sections covered reactive management — what to do when context fills up. This section covers proactive design choices that keep sessions lean from the start.
Disable MCP servers you are not using
MCP servers are powerful: they give Claude access to browsers, databases, issue trackers, and more. But they have a cost that is easy to overlook: when an MCP server is connected, all of its available tools are loaded into the context window at the start of every session. A server with forty tools loads forty tool schemas into context before you have typed a single character.
If you have several MCP servers configured — say, a browser automation server, a database server, and a GitHub server — but you are working on a pure Python refactoring task, disconnect the browser and database servers for that session. You get that tool schema space back immediately.
An alternative worth knowing: skills (Claude Code’s built-in slash commands and prompt files) are loaded on demand, not all at once. If you can accomplish something with a skill rather than an MCP tool, you avoid the upfront context cost entirely.
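The difference between eager and on-demand loading comes down to when the cost is paid. This TypeScript sketch models it with hypothetical per-schema and per-skill token counts. It is not how Claude Code measures or loads either one. serversFor encodes the earlier advice that only the servers a task needs should contribute to the upfront cost. Deciding which servers are needed is your call, not something the tool infers.

```typescript
// Hypothetical cost model: MCP tool schemas are paid for up front, skills only when used.
interface McpServer {
  name: string;
  toolSchemaTokens: number[]; // one entry per tool the server exposes
}

interface Skill {
  name: string;
  tokens: number; // cost of loading this skill's content when it is invoked
}

// Every connected server contributes all of its tool schemas at session start.
function upfrontCost(connected: McpServer[]): number {
  let total = 0;
  for (const server of connected) {
    for (const tokens of server.toolSchemaTokens) {
      total += tokens;
    }
  }
  return total;
}

// Skills add to context only for the ones actually invoked during the session.
function skillCost(invoked: Skill[]): number {
  return invoked.reduce((sum, skill) => sum + skill.tokens, 0);
}

// Keep only the servers you have decided this task needs.
function serversFor(all: McpServer[], needed: string[]): McpServer[] {
  return all.filter((server) => needed.indexOf(server.name) !== -1);
}

// Illustrative configuration; the names and token counts are made up.
const configured: McpServer[] = [
  { name: "browser", toolSchemaTokens: [300, 300, 400] },
  { name: "database", toolSchemaTokens: [250, 250] },
  { name: "github", toolSchemaTokens: [200, 200, 200, 200] },
];
console.log(upfrontCost(configured)); // 2300
console.log(upfrontCost(serversFor(configured, ["github"]))); // 800
console.log(skillCost([]), skillCost([{ name: "release-notes", tokens: 150 }])); // 0 150
```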
Delegate point-lookups to sub-agents
Sub-agents are one of the most underused tools for context management. When you spawn a sub-agent, it gets its own completely separate context window. It does its work — reading files, running searches, gathering information — in that separate window. When it is done, it returns a summary to your main session. You get the answer; your context window absorbs a few sentences, not hundreds of lines of exploration.
This pattern is specifically valuable for point-lookups: questions that have a single definitive answer you need in order to continue, but where the process of finding that answer would pollute your main context.
Good candidates for sub-agent delegation:
- “Where is the authentication endpoint defined?”
- “What does the UserProfile type look like?”
- “Which tests cover the payment module?”
- “What version of the ORM library is this project using?”
Each of these questions, if answered directly in the main session, would involve file reads and searches that write their full output to context. A sub-agent absorbs that exploration cost in its own window and hands you back a single sentence.
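The key property is isolation. The sub-agent's exploration lands in its own context, and only the returned answer lands in yours. The TypeScript sketch below models that with two plain arrays. The delegate function, the LookupResult shape, and the explore callback are all hypothetical and not part of Claude Code's API. The callback stands in for the sub-agent's searches and file reads.

```typescript
// Hypothetical model of sub-agent isolation: two separate context windows as string arrays.
type ContextWindow = string[];

interface LookupResult {
  answer: string; // the short answer handed back to the caller
  transcript: string[]; // everything the exploration produced along the way
}

// Run a point-lookup in a fresh window and hand only the answer back to the main session.
function delegate(
  mainContext: ContextWindow,
  question: string,
  explore: (question: string) => LookupResult,
): { main: ContextWindow; subAgent: ContextWindow } {
  const result = explore(question);
  // The sub-agent's window holds the question plus all of its exploration output.
  const subAgent: ContextWindow = [question].concat(result.transcript);
  // The main window grows by a single line; concat leaves mainContext itself unchanged.
  const main: ContextWindow = mainContext.concat(["Sub-agent answer: " + result.answer]);
  return { main, subAgent };
}
```

However long the transcript gets, the main window grows by exactly one entry. That is the whole case for delegating point-lookups.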
Check your understanding
1. Why do connected MCP servers consume context window even before you send your first message?
Build it yourself
Follow these steps to try each technique yourself. Estimated time: about 15 minutes.
Prerequisites
- Claude Code installed and authenticated
- An existing project you actively develop
- At least one MCP server configured (optional, but useful for step 5)
Step 1 — Establish a baseline with /context
Open Claude Code in your project and immediately run:
/context
Before you have done anything, you will see the system prompt contribution: CLAUDE.md, MCP tool schemas, permission config. Note the number. This is your baseline cost per session.
Step 2 — Do some work and watch context grow
Run a moderately complex task — something that requires reading a few files and running a
command or two. After it completes, run /context again. Observe which category grew the
most. In most sessions, tool call results will be the largest contributor after active work.
Step 3 — Practice the compact/clear decision
Now try both paths deliberately:
Path A — compact: Tell Claude to continue adding to what it just built. Before you send the
prompt, run /compact. Watch the context size drop. Then send your follow-up prompt and
observe that Claude still remembers the key decisions from step 2.
Path B — clear: Open a second terminal window, start a fresh session with claude, and ask
Claude to do something completely unrelated to what you did in step 2. Compare how the session
feels — no inherited assumptions, no residual patterns.
Step 4 — Rewrite a vague prompt as a specific one
Take a task you would normally describe in one sentence. Before sending it, spend two minutes answering these questions in writing:
- Which specific file or function is involved?
- What is the current behaviour and what should it be?
- What have you already tried, if anything?
Weave those answers into the prompt. Send it. Run /context after it completes and compare
tool call usage versus what a vague prompt typically generates.
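If you do this often, the three questions map naturally onto a template. This TypeScript sketch shows one hypothetical way to assemble them. The field names are assumptions, and the file and function in the example are made up. The output is just text you would type or paste into Claude Code.

```typescript
// Hypothetical template that turns the three Step 4 answers into a specific prompt.
interface TaskDetails {
  location: string; // which file or function is involved
  current: string; // current behaviour
  expected: string; // what it should be
  alreadyTried?: string; // optional: what you have already tried
}

function specificPrompt(details: TaskDetails): string {
  const parts: string[] = [
    "In " + details.location + ", " + details.current + ".",
    "It should " + details.expected + ".",
  ];
  // Only mention prior attempts when there were some.
  if (details.alreadyTried) {
    parts.push("I have already tried " + details.alreadyTried + ".");
  }
  return parts.join(" ");
}

console.log(
  specificPrompt({
    location: "src/export/csv.ts",
    current: "formatRow drops the last column of each row",
    expected: "include every column, in header order",
    alreadyTried: "checking that the header row itself is complete",
  }),
);
```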
Step 5 — Disconnect an irrelevant MCP server (if applicable)
If you have MCP servers configured, run /context with all of them connected and note the
system prompt size. Then remove a server you do not need for the current task. This deletes its configuration, so note its start command first:
claude mcp remove <server-name>
Start a new session and run /context immediately. The system prompt should be visibly smaller. Add the server back when you need it:
claude mcp add <server-name> -- <start-command>
Step 6 — Delegate a lookup to a sub-agent
In your next session, when you find yourself about to ask “where is X defined?” — pause. Instead, phrase it as a sub-agent task:
Spawn a sub-agent to find where the UserRepository class is defined and return only the file path and the constructor signature. Do not bring back anything else.
Run /context before and after. The answer arrives as a sentence or two; the exploration stays in the sub-agent’s window.
Expected result: by the end of this workflow you will have a visceral sense of which
activities are cheap versus expensive for context, and a default reflex to reach for /compact,
/clear, specific prompts, and sub-agents at the right moments.