
Managing Context in Claude Code

How Claude Code's context window works, when to compact vs clear, and practical strategies for keeping sessions lean and productive.

This lesson is original educational writing based on this video by Anthropic (published May 18, 2026). All credit for the original content goes to the creators.


1. The context window is Claude’s working memory

Every interaction you have with Claude Code — every file it reads, every shell command it runs, every message you type — consumes space in the context window. Think of it as a finite whiteboard: the more you write on it, the less room you have to keep going. When the whiteboard fills up, something has to be erased.

This is not a bug or a limitation to route around. It is the fundamental architecture of how large language models work. Understanding it lets you make deliberate choices about what stays on the whiteboard and what gets cleared.

What fills the context window?

The context window fills with several categories of content, roughly in order of how fast they grow:

  • Tool call results: when Claude searches for files, reads their contents, or runs commands, the full output lands in context. A single grep result can be a few lines; a recursive file read of a large module can be thousands.
  • Conversation history: every message you send and every reply Claude generates is retained in full, going all the way back to the start of the session.
  • System prompt and memory files: CLAUDE.md, loaded MCP tool schemas, and any injected instructions all occupy context from the moment the session starts.

You can inspect this at any time by running /context inside a session. It shows the total context used, a breakdown by category, and a visual indicator of how close you are to the limit. The category breakdown is the useful part: it tells you where the space is going, which points directly at where you should optimize.

[Figure: the context window, a finite capacity shared by everything Claude knows about this session. System prompt (CLAUDE.md, MCP tool schemas, permissions config): loaded once, constant size. Tool call results (file reads, grep / search output, shell command output): grows fast, biggest contributor. Conversation history (your messages, Claude’s replies, accumulated since /clear): grows with every exchange. Run /context to see a live breakdown of all three categories.]
The context window fills up over a session. Tool call results and conversation history are the biggest contributors, with system prompts loaded once at the start.
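To make the category breakdown concrete, here is a minimal sketch that models a session's usage as three counters. The class name, the token counts, and the 200,000-token limit are all hypothetical values chosen for illustration; real numbers come from running /context in your own session.

```python
from dataclasses import dataclass


@dataclass
class ContextUsage:
    """Token counts per category. All numbers used below are made up."""

    system_prompt: int
    tool_results: int
    conversation: int
    limit: int  # hypothetical total capacity of the window

    def total(self) -> int:
        return self.system_prompt + self.tool_results + self.conversation

    def breakdown(self) -> dict[str, float]:
        """Percentage of the used context taken by each category."""
        used = self.total()
        return {
            "system_prompt": round(100 * self.system_prompt / used, 1),
            "tool_results": round(100 * self.tool_results / used, 1),
            "conversation": round(100 * self.conversation / used, 1),
        }

    def biggest(self) -> str:
        """Name of the category that points at where to optimize first."""
        shares = self.breakdown()
        return max(shares, key=shares.get)

    def remaining(self) -> int:
        return self.limit - self.total()


usage = ContextUsage(
    system_prompt=12_000, tool_results=60_000, conversation=28_000, limit=200_000
)
print(usage.breakdown())
print(usage.biggest())    # tool_results
print(usage.remaining())  # 100000
```

In this invented session, tool call results account for most of the used space, so trimming exploration would free more room than shortening the conversation.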

2. Compact vs clear: the decision that matters most

When the context window gets full, Claude Code will automatically compact the session — it summarises what has happened and discards raw tool call output to free space. This happens silently in the background, but you can also trigger it intentionally. That intentional choice is where most of the practical value lies.

You have two commands:

Command | What it does | When to reach for it
/compact | Summarises the session history and compresses tool results, but retains the important decisions and conclusions | You are approaching the limit mid-feature and want to keep going with memory of what was built
/clear | Wipes the session entirely, giving a fresh start with no history | You finished one feature and are starting something completely unrelated

The bias risk of not clearing

This is the subtler point, and it is worth sitting with. When you /compact, Claude retains a summary of what you were doing. That is exactly what you want when you are continuing the same work. But if you are now working on a different feature, that summary becomes noise — and worse, it can bias Claude’s decisions.

Imagine you spent the last hour refactoring a caching layer. The compacted summary mentions the caching module repeatedly and describes a particular architectural pattern you used there. You then ask Claude to add a new API endpoint. Nothing in that request involves caching, but the summary keeps the caching work front and centre, so Claude may subtly steer the new endpoint toward the patterns it describes, even when those patterns do not fit.

The rule is simple: same feature, going over limit → /compact; new feature, fresh task → /clear.
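The rule can be written down as a tiny decision function. This is only a sketch of the heuristic stated above: the function name and the "continue" result for sessions that are not yet near the limit are assumptions added for completeness, not Claude Code behaviour.

```python
def choose_command(same_feature: bool, near_limit: bool) -> str:
    """Return the command suggested by the compact-vs-clear rule of thumb."""
    if not same_feature:
        # A new, unrelated task should not inherit the old summary.
        return "/clear"
    if near_limit:
        # Same work, running out of room: keep the decisions, drop the raw output.
        return "/compact"
    # Same work with room to spare: nothing to do yet (assumed default).
    return "continue"


print(choose_command(same_feature=True, near_limit=True))    # /compact
print(choose_command(same_feature=False, near_limit=False))  # /clear
```

Checking same_feature first reflects the bias argument: when the task changes, how full the window is no longer matters.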

Check your understanding


  1. You have been building a new payment flow for 45 minutes. The session is nearing the context limit. You want to keep adding error handling to the same flow. What should you do?

  2. You just finished implementing and testing a new authentication system. You now want to work on a completely unrelated CSV export feature. What is the best approach?

  3. What does running /context tell you?

3. Writing prompts that don’t waste context

Here is a counterintuitive fact: a shorter, vaguer prompt often uses more context than a longer, specific one.

When you write a vague prompt like “fix the login bug”, Claude has to figure out what you mean. It does that by exploring: searching for files that might be relevant, reading several of them, running commands to understand the current state, and asking clarifying questions. Every one of those exploratory tool calls writes its output into the context window.

A specific two-sentence prompt that says which file, what the expected behaviour is, and what you have already tried lets Claude skip most of the exploration and go straight to acting. The prompt itself is a few dozen tokens longer; the tool calls it avoids would have added thousands of tokens to the context.
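The arithmetic behind that trade-off can be sketched in a few lines. Everything numeric here is an assumption: the four-characters-per-token ratio is only a rough rule of thumb, the tool output sizes are invented, and the specific prompt is a hypothetical example rather than a recommended wording.

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb (assumption), not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def session_cost(prompt: str, tool_output_tokens: list[int]) -> int:
    """Context consumed by one request: the prompt plus every tool result."""
    return estimate_tokens(prompt) + sum(tool_output_tokens)


vague_prompt = "Fix the login bug."
specific_prompt = (
    "In src/auth/login.ts, validateToken compares token.expiry (seconds) "
    "with Date.now() (milliseconds). Divide Date.now() by 1000 before comparing."
)

# Hypothetical tool output sizes, in tokens.
vague_tools = [400, 1500, 1200, 900, 1100, 1300, 2000]  # search, five reads, test run
specific_tools = [1500, 100, 300]  # one read, one edit, one test run

vague_total = session_cost(vague_prompt, vague_tools)
specific_total = session_cost(specific_prompt, specific_tools)
print(vague_total, specific_total)
```

With these made-up numbers the longer prompt costs a few dozen extra tokens up front and saves several thousand in tool output, which is the shape of the trade-off described above.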

What a vague vs specific prompt looks like in practice

Vague (triggers exploration):

Fix the login bug.

Claude’s response: searches the codebase for files containing “login”, reads five candidate files, runs the test suite to understand the current failure mode, reads the error output, then starts making changes. Six or more tool calls, each writing to context.

Specific (skips exploration):

In src/auth/login.ts, the validateToken function rejects every token, including valid ones, because it checks token.expiry > Date.now() but should check token.expiry > Date.now() / 1000 (the expiry is in seconds, not milliseconds). Fix the comparison.

Claude’s response: opens src/auth/login.ts, applies the fix, runs the relevant test. Two or three tool calls.

The specific prompt is more than ten times longer in characters, yet the session uses a fraction of the context.

The discipline required

Writing specific prompts requires you to do some upfront thinking — understanding the problem well enough to state it precisely. That thinking is not wasted effort. It is the work you would have had to do anyway, just shifted earlier. And it has a side benefit: if you cannot state the problem precisely, that is a signal you do not yet understand it well enough to fix it.

4. Structural strategies: MCP servers and sub-agents

The previous sections covered reactive management — what to do when context fills up. This section covers proactive design choices that keep sessions lean from the start.

Disable MCP servers you are not using

MCP servers are powerful: they give Claude access to browsers, databases, issue trackers, and more. But they have a cost that is easy to overlook: when an MCP server is connected, all of its available tools are loaded into the context window at the start of every session. A server with forty tools loads forty tool schemas into context before you have typed a single character.

If you have several MCP servers configured — say, a browser automation server, a database server, and a GitHub server — but you are working on a pure Python refactoring task, disconnect the browser and database servers for that session. You get that tool schema space back immediately.
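To get a feel for the startup cost, the sketch below estimates the schema overhead of three servers. The server names follow the example above, but the tool counts, the schema shape, and the four-characters-per-token ratio are all hypothetical; real schemas vary widely in size.

```python
import json

CHARS_PER_TOKEN = 4  # rough rule of thumb (assumption), not a real tokenizer


def make_tool(name: str) -> dict:
    """Build a hypothetical tool schema of a plausible shape."""
    return {
        "name": name,
        "description": f"Hypothetical tool {name}: does one thing for the demo.",
        "input_schema": {
            "type": "object",
            "properties": {
                "target": {"type": "string"},
                "limit": {"type": "integer"},
            },
            "required": ["target"],
        },
    }


def schema_tokens(tools: list[dict]) -> int:
    """Estimated tokens needed to hold every tool schema of one server."""
    return sum(len(json.dumps(tool)) // CHARS_PER_TOKEN for tool in tools)


# Invented tool counts per server.
SERVERS = {
    "browser": [make_tool(f"browser_{i}") for i in range(40)],
    "database": [make_tool(f"db_{i}") for i in range(12)],
    "github": [make_tool(f"github_{i}") for i in range(25)],
}


def startup_overhead(connected: list[str]) -> int:
    """Estimated context used before the first message is typed."""
    return sum(schema_tokens(SERVERS[name]) for name in connected)


all_on = startup_overhead(["browser", "database", "github"])
github_only = startup_overhead(["github"])
print(all_on, github_only, all_on - github_only)
```

The difference printed last is the space recovered, under these assumptions, by disconnecting the browser and database servers for a session that does not need them.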

An alternative worth knowing: skills (Claude Code’s built-in slash commands and prompt files) are loaded on demand, not all at once. If you can accomplish something with a skill rather than an MCP tool, you avoid the upfront context cost entirely.

Delegate point-lookups to sub-agents

Sub-agents are one of the most underused tools for context management. When you spawn a sub-agent, it gets its own completely separate context window. It does its work — reading files, running searches, gathering information — in that separate window. When it is done, it returns a summary to your main session. You get the answer; your context window absorbs a few sentences, not hundreds of lines of exploration.

This pattern is specifically valuable for point-lookups: questions that have a single definitive answer you need in order to continue, but where the process of finding that answer would pollute your main context.

Good candidates for sub-agent delegation:

  • “Where is the authentication endpoint defined?”
  • “What does the UserProfile type look like?”
  • “Which tests cover the payment module?”
  • “What version of the ORM library is this project using?”

Each of these questions, if answered directly in the main session, would involve file reads and searches that write their full output to context. A sub-agent absorbs that exploration cost in its own window and hands you back a single sentence.
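The difference between the two approaches can be simulated with a toy model. This is not how Claude Code implements sub-agents; the Context class, the file contents, and the character-based size measure are hypothetical stand-ins that only illustrate where the exploration output ends up.

```python
class Context:
    """A toy context window: a list of text entries measured in characters."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: list[str] = []

    def add(self, text: str) -> None:
        self.entries.append(text)

    def size(self) -> int:
        return sum(len(entry) for entry in self.entries)


def find_definition(ctx: Context, files: dict[str, str], needle: str) -> str:
    """Read every file into ctx (the expensive part) and report where needle is."""
    answer = f"{needle} not found"
    for path, content in files.items():
        ctx.add(content)  # the full file content lands in this context
        if needle in content:
            answer = f"{needle} is defined in {path}"
    return answer


def direct_lookup(main: Context, files: dict[str, str], needle: str) -> str:
    """Explore inside the main session: every file read stays in main."""
    answer = find_definition(main, files, needle)
    main.add(answer)
    return answer


def delegated_lookup(main: Context, files: dict[str, str], needle: str) -> str:
    """Explore inside a separate context and bring back only the answer."""
    sub = Context("sub-agent")
    answer = find_definition(sub, files, needle)
    main.add(answer)  # only the short summary crosses back
    return answer


# Hypothetical project files, padded to look like real modules.
FILES = {
    "src/models/user.py": "class UserProfile:\n    ...\n" * 50,
    "src/repos/user_repository.py": "class UserRepository:\n    ...\n" * 50,
    "src/api/routes.py": "def register_routes(app): ...\n" * 50,
}

main_direct = Context("main")
main_delegated = Context("main")
answer_a = direct_lookup(main_direct, FILES, "class UserRepository")
answer_b = delegated_lookup(main_delegated, FILES, "class UserRepository")
print(answer_b)
print(main_direct.size(), main_delegated.size())
```

Both calls return the same answer, but only the direct lookup leaves the three file bodies in the main context.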

Check your understanding


  1. Why do connected MCP servers consume context window even before you send your first message?

Build it yourself

Follow these exact steps to reproduce it yourself · estimated time: ~15 minutes

Prerequisites

  • Claude Code installed and authenticated
  • An existing project you actively develop
  • At least one MCP server configured (optional but useful for step 5)

Step 1 — Establish a baseline with /context

Open Claude Code in your project and immediately run:

/context

Before you have done anything, you will see the system prompt contribution — CLAUDE.md, MCP tool schemas, permission config. Note the number. This is your baseline cost per session.

Step 2 — Do some work and watch context grow

Run a moderately complex task — something that requires reading a few files and running a command or two. After it completes, run /context again. Observe which category grew the most. In most sessions, tool call results will be the largest contributor after active work.
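If you want to compare the two readings systematically, a small helper like the one below can compute the growth per category. It assumes you copy the numbers from /context by hand into dictionaries; the category names and values shown are hypothetical, and the sketch does not parse the command's actual output.

```python
def growth(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Tokens added to each category between two readings."""
    return {
        category: after[category] - before.get(category, 0) for category in after
    }


def fastest_growing(before: dict[str, int], after: dict[str, int]) -> str:
    """Category that grew the most between the two readings."""
    deltas = growth(before, after)
    return max(deltas, key=deltas.get)


# Hypothetical readings noted down from two /context runs.
baseline = {"system_prompt": 12_000, "tool_results": 0, "conversation": 200}
after_task = {"system_prompt": 12_000, "tool_results": 18_500, "conversation": 3_400}

print(growth(baseline, after_task))
print(fastest_growing(baseline, after_task))  # tool_results
```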

Step 3 — Practice the compact/clear decision

Now try both paths deliberately:

Path A — compact: Tell Claude to continue adding to what it just built. Before you send the prompt, run /compact. Watch the context size drop. Then send your follow-up prompt and observe that Claude still remembers the key decisions from step 2.

Path B — clear: Open a second terminal window, start a fresh session with claude, and ask Claude to do something completely unrelated to what you did in step 2. Compare how the session feels — no inherited assumptions, no residual patterns.

Step 4 — Rewrite a vague prompt as a specific one

Take a task you would normally describe in one sentence. Before sending it, spend two minutes answering these questions in writing:

  1. Which specific file or function is involved?
  2. What is the current behaviour and what should it be?
  3. What have you already tried, if anything?

Weave those answers into the prompt. Send it. Run /context after it completes and compare tool call usage versus what a vague prompt typically generates.
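One way to make this a habit is a small template that refuses to build a prompt until the first two questions are answered. The function name, the wording of the template, and the CSV example are hypothetical; adapt them to your own project.

```python
from typing import Optional


def build_prompt(
    location: str, current: str, expected: str, tried: Optional[str] = None
) -> str:
    """Weave the three answers into a single specific prompt."""
    required = (("location", location), ("current", current), ("expected", expected))
    for label, value in required:
        if not value.strip():
            raise ValueError(f"{label} must not be empty; answer it before prompting")

    sentences = [
        f"In {location.strip()}, the current behaviour is: {current.strip()}",
        f"The expected behaviour is: {expected.strip()}",
    ]
    if tried and tried.strip():
        sentences.append(f"Already tried: {tried.strip()}")
    # End every sentence with a full stop so the prompt reads cleanly.
    return " ".join(s if s.endswith(".") else s + "." for s in sentences)


prompt = build_prompt(
    location="src/reports/export.py (write_csv)",
    current="fields containing commas are split into extra columns",
    expected="fields containing commas are quoted and stay in one column",
    tried="switching the delimiter to a semicolon, which broke the importer",
)
print(prompt)
```

The ValueError is the useful part: if you cannot fill in the location or the expected behaviour, that is the signal that more investigation is needed before prompting.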

Step 5 — Disconnect an irrelevant MCP server (if applicable)

If you have MCP servers configured, run /context with all of them connected and note the system prompt size. Then disconnect a server you do not need for the current task:

claude mcp remove <server-name>

Start a new session and run /context immediately. The system prompt should be visibly smaller. Reconnect the server when you need it:

claude mcp add <server-name> -- <start-command>

Step 6 — Delegate a lookup to a sub-agent

In your next session, when you find yourself about to ask “where is X defined?” — pause. Instead, phrase it as a sub-agent task:

Spawn a sub-agent to find where the UserRepository class is defined and return only
the file path and the constructor signature. Do not bring back anything else.

Run /context before and after. The answer arrives as a sentence or two; the exploration stays in the sub-agent’s window.

Expected result: by the end of this workflow you will have a visceral sense of which activities are cheap versus expensive for context, and a default reflex to reach for /compact, /clear, specific prompts, and sub-agents at the right moments.
