Prompting for Agents: Steering Models That Act
Agents are models using tools in a loop. This lesson covers when to build one, how to prompt it — heuristics, budgets, guardrails — and how to evaluate something that takes hundreds of steps.
This lesson is original educational writing based on this video by Anthropic (published May 22, 2025). All credit for the original content goes to the creators.
1. What an agent is (and when you want one)
Anthropic’s working definition is refreshingly small: an agent is a model using tools in a loop. Three things define it:
- an environment it operates in (a codebase, a browser, your CRM),
- tools that let it observe and change that environment,
- a system prompt defining its goal, constraints and ideal behavior.
The model decides what to do next based on what the environment returned last. That autonomy is the whole point — and the whole risk. Contrast with a workflow, where you hard-code the sequence (classify → extract → format). Workflows are cheaper, faster and more predictable; agents shine where you can’t know the steps in advance.
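To make the contrast concrete, here is a minimal sketch with no API calls. Everything in it is hypothetical and invented for illustration: the helper names (`classify`, `extract`, `format_reply`), the two toy tools, and `fake_model`, which stands in for a real LLM call. The workflow fixes the sequence in code; the agent loop lets the (fake) model choose the next tool from the last observation.

```python
from typing import Callable

# --- Workflow: the developer hard-codes the sequence. ---
def classify(ticket: str) -> str:
    return "billing" if "invoice" in ticket.lower() else "other"

def extract(ticket: str) -> str:
    # Pull out the first token that looks like a reference number.
    return next((w for w in ticket.split() if w.startswith("#")), "unknown")

def format_reply(category: str, ref: str) -> str:
    return f"[{category}] reference={ref}"

def workflow(ticket: str) -> str:
    return format_reply(classify(ticket), extract(ticket))

# --- Agent: the model picks the next step from the last observation. ---
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda q: "found: invoice #4411" if "invoice" in q else "nothing",
    "lookup": lambda ref: f"{ref} was paid twice",
}

def fake_model(history: list[str]) -> tuple[str, str]:
    """Stand-in for an LLM call: returns (tool_name, argument) or ("answer", text)."""
    last = history[-1]
    if last.startswith("task:"):
        return "search", last
    if last.startswith("found:"):
        return "lookup", last.split()[-1]
    return "answer", last

def agent(task: str, max_steps: int = 5) -> str:
    history = [f"task: {task}"]
    for _ in range(max_steps):
        action, arg = fake_model(history)
        if action == "answer":
            return arg
        history.append(TOOLS[action](arg))  # the observation feeds the next decision
    return "stopped: step limit reached"
```

Note that `workflow` always runs the same three calls, while the number of steps `agent` takes depends on what the tools return.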
Use an agent when the task is:
- Complex — the path can’t be predicted up front (debugging, research, multi-file changes).
- Valuable — the outcome justifies tokens and latency.
- Feasible — the model demonstrably can do the individual subtasks; de-risk with small tests.
- Recoverable — the cost of an error is acceptable or reversible (or gated behind approval).
A task that fails this checklist deserves a workflow, not an agent. “Don’t build agents for everything” is rule number one.
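The checklist can be written down as a small decision helper. This is only a sketch: the `Task` fields and the `recommend` function are hypothetical names, and the booleans still have to come from your own judgment and small feasibility tests.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    is_complex: bool      # the path cannot be predicted up front
    is_valuable: bool     # the outcome justifies tokens and latency
    is_feasible: bool     # the model can do the individual subtasks
    is_recoverable: bool  # errors are acceptable, reversible, or approval-gated

def recommend(task: Task) -> str:
    """Return 'agent' only if all four criteria hold; otherwise name what failed."""
    checks = [
        ("complex", task.is_complex),
        ("valuable", task.is_valuable),
        ("feasible", task.is_feasible),
        ("recoverable", task.is_recoverable),
    ]
    failed = [label for label, ok in checks if not ok]
    if failed:
        return f"workflow (fails: {', '.join(failed)})"
    return "agent"
```

For example, `recommend(Task("format invoices", False, True, True, True))` points to a workflow because the path is predictable.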
2. Prompting an agent is a different sport
A classic prompt scripts a single response. An agent prompt configures behavior across an unpredictable number of steps. The talk’s core advice: think like your agent. Sit where it sits: it wakes up with your system prompt, sees only what tools return, and must decide everything else itself. Prompts fail when they assume context the agent never has.
What goes into a good agent prompt:
Heuristics, not scripts. You can’t enumerate every situation, so teach judgment. Examples from real agent prompts: “start with broad searches, then narrow down”, “prefer primary sources”, “simple questions need under 5 tool calls; hard ones may justify 15”. Each heuristic generalizes across thousands of situations a script would miss.
Budgets and stopping criteria. Agents over-search and over-iterate by default. Give explicit resource guidance — number of tool calls, when an answer is “good enough”, when to give up and report failure honestly. Unbounded loops are how an agent burns 50× the tokens for 2% better answers.
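One way to back a prompt-level budget with code is a small counter that the loop charges on every tool call. The sketch below is an assumption about how you might structure it, not something shown in the talk; `Budget`, `soft_limit`, `hard_limit`, and `reminder` are hypothetical names. The soft limit mirrors the number in the prompt, and the hard limit is enforced regardless of what the model wants.

```python
from dataclasses import dataclass

@dataclass
class Budget:
    soft_limit: int  # the number the prompt asks the agent to stay under
    hard_limit: int  # enforced in code, whatever the model decides
    used: int = 0

    def charge(self, calls: int = 1) -> None:
        self.used += calls

    @property
    def exhausted(self) -> bool:
        return self.used >= self.hard_limit

    def reminder(self) -> str:
        """Text you could append to a tool result so the model sees its budget."""
        note = f"[budget] used {self.used} of {self.hard_limit} tool calls."
        if self.used >= self.soft_limit:
            note += " Wrap up and answer now."
        return note
```

Appending `reminder()` to tool results is a design choice: it keeps the budget visible inside the context the agent actually reads, instead of relying on it to count its own calls.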
Tool guidance. When two tools overlap, say which to prefer and when. Describe what each tool is for, not just its signature — most “agent bugs” are really tool-description bugs.
Guardrails for irreversibility. Separate read from write. Anything destructive or customer-visible (sending email, deleting records, pushing code) should require explicit approval or simply not be exposed as a tool. Recoverable-by-design beats “hope it behaves”.
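A sketch of what an approval gate might look like in a tool dispatcher. The tool names (`search_records`, `read_record`, `send_email`, `delete_record`) and the `approve` callback are hypothetical; in a real system `approve` could prompt a human or consult a policy service.

```python
from typing import Callable

READ_ONLY = {"search_records", "read_record"}     # hypothetical read tools
NEEDS_APPROVAL = {"send_email", "delete_record"}  # destructive or customer-visible

def dispatch(
    name: str,
    args: dict,
    handlers: dict[str, Callable[[dict], str]],
    approve: Callable[[str, dict], bool],
) -> str:
    """Run a tool call, asking `approve` first for anything irreversible."""
    if name in READ_ONLY:
        return handlers[name](args)
    if name in NEEDS_APPROVAL:
        if not approve(name, args):
            return f"Denied: {name} was not approved. Do not retry; report this to the user."
        return handlers[name](args)
    # Default-deny: a tool that is not explicitly classified never runs.
    return f"Error: unknown tool {name}"
```

Returning the denial as a tool result, rather than raising, lets the agent read the refusal and explain it to the user instead of crashing the loop.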
Let it think between steps. Enable extended thinking so the agent plans before acting, and reflects after each tool result (“interleaved thinking”): did that search actually answer the question? Should I change strategy? Reflection between tool calls is where agents recover from dead ends instead of doubling down.
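With the Anthropic Python SDK, extended thinking is requested through the `thinking` parameter of `client.messages.create`. The sketch below only builds the extra keyword arguments; `thinking_kwargs` is a hypothetical helper. It assumes a model that supports extended thinking, and the beta header value is the one documented for interleaved thinking on Claude 4 models at the time of writing. Check the current documentation for your model before relying on it, since requirements differ between models.

```python
def thinking_kwargs(budget_tokens: int = 4000, interleaved: bool = True) -> dict:
    """Extra keyword arguments to pass to client.messages.create(...)."""
    kwargs: dict = {
        # Lets the model reason before it answers or picks a tool.
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }
    if interleaved:
        # Assumption: this beta header enables thinking between tool calls
        # on models that need it; verify against the current API docs.
        kwargs["extra_headers"] = {"anthropic-beta": "interleaved-thinking-2025-05-14"}
    return kwargs
```

When thinking is combined with tool use, the assistant turn you send back should contain the model's content blocks unchanged, thinking blocks included. A loop that appends `response.content` as-is satisfies that.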
Check your understanding
1. Which task is the BEST fit for an agent rather than a workflow?
2. Why give an agent an explicit tool-call budget?
3. What is "interleaved thinking" useful for?
3. Evaluating something that takes 200 steps
You cannot improve an agent you can’t measure, and agents resist naive measurement: two correct runs may take completely different paths. The talk’s guidance:
- Start tiny and real. Twenty tasks drawn from actual usage beat five hundred synthetic ones. In the early phase, even a handful of cases with careful manual review reveals most issues.
- Grade outcomes, not paths. For questions with verifiable answers, use answer-based grading — did it land on the right final answer? Let the path vary.
- Use an LLM judge with a rubric for fuzzy qualities (did it cite sources? was the analysis grounded?). Judges scale your review; spot-check them against your own judgment.
- Watch the transcripts. Aggregate scores tell you that something is wrong; reading the agent’s actual tool-call sequences tells you what. Most fixes turn out to be one new heuristic or one clarified tool description.
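For the rubric-based judge, a minimal sketch might look like the following. The rubric criteria, the prompt wording, and the `call_model` parameter are all assumptions for illustration: `call_model` is any function you supply that sends a prompt to an LLM and returns its text, so the grading logic can be tested without a network call.

```python
import json
from typing import Callable

RUBRIC = [  # hypothetical criteria; adapt to your own task
    "cites_sources: every factual claim points to a source",
    "grounded: conclusions follow from the cited material",
]

def build_judge_prompt(question: str, answer: str) -> str:
    criteria = "\n".join(f"- {c}" for c in RUBRIC)
    return (
        "Grade the answer against each criterion. Reply with JSON only, "
        "mapping each criterion name to true or false.\n"
        f"Criteria:\n{criteria}\n\nQuestion: {question}\n\nAnswer: {answer}"
    )

def judge(question: str, answer: str, call_model: Callable[[str], str]) -> dict[str, bool]:
    """Ask the judge model for a verdict and parse it defensively."""
    raw = call_model(build_judge_prompt(question, answer))
    try:
        verdict = json.loads(raw)
    except json.JSONDecodeError:
        verdict = {}
    if not isinstance(verdict, dict):
        verdict = {}
    # A criterion the judge omitted or garbled counts as a failure, not a pass.
    names = [c.split(":")[0] for c in RUBRIC]
    return {name: verdict.get(name) is True for name in names}
```

Treating unparseable judge output as a failure is deliberate: it surfaces judge problems during spot-checks instead of silently inflating scores.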
Build it yourself
Follow these exact steps to reproduce it yourself · estimated time: ~45 minutes
Prerequisites
- Python 3.10+
- An Anthropic API key
- Completion of the Prompting 101 lesson (recommended)
You’ll build a minimal but real agent: a codebase analyst that answers questions about any local project by listing and reading files in a loop — every concept from this lesson in ~80 lines.
Step 1 — Set up
mkdir mini-agent && cd mini-agent
python3 -m venv .venv && source .venv/bin/activate
pip install anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
Step 2 — Define tools and guardrails
Create agent.py. Note: both tools are read-only — the guardrail is structural.
import sys
from pathlib import Path

import anthropic

ROOT = Path(sys.argv[1] if len(sys.argv) > 1 else ".").resolve()

TOOLS = [
    {
        "name": "list_files",
        "description": "List files under a relative directory of the project. "
                       "Use this FIRST to orient yourself before reading anything.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "Relative dir, '' for root"}},
            "required": ["path"],
        },
    },
    {
        "name": "read_file",
        "description": "Read one file's content (truncated to 8000 chars). "
                       "Prefer reading few, well-chosen files over reading everything.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
]


def run_tool(name: str, args: dict) -> str:
    target = (ROOT / args["path"]).resolve()
    if not target.is_relative_to(ROOT):  # guardrail: never escape the project
        return "Error: path outside project root."
    if name == "list_files":
        if not target.is_dir():
            return f"Error: {args['path']!r} is not a directory."
        entries = [
            p.name + ("/" if p.is_dir() else "")
            for p in sorted(target.iterdir())
            if p.name not in {".git", "node_modules", ".venv", "dist"}
        ]
        return "\n".join(entries) or "(empty)"
    if name == "read_file":
        if not target.is_file():
            return f"Error: {args['path']!r} is not a file."
        return target.read_text(errors="replace")[:8000]
    return f"Error: unknown tool {name}"
Step 3 — Write the agent prompt (heuristics + budget + an out)
SYSTEM = """You are a codebase analyst agent. Answer the user's question about
the project by exploring it with your tools.

Heuristics:
- Orient first: list the root, then drill into promising directories.
- Read selectively. README, config and entry-point files usually answer
  structural questions fastest.
- Budget: simple questions should need under 8 tool calls; never exceed 15.
- If you cannot find the answer, say exactly what you looked at and what is
  missing. Never invent file contents.

When you have enough evidence, stop exploring and answer concisely, citing
file paths."""
Step 4 — The loop itself
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def agent(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    # Hard ceiling backing up the soft budget. It counts model turns, and a
    # single turn may request more than one tool call.
    for _ in range(15):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2000,
            system=SYSTEM,
            tools=TOOLS,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            # No tool requested: the text blocks are the final answer.
            return "".join(b.text for b in response.content if b.type == "text")
        # Echo the assistant turn back unchanged, then answer every tool_use block.
        messages.append({"role": "assistant", "content": response.content})
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": run_tool(block.name, block.input),
            }
            for block in response.content
            if block.type == "tool_use"
        ]
        for block in response.content:  # watch the transcript while you build
            if block.type == "tool_use":
                print(f"  ⚙ {block.name}({block.input})", file=sys.stderr)
        messages.append({"role": "user", "content": results})
    return "Stopped: tool-call budget exhausted."


if __name__ == "__main__":
    print(agent(input("Question about this codebase: ")))
Step 5 — Run it and watch it think
python3 agent.py /path/to/any/project
# Question: How is this project built and deployed?
Expected result: stderr shows the loop in action: list_files(''), then targeted reads of README/config files, then a cited answer. Note how it orients first: that behavior came from one heuristic line in the system prompt. Delete that line and run again to watch the quality drop. You just did your first agent-prompt ablation.
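If you would rather not edit the prompt by hand for each ablation, a small helper can produce the variant. This is a sketch under a simplifying assumption: the heuristic you remove fits on one line, as the "Orient first" line does. `ablate` and `SYSTEM_DEMO` are hypothetical names, and `SYSTEM_DEMO` is a shortened stand-in for the real prompt.

```python
def ablate(prompt: str, marker: str) -> str:
    """Return a copy of `prompt` with every line containing `marker` removed."""
    lines = prompt.splitlines()
    kept = [line for line in lines if marker not in line]
    if len(kept) == len(lines):
        # Fail loudly so a typo does not produce a silent no-op experiment.
        raise ValueError(f"marker {marker!r} not found; nothing to ablate")
    return "\n".join(kept)

SYSTEM_DEMO = """You are a codebase analyst agent.
Heuristics:
- Orient first: list the root, then drill into promising directories.
- Read selectively."""

variant = ablate(SYSTEM_DEMO, "Orient first")
```

Run the same question with the original prompt and with `variant`, then compare the two transcripts side by side.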
Step 6 — Evaluate like the talk says
Write 5 questions about a repo you know well, with expected answers. Run them, grade answer-correctness (right/wrong), and read the worst transcript. Fix it by adding one heuristic, not by scripting steps. That loop — eval, read transcript, refine heuristic — is agent engineering.
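A minimal harness for this step might look like the sketch below. It grades only the final answer and ignores the path. The names `Case`, `run_eval`, and `report` are hypothetical, and the substring check is a deliberately crude assumption about what counts as correct; replace it with whatever fits your questions. `agent_fn` is any function from question to answer, so you can pass the `agent` function from agent.py or a stub.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    question: str
    expected: str  # a short string the correct answer must contain

def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so formatting differences do not matter.
    return " ".join(text.lower().split())

def run_eval(
    cases: list[Case], agent_fn: Callable[[str], str]
) -> list[tuple[Case, str, bool]]:
    """Grade only the final answer; the path the agent took is ignored."""
    results = []
    for case in cases:
        answer = agent_fn(case.question)
        passed = normalize(case.expected) in normalize(answer)
        results.append((case, answer, passed))
    return results

def report(results: list[tuple[Case, str, bool]]) -> str:
    passed = sum(ok for _, _, ok in results)
    lines = [f"{passed}/{len(results)} correct"]
    lines += [f"FAIL: {c.question} -> {a!r}" for c, a, ok in results if not ok]
    return "\n".join(lines)
```

The FAIL lines tell you which transcripts to read first.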
Where to go next
- Watch the original talk for the live walkthroughs of real agent prompts.
- Anthropic’s essay Building Effective Agents is the canonical written companion.
- See these ideas productized in Mastering Claude Code — Claude Code is exactly this loop, polished.