intermediate ⏱️ 9 min read · 🎬 ~25 min video

Prompting for Agents: Steering Models That Act

Agents are models using tools in a loop. This lesson covers when to build one, how to prompt it — heuristics, budgets, guardrails — and how to evaluate something that takes hundreds of steps.

This lesson is original educational writing based on this video by Anthropic (published May 22, 2025). All credit for the original content goes to the creators.

#agents #prompting #evaluation

1. What an agent is (and when you want one)

Anthropic’s working definition is refreshingly small: an agent is a model using tools in a loop. Three things define it:

  • an environment it operates in (a codebase, a browser, your CRM),
  • tools that let it observe and change that environment,
  • a system prompt defining its goal, constraints and ideal behavior.

The model decides what to do next based on what the environment returned last. That autonomy is the whole point — and the whole risk. Contrast with a workflow, where you hard-code the sequence (classify → extract → format). Workflows are cheaper, faster and more predictable; agents shine where you can’t know the steps in advance.
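To make the contrast concrete, here is a minimal sketch of a hard-coded workflow. The three step functions (classify, extract, format_reply) and the support-ticket scenario are hypothetical stand-ins, not something from the talk; in practice each step might be a single model call.

```python
import re


def classify(ticket: str) -> str:
    """Hypothetical step 1: route the ticket with a fixed rule."""
    return "refund" if "refund" in ticket.lower() else "other"


def extract(ticket: str) -> dict:
    """Hypothetical step 2: pull out an order number like #1234, if present."""
    match = re.search(r"#(\d+)", ticket)
    return {"order_id": match.group(1) if match else None}


def format_reply(label: str, fields: dict) -> str:
    """Hypothetical step 3: render a fixed template."""
    return f"[{label}] order={fields['order_id']}"


def workflow(ticket: str) -> str:
    # The programmer fixes the sequence: classify, then extract, then format.
    # Nothing here lets the model choose a different next step.
    label = classify(ticket)
    fields = extract(ticket)
    return format_reply(label, fields)


print(workflow("Please refund order #1234"))
```

An agent would replace the body of workflow with a loop in which the model picks the next tool call itself, which is what the rest of this lesson builds.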

Use an agent when the task is:

  1. Complex — the path can’t be predicted up front (debugging, research, multi-file changes).
  2. Valuable — the outcome justifies tokens and latency.
  3. Feasible — the model demonstrably can do the individual subtasks; de-risk with small tests.
  4. Recoverable — the cost of an error is acceptable or reversible (or gated behind approval).

A task that fails this checklist deserves a workflow, not an agent. “Don’t build agents for everything” is rule number one.
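The checklist can be written down as a small decision helper. This is only an illustrative sketch: the Task fields and the recommend function are made-up names, and real decisions involve judgment rather than four booleans.

```python
from dataclasses import dataclass


@dataclass
class Task:
    is_complex: bool      # the path cannot be predicted up front
    is_valuable: bool     # the outcome justifies tokens and latency
    is_feasible: bool     # the model can do the individual subtasks
    is_recoverable: bool  # errors are acceptable, reversible, or approval-gated


def recommend(task: Task) -> str:
    """Return 'agent' only when all four criteria hold, otherwise 'workflow'."""
    checks = (task.is_complex, task.is_valuable, task.is_feasible, task.is_recoverable)
    return "agent" if all(checks) else "workflow"


debugging = Task(is_complex=True, is_valuable=True, is_feasible=True, is_recoverable=True)
invoice_parsing = Task(is_complex=False, is_valuable=True, is_feasible=True, is_recoverable=True)

print(recommend(debugging))
print(recommend(invoice_parsing))
```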

[Figure: the agent loop. The model (thinks, plans, decides) sends tool calls to the environment (files, web, APIs) and receives tool results. The system prompt (goal, heuristics, tool guidance, budgets, guardrails) bounds every iteration of the loop. When done, the agent delivers an answer.]
The agent loop: the model reasons, calls a tool, observes the result, and repeats — bounded by budgets and guardrails — until it can deliver an answer.

2. Prompting an agent is a different sport

A classic prompt scripts a single response. An agent prompt configures behavior across an unpredictable number of steps. The talk’s core advice: think like your agent. Sit where it sits: it wakes up with your system prompt, sees only what tools return, and must decide everything else itself. Prompts fail when they assume context the agent never has.

What goes into a good agent prompt:

Heuristics, not scripts. You can’t enumerate every situation, so teach judgment. Examples from real agent prompts: “start with broad searches, then narrow down”, “prefer primary sources”, “simple questions need under 5 tool calls; hard ones may justify 15”. Each heuristic generalizes across thousands of situations a script would miss.

Budgets and stopping criteria. Agents over-search and over-iterate by default. Give explicit resource guidance — number of tool calls, when an answer is “good enough”, when to give up and report failure honestly. Unbounded loops are how an agent burns 50× the tokens for 2% better answers.
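One way to back a prompted budget with code is a counter that distinguishes a soft limit (the number stated in the prompt) from a hard limit (enforced by the loop). The ToolBudget class below is a hypothetical sketch, not an API from any library.

```python
class ToolBudget:
    """Track tool calls against a soft limit (prompted) and a hard limit (enforced)."""

    def __init__(self, soft_limit: int, hard_limit: int) -> None:
        if not 0 < soft_limit <= hard_limit:
            raise ValueError("need 0 < soft_limit <= hard_limit")
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit
        self.used = 0

    def spend(self) -> str:
        """Record one tool call and return 'ok', 'wrap_up', or 'stop'."""
        self.used += 1
        if self.used >= self.hard_limit:
            return "stop"      # the loop should end and report what it has
        if self.used >= self.soft_limit:
            return "wrap_up"   # a cue to nudge the model toward answering
        return "ok"


budget = ToolBudget(soft_limit=2, hard_limit=4)
print([budget.spend() for _ in range(4)])
```

The 'wrap_up' state could be surfaced to the model as a short note appended to the next tool result, so it learns the budget is nearly spent before the hard stop arrives.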

Tool guidance. When two tools overlap, say which to prefer and when. Describe what each tool is for, not just its signature — most “agent bugs” are really tool-description bugs.
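As an illustration, the sketch below shows two overlapping search tools whose descriptions say which to prefer, plus a deliberately thin description for contrast. The tool names and the word-count check are hypothetical; the check is a crude lint, not a measure of description quality.

```python
EXAMPLE_TOOLS = [
    {
        "name": "search_docs",
        "description": "Search the internal documentation index. Prefer this over "
        "search_web for questions about our own product; results are authoritative.",
    },
    {
        "name": "search_web",
        "description": "Search the public web. Use only when search_docs returns "
        "nothing relevant or the question is about third-party software.",
    },
    {
        "name": "get_page",
        "description": "Fetch URL.",  # restates the signature, explains nothing
    },
]


def thin_descriptions(tools: list[dict], min_words: int = 8) -> list[str]:
    """Return names of tools whose description is too short to explain purpose."""
    return [t["name"] for t in tools if len(t["description"].split()) < min_words]


print(thin_descriptions(EXAMPLE_TOOLS))
```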

Guardrails for irreversibility. Separate read from write. Anything destructive or customer-visible (sending email, deleting records, pushing code) should require explicit approval or simply not be exposed as a tool. Recoverable-by-design beats “hope it behaves”.
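A minimal sketch of the read/write split, assuming a dispatcher that sits between the model and the real tools. The tool names send_email and delete_record and the approve callback are hypothetical; a real system would call the actual tool where this sketch returns a string.

```python
from typing import Callable

READ_TOOLS = {"list_files", "read_file"}
WRITE_TOOLS = {"send_email", "delete_record"}  # hypothetical destructive tools


def dispatch(name: str, args: dict, approve: Callable[[str, dict], bool]) -> str:
    """Run read tools freely; run write tools only after explicit approval."""
    if name in READ_TOOLS:
        return f"ran {name}"
    if name in WRITE_TOOLS:
        if approve(name, args):
            return f"ran {name} (approved)"
        return f"blocked {name}: approval denied"
    # Anything not on an allow-list is simply not available to the agent.
    return f"Error: unknown tool {name}"


def deny_all(name: str, args: dict) -> bool:
    """Stand-in for a human reviewer who rejects every request."""
    return False


print(dispatch("read_file", {"path": "README.md"}, deny_all))
print(dispatch("send_email", {"to": "customer@example.com"}, deny_all))
```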

Let it think between steps. Enable extended thinking so the agent plans before acting, and reflects after each tool result (“interleaved thinking”): did that search actually answer the question? Should I change strategy? Reflection between tool calls is where agents recover from dead ends instead of doubling down.
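For the Anthropic Python SDK, enabling thinking might look like the sketch below, which only builds keyword arguments and makes no API call. Treat the details as assumptions to verify against the current documentation: the thinking parameter shape, the 1024-token minimum, and the interleaved-thinking beta header reflect Anthropic's published docs as of mid-2025 and may differ by model or have changed since. The helper name thinking_kwargs is made up for this example.

```python
def thinking_kwargs(budget_tokens: int = 2000, max_tokens: int = 4000) -> dict:
    """Build extra keyword arguments for client.messages.create(...)."""
    if budget_tokens < 1024:
        # Assumption: 1024 is the documented minimum thinking budget.
        raise ValueError("budget_tokens must be at least 1024")
    if max_tokens <= budget_tokens:
        # Conservative choice for this sketch: leave room for the visible answer.
        raise ValueError("max_tokens should exceed budget_tokens")
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        # Assumption: this beta header switches on thinking between tool calls
        # for models that support it; check the docs for your model.
        "extra_headers": {"anthropic-beta": "interleaved-thinking-2025-05-14"},
    }


print(thinking_kwargs())
```

These arguments would be merged into the messages.create call of an agent loop. When thinking is on, the assistant turn appended to the history should keep its thinking blocks unmodified.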

Check your understanding

  1. Which task is the BEST fit for an agent rather than a workflow?

  2. Why give an agent an explicit tool-call budget?

  3. What is "interleaved thinking" useful for?

3. Evaluating something that takes 200 steps

You cannot improve an agent you can’t measure, and agents resist naive measurement: two correct runs may take completely different paths. The talk’s guidance:

  • Start tiny and real. Twenty tasks drawn from actual usage beat five hundred synthetic ones. In the early phase, even a handful of cases with careful manual review reveals most issues.
  • Grade outcomes, not paths. For questions with verifiable answers, use answer-based grading — did it land on the right final answer? Let the path vary.
  • Use an LLM judge with a rubric for fuzzy qualities (did it cite sources? was the analysis grounded?). Judges scale your review; spot-check them against your own judgment.
  • Watch the transcripts. Aggregate scores tell you that something is wrong; reading the agent’s actual tool-call sequences tells you what. Most fixes turn out to be one new heuristic or one clarified tool description.
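The rubric-based judge from the list above can be sketched as a prompt builder plus a strict parser for the judge's reply. Everything here is illustrative: the rubric items, the PASS/FAIL reply format, and the function names are assumptions, and the call to the judge model itself is left out.

```python
RUBRIC = [
    "Cites at least one source for each factual claim",
    "Analysis is grounded in the retrieved material",
]


def build_judge_prompt(question: str, answer: str, rubric: list[str] = RUBRIC) -> str:
    """Assemble a grading prompt that asks for one verdict per rubric item."""
    items = "\n".join(f"{i}. {item}" for i, item in enumerate(rubric, 1))
    return (
        "Grade the answer against each rubric item.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        f"Rubric:\n{items}\n"
        "Reply with one line per item: '<number>: PASS' or '<number>: FAIL'."
    )


def parse_verdicts(reply: str, n_items: int) -> list[bool]:
    """Parse 'N: PASS' / 'N: FAIL' lines; missing or malformed items count as failures."""
    verdicts = [False] * n_items
    for line in reply.splitlines():
        number, _, verdict = line.partition(":")
        number = number.strip()
        if number.isdecimal() and 1 <= int(number) <= n_items:
            verdicts[int(number) - 1] = verdict.strip().upper() == "PASS"
    return verdicts


print(build_judge_prompt("Why did sales drop?", "Because of X [source A]."))
print(parse_verdicts("1: PASS\n2: FAIL", len(RUBRIC)))
```

Counting unparseable replies as failures keeps a confused judge from inflating scores, which makes spot-checking against your own judgment easier.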

Build it yourself

Follow these exact steps to reproduce it yourself · estimated time: ~45 minutes

Prerequisites

  • Python 3.10+
  • An Anthropic API key
  • Completion of the Prompting 101 lesson (recommended)

You’ll build a minimal but real agent: a codebase analyst that answers questions about any local project by listing and reading files in a loop. It exercises the lesson’s core ideas (tools, heuristics, a budget, structural guardrails) in under 100 lines.

Step 1 — Set up

mkdir mini-agent && cd mini-agent
python3 -m venv .venv && source .venv/bin/activate
pip install anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

Step 2 — Define tools and guardrails

Create agent.py. Note: both tools are read-only — the guardrail is structural.

import sys
from pathlib import Path

import anthropic

ROOT = Path(sys.argv[1] if len(sys.argv) > 1 else ".").resolve()

TOOLS = [
    {
        "name": "list_files",
        "description": "List files under a relative directory of the project. "
        "Use this FIRST to orient yourself before reading anything.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "Relative dir, '' for root"}},
            "required": ["path"],
        },
    },
    {
        "name": "read_file",
        "description": "Read one file's content (truncated to 8000 chars). "
        "Prefer reading few, well-chosen files over reading everything.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
]

def run_tool(name: str, args: dict) -> str:
    target = (ROOT / args["path"]).resolve()
    if not target.is_relative_to(ROOT):          # guardrail: never escape the project
        return "Error: path outside project root."
    if name == "list_files":
        if not target.is_dir():
            return f"Error: {args['path']!r} is not a directory."
        entries = [
            p.name + ("/" if p.is_dir() else "")
            for p in sorted(target.iterdir())
            if p.name not in {".git", "node_modules", ".venv", "dist"}
        ]
        return "\n".join(entries) or "(empty)"
    if name == "read_file":
        if not target.is_file():
            return f"Error: {args['path']!r} is not a file."
        return target.read_text(errors="replace")[:8000]
    return f"Error: unknown tool {name}"
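The path guardrail in run_tool can be checked in isolation. The sketch below repeats the same resolve-then-compare logic in a standalone helper (inside_root is a name invented for this example) and tries a few escape attempts against a temporary directory. It needs Python 3.9+ for Path.is_relative_to.

```python
import tempfile
from pathlib import Path


def inside_root(root: Path, relative: str) -> bool:
    """True if root/relative, after resolving '..' and symlinks, stays under root."""
    target = (root / relative).resolve()
    return target.is_relative_to(root)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "src").mkdir()
    for candidate in ["src", "", "../etc/passwd", "/etc/passwd", "src/../.."]:
        print(repr(candidate), inside_root(root, candidate))
```

Resolving before comparing is what defeats "..": a plain string-prefix check on the unresolved path would not.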

Step 3 — Write the agent prompt (heuristics + budget + an out)

SYSTEM = """You are a codebase analyst agent. Answer the user's question about
the project by exploring it with your tools.

Heuristics:
- Orient first: list the root, then drill into promising directories.
- Read selectively. README, config and entry-point files usually answer
  structural questions fastest.
- Budget: simple questions should need under 8 tool calls; never exceed 15.
- If you cannot find the answer, say exactly what you looked at and what is
  missing. Never invent file contents.

When you have enough evidence, stop exploring and answer concisely, citing
file paths."""

Step 4 — The loop itself

client = anthropic.Anthropic()

def agent(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(15):  # hard ceiling on model turns, backing up the soft budget
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2000,
            system=SYSTEM,
            tools=TOOLS,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")
        messages.append({"role": "assistant", "content": response.content})
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": run_tool(block.name, block.input),
            }
            for block in response.content
            if block.type == "tool_use"
        ]
        for block in response.content:           # watch the transcript while you build
            if block.type == "tool_use":
                print(f"  ⚙ {block.name}({block.input})", file=sys.stderr)
        messages.append({"role": "user", "content": results})
    return "Stopped: tool-call budget exhausted."

if __name__ == "__main__":
    print(agent(input("Question about this codebase: ")))
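To exercise the loop's control flow without an API key or token spend, the model call can be swapped for a scripted fake. The sketch below restates the same loop with the model call injected as a parameter; run_loop, scripted_model and fake_tool are hypothetical test helpers, and the SimpleNamespace objects only imitate the attributes the loop reads from SDK responses.

```python
from types import SimpleNamespace as NS


def run_loop(create, run_tool, question: str, max_turns: int = 15) -> str:
    """Same control flow as the agent loop, with the model call passed in as `create`."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        response = create(messages=messages)
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")
        messages.append({"role": "assistant", "content": response.content})
        results = [
            {"type": "tool_result", "tool_use_id": b.id, "content": run_tool(b.name, b.input)}
            for b in response.content
            if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    return "Stopped: tool-call budget exhausted."


def scripted_model(replies):
    """Return a fake `create` that plays back canned responses in order."""
    remaining = iter(replies)
    return lambda **kwargs: next(remaining)


tool_turn = NS(
    stop_reason="tool_use",
    content=[NS(type="tool_use", id="t1", name="list_files", input={"path": ""})],
)
final_turn = NS(stop_reason="end_turn", content=[NS(type="text", text="It is a Python package.")])

calls = []


def fake_tool(name: str, args: dict) -> str:
    calls.append((name, args))  # record what the loop asked for
    return "README.md\nsetup.py"


print(run_loop(scripted_model([tool_turn, final_turn]), fake_tool, "What is this?"))
print(calls)
```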

Step 5 — Run it and watch it think

python3 agent.py /path/to/any/project
# Question about this codebase: How is this project built and deployed?

Expected result: stderr shows the loop in action: list_files(''), then targeted reads of README/config files, then a cited answer. Note how it orients first; that behavior is encouraged by one heuristic line in the system prompt and reinforced by the list_files tool description. Delete the heuristic line and run again to see whether the quality drops. That is your first agent-prompt ablation.

Step 6 — Evaluate like the talk says

Write 5 questions about a repo you know well, with expected answers. Run them, grade answer-correctness (right/wrong), and read the worst transcript. Fix it by adding one heuristic, not by scripting steps. That loop — eval, read transcript, refine heuristic — is agent engineering.
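A starting point for that eval could look like the sketch below. The two cases, the substring-based grade function, and stub_agent are placeholders: swap in your own questions and expected answers, and pass the real agent function in place of the stub.

```python
from typing import Callable

# Hypothetical cases; replace with questions about a repo you know well.
CASES = [
    {"question": "Which build tool does the project use?", "expected": "poetry"},
    {"question": "What is the entry point?", "expected": "main.py"},
]


def grade(answer: str, expected: str) -> bool:
    """Answer-based grading: pass if the expected string appears in the final answer."""
    return expected.lower() in answer.lower()


def run_eval(agent: Callable[[str], str], cases: list[dict]) -> list[dict]:
    """Run every case through the agent and attach the answer and a pass/fail flag."""
    results = []
    for case in cases:
        answer = agent(case["question"])
        results.append({**case, "answer": answer, "passed": grade(answer, case["expected"])})
    return results


def stub_agent(question: str) -> str:
    """Stand-in for the real agent() so the harness runs without an API key."""
    return "The project is built with Poetry."


results = run_eval(stub_agent, CASES)
score = sum(r["passed"] for r in results)
print(f"{score}/{len(results)} correct")
for r in results:
    if not r["passed"]:
        print("read this transcript next:", r["question"])
```

Substring matching is crude and will misgrade paraphrased answers; it is meant as the simplest possible outcome check, to be replaced once you see where it fails.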
