AI Learning
intermediate ⏱️ 8 min read · 📄 ~20 min article

Building Effective Agents: Workflows, Agents and the Patterns Between

Anthropic's foundational essay distilled into a class: the five workflow patterns, what truly counts as an agent, why simplicity wins, and how to design tools your agent can actually use.

This lesson is original educational writing based on this article by Anthropic (published December 19, 2024). All credit for the original content goes to the creators.

#agents #architecture #fundamentals

📄 The original is a written piece; this lesson restructures it into an interactive class.

1. One distinction to rule the field

Published in December 2024, Building Effective Agents became the reference everyone cites because it cut through the hype with one clean distinction:

  • Workflows are systems where LLMs and tools are orchestrated through predefined code paths. You decide the steps; the model fills them in.
  • Agents are systems where the model directs its own process — it decides which tools to use, in what order, and when it’s done.

Most “agent” products are workflows, and that’s not an insult — the essay’s strongest advice is to find the simplest solution possible and only add complexity when simpler fails. Autonomy costs latency, tokens and predictability; spend it only where the task genuinely can’t be pre-decomposed. (For that decision checklist, see Prompting for Agents.)

2. The five workflow patterns

The essay catalogs five composable patterns between “single prompt” and “full agent”:

  • Augmented LLM call: retrieval · tools · memory (start here)
  1. Prompt chaining: fixed steps, each LLM call feeds the next
  2. Routing: classify input, send to a specialized handler
  3. Parallelization: sectioning for speed, voting for confidence
  4. Orchestrator–workers: one LLM decomposes, delegates dynamically
  5. Evaluator–optimizer: generate, critique, refine in a loop
  • Agent: model plans, acts and self-directs in a loop (maximum flexibility · maximum cost & unpredictability)
The escalation ladder: five workflow patterns of increasing flexibility sit between a single augmented LLM call and a fully autonomous agent.

Prompt chaining — decompose a task into fixed steps, optionally with programmatic checks (“gates”) between them. Fit: tasks that decompose cleanly — outline → validate outline → write.

Routing — classify the input first, then dispatch to a specialized prompt/model. Fit: distinct input categories needing different handling — support tickets by type, easy questions to a fast model, hard ones to a stronger one.

Parallelization — run LLM calls concurrently: sectioning (split independent subtasks) or voting (same task multiple times, aggregate for confidence). Fit: speed via independence, or higher-stakes judgments — e.g., several reviewers each checking a different concern.
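Both parallelization variants fit in a few lines. The following is a minimal sketch, not the essay's code: `section`, `vote` and `stub_ask` are illustrative names, and the stub stands in for a real LLM call so the example runs offline. It assumes any `str -> str` callable can play the role of the model.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Ask = Callable[[str], str]


def section(ask: Ask, prompts: list[str]) -> list[str]:
    """Sectioning: run independent prompts concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(prompts))) as pool:
        return list(pool.map(ask, prompts))


def vote(ask: Ask, prompt: str, n: int = 3) -> str:
    """Voting: ask the same question n times and return the most common answer."""
    answers = section(ask, [prompt] * n)
    return Counter(a.strip().lower() for a in answers).most_common(1)[0][0]


# Stub standing in for a real LLM call (assumption: for offline runs only).
def stub_ask(prompt: str) -> str:
    return "UNSAFE" if "eval(" in prompt else "safe"


reviews = section(stub_ask, ["Check security: eval(x)", "Check style: x = 1"])
verdict = vote(stub_ask, "Is this snippet safe? print('hi')")
print(reviews, verdict)
```

With a real model, voting only helps if the calls can disagree, so in practice each call would use sampling or a differently worded prompt.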

Orchestrator–workers — a central LLM dynamically decomposes the task and delegates to worker calls, then synthesizes. Unlike parallelization, the subtasks aren’t known in advance. Fit: “make this change across however many files it touches.”
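The difference from parallelization is that the list of subtasks comes back from a model call instead of being written in code. A rough sketch under stated assumptions: `orchestrate` and `stub_ask` are hypothetical names, the JSON-list plan format is one possible choice, and the stub replaces the real model so the flow can be run offline.

```python
import json
from typing import Callable

Ask = Callable[[str], str]


def orchestrate(ask: Ask, task: str, max_subtasks: int = 5) -> str:
    """Let the model choose the subtasks, run one worker call each, then synthesize."""
    plan = ask(f"Split this task into subtasks. Reply with a JSON list of strings.\n\n{task}")
    subtasks = json.loads(plan)[:max_subtasks]  # cap the fan-out as a guardrail
    results = [ask(f"Do this subtask: {sub}") for sub in subtasks]
    joined = "\n".join(results)
    return ask(f"Combine these results into one answer:\n{joined}")


# Deterministic stub in place of a real model (assumption, for offline runs only).
def stub_ask(prompt: str) -> str:
    if prompt.startswith("Split"):
        return '["rename in a.py", "rename in b.py"]'
    if prompt.startswith("Do this subtask: "):
        return "done: " + prompt.removeprefix("Do this subtask: ")
    return prompt.split("\n", 1)[1]  # "synthesis" here just echoes the worker results


summary = orchestrate(stub_ask, "Rename foo to bar everywhere")
print(summary)
```

The worker calls are independent once the plan exists, so they could also be run concurrently.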

Evaluator–optimizer — one call generates, another critiques against criteria, loop until accepted. Fit: when you can articulate what “better” means — translation polish, search-result refinement.
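The loop itself is small; the hard part is the criteria. A minimal sketch, assuming a hypothetical `refine` helper, an ACCEPT convention for the evaluator's reply, and a stub model that makes the loop deterministic for illustration:

```python
from typing import Callable

Ask = Callable[[str], str]


def refine(ask: Ask, task: str, criteria: str, max_rounds: int = 3) -> str:
    """Generate a draft, then loop: critique against criteria, revise, until accepted."""
    draft = ask(f"TASK: {task}")
    for _ in range(max_rounds):
        critique = ask(f"EVALUATE against: {criteria}\nReply ACCEPT or list fixes.\n\n{draft}")
        if critique.strip().upper().startswith("ACCEPT"):
            break
        draft = ask(f"REVISE using this feedback: {critique}\n\n{draft}")
    return draft  # best effort if the round budget runs out


# Stub model (assumption): rejects drafts that lack a greeting, accepts otherwise.
def stub_ask(prompt: str) -> str:
    if prompt.startswith("TASK"):
        return "the meeting is at noon."
    if prompt.startswith("EVALUATE"):
        draft = prompt.split("\n\n", 1)[1]
        return "ACCEPT" if draft.startswith("Hello") else "Add a greeting."
    return "Hello, " + prompt.split("\n\n", 1)[1]


final = refine(stub_ask, "Write a reminder", "must open with a greeting")
print(final)
```

The `max_rounds` cap matters: without it, an evaluator that never accepts would loop indefinitely.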

Only when even orchestrator–workers can’t pre-structure the problem do you reach for an agent: the model operating tools in a loop, gaining ground truth from the environment at each step, with budgets and stopping conditions as guardrails.
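The agent loop described above can be sketched as follows. This is an illustration under assumptions, not the essay's implementation: the JSON action format, the `finish` pseudo-tool, `run_agent` and the stub model are all hypothetical, chosen so the loop runs without an API key.

```python
import json
from typing import Callable

Ask = Callable[[str], str]
Tool = Callable[[str], str]


def run_agent(ask: Ask, tools: dict[str, Tool], goal: str, max_steps: int = 5) -> str:
    """Loop: the model picks the next action, code executes it and feeds back the result."""
    transcript = f"GOAL: {goal}"
    for _ in range(max_steps):  # step budget: the loop cannot run forever
        action = json.loads(ask(transcript))
        if action["tool"] == "finish":  # stopping condition chosen by the model
            return action["input"]
        tool = tools.get(action["tool"])
        # Ground truth comes from actually running the tool, not from the model's guess.
        observation = tool(action["input"]) if tool else f"unknown tool {action['tool']}"
        transcript += f"\nACTION: {json.dumps(action)}\nOBSERVATION: {observation}"
    return "stopped: step budget exhausted"


# Stub model (assumption): looks something up once, then finishes with what it observed.
def stub_ask(transcript: str) -> str:
    if "OBSERVATION" not in transcript:
        return '{"tool": "lookup", "input": "python"}'
    answer = transcript.rsplit("OBSERVATION: ", 1)[1]
    return json.dumps({"tool": "finish", "input": answer})


tools = {"lookup": lambda q: f"{q} has {len(q)} letters"}
result = run_agent(stub_ask, tools, "How many letters are in 'python'?")
print(result)
```

Note that the code path is no longer fixed: which tool runs, and how many times, is decided by the model at each step. The only things the code guarantees are the budget and the stop check.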

Check your understanding


  1. What distinguishes an agent from a workflow in the essay's definitions?

  2. Customer emails should be handled differently depending on whether they are refunds, bugs or sales leads. Which pattern fits first?

  3. When does orchestrator–workers beat plain parallelization?

  4. What is the essay's overarching architectural principle?

3. The agent–computer interface

The essay’s most under-appreciated section argues that tool design deserves the same care as human interface design. The model experiences your tools the way users experience your UI, and most “dumb agent” behavior is actually a badly designed agent-computer interface (ACI):

  • Formats matter. Choose tool input/output formats close to what the model has seen in training, and give it room to think before committing to rigid output.
  • Naming and descriptions matter. Write tool descriptions like docstrings for a junior engineer: when to use it, when not to, what the parameters mean, examples and edge cases.
  • Make misuse hard. If a parameter keeps being filled wrong, don’t add prompt scolding — change the parameter. (Their example: requiring absolute file paths ended a class of errors that relative paths kept causing.)
  • Test tools in isolation. Run many example calls and read what the model actually does with your interface before blaming the model.
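To make the bullets above concrete, here is one possible tool definition plus its handler. Treat it as a sketch: the `read_file` tool is hypothetical, the dictionary follows the `name` / `description` / `input_schema` shape that the Anthropic Messages API accepts for tools, and the absolute-path check assumes POSIX-style paths.

```python
import os

# Hypothetical tool definition; the description reads like a docstring for a new teammate.
READ_FILE_TOOL = {
    "name": "read_file",
    "description": (
        "Read a UTF-8 text file and return its contents. "
        "Use it when you need the exact current contents of a known file. "
        "Do not use it to list directories or to search for files by name. "
        'Example input: {"path": "/srv/app/config.yaml"}. '
        'Relative paths such as "config.yaml" are rejected.'
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Absolute path, starting with '/'."}
        },
        "required": ["path"],
    },
}


def read_file(path: str) -> str:
    """Tool handler: reject relative paths with an error the model can act on."""
    if not os.path.isabs(path):
        return f"ERROR: '{path}' is relative. Pass an absolute path."
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError as exc:
        return f"ERROR: could not read {path}: {exc}"


# Testing in isolation: call the handler directly before involving a model.
print(read_file("config.yaml"))
```

Returning the error as text, instead of raising, gives the model something it can read and correct on the next step.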

Build it yourself

Follow these exact steps to reproduce it yourself · estimated time: ~40 minutes

Prerequisites

  • Python 3.10+
  • An Anthropic API key

Implement the two most-used patterns — chaining with a gate, and routing — in plain Python. No framework: the essay’s point is that these are a few dozen lines each.

Step 1 — Set up

mkdir agent-patterns && cd agent-patterns
python3 -m venv .venv && source .venv/bin/activate
pip install anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

Create llm.py, a tiny helper both patterns share:

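import anthropic

# Anthropic() reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

def ask(prompt: str, system: str = "") -> str:
    """Send one user message and return the text of the first content block."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1500,
        # Omit the system parameter entirely when no system prompt is given.
        system=system or anthropic.NOT_GIVEN,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text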

Step 2 — Prompt chaining with a gate

chain.py — outline → programmatic gate → article. The gate is ordinary code, which is the whole point:

from llm import ask

topic = "Why simple LLM workflows beat premature agents"

outline = ask(
    f"Write a 5-section outline for a short technical article about: {topic}. "
    "One line per section, numbered."
)

# Gate: a cheap deterministic check between steps. Fail fast instead of propagating junk.
# The 4-7 range leaves slack around the 5 requested sections (e.g. an added title line).
lines = [line for line in outline.splitlines() if line.strip()]
if not 4 <= len(lines) <= 7:
    raise SystemExit(f"Gate failed: outline has {len(lines)} non-empty lines (expected 4-7):\n{outline}")

article = ask(
    f"Write the article following this outline exactly. Keep it under 400 words.\n\n{outline}"
)
print(article)

Step 3 — Routing

route.py — classify, then dispatch to specialized handlers (different prompts; in production, often different models — fast/cheap for easy lanes):

import json
from llm import ask

HANDLERS = {
    "refund": "You are a refunds specialist. Be empathetic, confirm the order id, state the 14-day policy.",
    "bug": "You are technical support. Ask for reproduction steps, app version and platform.",
    "sales": "You are a friendly sales rep. Answer briefly and offer a demo call.",
}

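def route(email: str) -> str:
    verdict = ask(
        "Classify this email as exactly one of: refund, bug, sales. "
        'Reply with JSON only: {"category": "..."}\n\n<email>' + email + "</email>"
    )
    # The model may wrap the JSON in prose or a code fence, so parse only the {...} span.
    start, end = verdict.find("{"), verdict.rfind("}")
    try:
        category = json.loads(verdict[start : end + 1])["category"]
        system = HANDLERS[category]  # an unexpected label fails here, not mid-reply
    except (json.JSONDecodeError, KeyError, TypeError):
        raise SystemExit(f"Router gave an unusable verdict: {verdict!r}")
    print(f"[router] → {category}")
    return ask(f"<email>{email}</email>\n\nWrite the reply.", system=system)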

print(route("Hey, the app crashes whenever I open settings on Android 15."))

Step 4 — Verify the architecture choice

Run both. Expected result: chain.py either prints a well-structured article or fails loudly at the gate; route.py prints the chosen lane and a noticeably specialized reply. Now stress the boundary: feed route.py an email that’s both a bug report and a refund demand. Watch it force a single lane — that’s a known limit of routing, and the moment you can justify escalating (multi-label routing, orchestrator–workers, or — eventually — an agent).
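Of the escalation options, multi-label routing is the smallest step up. A possible sketch, with assumptions labeled: `route_multi`, the `categories` JSON key and the keyword-matching stub are all illustrative, and the stub replaces the real classifier call so the example runs offline.

```python
import json
from typing import Callable

Ask = Callable[[str], str]
LANES = {"refund", "bug", "sales"}


def route_multi(ask: Ask, email: str) -> list[str]:
    """Ask for every matching category, then keep only the known ones, in order."""
    verdict = ask(
        "Classify this email as one or more of: refund, bug, sales. "
        'Reply with JSON only: {"categories": ["..."]}\n\n<email>' + email + "</email>"
    )
    found = json.loads(verdict).get("categories", [])
    # dict.fromkeys de-duplicates while preserving order; unknown labels are dropped.
    return [c for c in dict.fromkeys(found) if c in LANES]


# Stub classifier (assumption): keyword matching on the email body only.
def stub_ask(prompt: str) -> str:
    email = prompt.split("<email>", 1)[1].lower()
    cues = (("bug", "crash"), ("refund", "money back"))
    return json.dumps({"categories": [lane for lane, cue in cues if cue in email]})


lanes = route_multi(stub_ask, "The app crashes on launch and I want my money back.")
print(lanes)
```

Each returned lane could then be answered with its own handler prompt and the replies merged. An empty list signals that no lane matched, which the caller has to handle explicitly.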

Step 5 — Practice the ACI lesson

Take the agent you built in Prompting for Agents and rewrite only its tool descriptions (when to use, when not to, one example each) without touching the system prompt. Re-run your eval questions. Tool-description-only improvements are the essay’s thesis made tangible.

Where to go next

  • Read the original essay — short, and the appendix on coding agents is gold.
  • See the decision framework for when to go full agent in Prompting for Agents.
  • When your agent needs standardized tool access, that’s MCP 201.
