AI Learning
beginner ⏱️ 8 min read · 🎬 ~24 min video

Prompting 101: The Anatomy of a Production-Grade Prompt

Anthropic's Applied AI team shows how to evolve a one-line prompt into a reliable, production-quality prompt — structure, XML tags, examples, giving the model an out, and prefills.

This lesson is original educational writing based on this video by Anthropic (published May 22, 2025). All credit for the original content goes to the creators.

#prompting #fundamentals #claude-api
Video thumbnail: Prompting 101: The Anatomy of a Production-Grade Prompt
Original video — all credit to the creators. Watch the original on YouTube ↗

1. Prompting is engineering, not magic words

In this Code w/ Claude session, Anthropic’s Applied AI team (Hannah Moran and Christian Ryan) build a real prompt live, the way they do with customers: start with a naive one-liner, watch how it fails, and fix each failure with structure. The framing matters:

  • Prompting is clear communication. You’re briefing a very capable new employee who has no context about your business. Everything they need must be in the briefing.
  • Prompting is empirical. You don’t get it right on attempt one. You run it, study the failure, add the missing context or constraint, and run it again. Keep a set of test cases.

The running example

The demo scenario: an insurance company processes Swedish car accident reports. Each claim has two attachments:

  1. A standardized accident report form with 17 rows of checkboxes — one column of facts for Vehicle A, one for Vehicle B (“was turning”, “was changing lanes”, “ignored a red light”…).
  2. A messy hand-drawn sketch of the accident.

The task: determine what happened and who is at fault. A bare prompt like “review this accident report and determine who’s at fault” fails in instructive ways — in early runs Claude even misread what the form itself was, confidently treating checkboxes as filled when they weren’t. Every failure below maps to a missing piece of structure.

2. The anatomy of a production prompt

This is the ordering the team recommends after hundreds of customer engagements. Not every prompt needs every block, but the order is deliberate — models pay strong attention to the beginning and end, and caching favors static content up front.

SYSTEM PROMPT · static · cacheable1 · Task & role context — who Claude is, what job it does2 · Tone & guardrails — confident, factual, stay on task3 · Background data — domain knowledge, docs, in XML tags4 · Detailed rules & step-by-step instructions5 · Examples of good input → outputUSER TURN · dynamic · per request6 · Conversation history (if any)7 · The actual data for this request — again in XML tags8 · Immediate task — what to do right now9 · Think step by step + output format instructions10 · Prefill — start the assistant’s answer for itModels attend most to the start and the end — put data before the final instruction.
The 10 building blocks of a production prompt, in recommended order. Static blocks (top) belong in the system prompt and can be cached; dynamic blocks (bottom) change per request.

Three of these blocks fix the demo’s failures directly:

Task and role context (1). “You are an AI assistant helping a claims adjuster review car accident report forms from Swedish insurance claims.” One sentence eliminated the misidentification problem — Claude now knows what the form is before looking at it.

Background data (3). Instead of letting Claude guess how the form works, the final prompt describes it: 17 rows, what each checkbox means, that a mark can be a cross, a circle, or scribble. The system prompt holds everything true for every claim — which also makes it cacheable with prompt caching, cutting cost and latency.

Rules with an out (4). The single most effective anti-hallucination instruction is telling the model what to do when it can’t answer: “If the form is illegible or the sketch is ambiguous, say that you cannot make a confident determination.” Without an out, the model picks the most plausible answer; with one, it tells you the truth.

Check your understanding

3 questions · your answers are saved in this browser only

  1. 1. Why does the recommended structure put background data BEFORE the detailed instructions and final question?

  2. 2. What is "giving the model an out"?

  3. 3. In the accident-report demo, what fixed Claude misidentifying the form itself?

3. Controlling the output

The last two blocks shape how the answer comes back:

Think step by step (9). For analysis tasks, instruct Claude to reason inside <thinking> tags before answering inside <answer> tags — first establish what happened, then who is at fault. Ordering the reasoning prevents the model from anchoring on a premature verdict. (With modern models you can enable native extended thinking instead, but the principle — reason first, conclude second — is identical.)

Output format + prefill (10). Describe the exact shape you want (headings, JSON schema, tags). Then go one step further: prefill the assistant turn. If you start Claude’s response with { it skips the “Certainly! Here’s the analysis…” preamble and emits pure JSON. A prefill is the strongest formatting lever in the API — the model has no choice but to continue what’s already there.

With the full structure in place, the demo’s final run reads the checkboxes correctly, interprets the sketch, reasons step by step, and produces a confident, correct fault determination with the agreed format — the same inputs that made the naive prompt hallucinate.

Build it yourself

Follow these exact steps to reproduce it yourself · estimated time: ~30 minutes

Prerequisites

  • Python 3.10+
  • An Anthropic API key (console.anthropic.com)

You’ll rebuild the talk’s pattern on a task you can run immediately: a structured support-ticket analyst (same anatomy, text-only so you don’t need image inputs).

Step 1 — Set up

mkdir prompt-anatomy && cd prompt-anatomy
python3 -m venv .venv && source .venv/bin/activate
pip install anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

Step 2 — Write the naive version first

Create naive.py — observe the baseline before structuring (this is the empirical loop):

import anthropic

client = anthropic.Anthropic()
ticket = "App crashed AGAIN during checkout?! Third time this week. I'm done."

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=500,
    messages=[{"role": "user", "content": f"Analyze this ticket: {ticket}"}],
)
print(response.content[0].text)

Run it a few times: helpful-ish prose, different shape every time, guesses about facts you never provided.

Step 3 — Apply the full anatomy

Create structured.py:

import anthropic

client = anthropic.Anthropic()

# Blocks 1-5: static, cacheable system prompt
SYSTEM = """You are an AI assistant helping a support team triage tickets
for a mobile e-commerce app.

Stay factual and concise. Never invent details that are not in the ticket.

<background>
Severity levels: P1 = blocks purchases, P2 = degrades experience, P3 = cosmetic.
Known issue KB-114: checkout crash on Android when the cart holds 10+ items.
</background>

Rules:
1. Classify severity using the definitions above.
2. If the ticket may match a known issue, reference its KB id.
3. If the ticket lacks the information to decide, set severity to "unknown"
   and say what is missing. Do not guess.
"""

ticket = "App crashed AGAIN during checkout?! Third time this week. I'm done."

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=500,
    system=SYSTEM,
    messages=[
        {   # Blocks 7-9: dynamic data in tags, immediate task, format
            "role": "user",
            "content": f"""<ticket>{ticket}</ticket>

Analyze the ticket. Think step by step inside <thinking> tags, then output
JSON with keys: severity, sentiment, possible_known_issue, summary.""",
        },
        # Block 10: prefill — forces Claude straight into the thinking tag
        {"role": "assistant", "content": "<thinking>"},
    ],
)
print("<thinking>" + response.content[0].text)

Step 4 — Compare and iterate

Run both scripts on the same ticket, then invent two harder tickets (one ambiguous, one matching KB-114). Expected result: the structured version returns the same JSON shape every run, references KB-114 when appropriate, and answers "severity": "unknown" for the ambiguous ticket instead of guessing — exactly the behaviors the structure was added to produce.

Step 5 — Make it production-ready

Add "cache_control": {"type": "ephemeral"} to the system block to cache blocks 1–5, and build a small list of test tickets you re-run after every prompt change. Congratulations: you now have a prompt and the harness to keep improving it.

Where to go next

Related lessons

intermediate 🎬 Anthropic · ~25 min

Prompting for Agents: Steering Models That Act

Agents are models using tools in a loop. This lesson covers when to build one, how to prompt it — heuristics, budgets, guardrails — and how to evaluate something that takes hundreds of steps.

#agents #prompting #evaluation