Prompting 101: The Anatomy of a Production-Grade Prompt

1. Prompting is engineering, not magic words

In this Code w/ Claude session, Anthropic’s Applied AI team (Hannah Moran and Christian Ryan) build a real prompt live, the way they do with customers: start with a naive one-liner, watch how it fails, and fix each failure with structure. The framing matters:

Prompting is clear communication. You’re briefing a very capable new employee who has no context about your business. Everything they need must be in the briefing.
Prompting is empirical. You don’t get it right on attempt one. You run it, study the failure, add the missing context or constraint, and run it again. Keep a set of test cases.

The running example

The demo scenario: an insurance company processes Swedish car accident reports. Each claim has two attachments:

A standardized accident report form with 17 rows of checkboxes — one column of facts for Vehicle A, one for Vehicle B (“was turning”, “was changing lanes”, “ignored a red light”…).
A messy hand-drawn sketch of the accident.

The task: determine what happened and who is at fault. A bare prompt like “review this accident report and determine who’s at fault” fails in instructive ways — in early runs Claude even misread what the form itself was, confidently treating checkboxes as filled when they weren’t. Every failure below maps to a missing piece of structure.

2. The anatomy of a production prompt

This is the ordering the team recommends after hundreds of customer engagements. Not every prompt needs every block, but the order is deliberate — models pay strong attention to the beginning and end, and caching favors static content up front.

The 10 building blocks of a production prompt, in recommended order. Static blocks (top) belong in the system prompt and can be cached; dynamic blocks (bottom) change per request.

Three of these blocks fix the demo’s failures directly:

Task and role context (1). “You are an AI assistant helping a claims adjuster review car accident report forms from Swedish insurance claims.” One sentence eliminated the misidentification problem — Claude now knows what the form is before looking at it.

Background data (3). Instead of letting Claude guess how the form works, the final prompt describes it: 17 rows, what each checkbox means, that a mark can be a cross, a circle, or scribble. The system prompt holds everything true for every claim — which also makes it cacheable with prompt caching, cutting cost and latency.

Rules with an out (4). The single most effective anti-hallucination instruction is telling the model what to do when it can’t answer: “If the form is illegible or the sketch is ambiguous, say that you cannot make a confident determination.” Without an out, the model picks the most plausible answer; with one, it tells you the truth.

Check your understanding

3 questions · your answers are saved in this browser only

1. Why does the recommended structure put background data BEFORE the detailed instructions and final question?

Long documents go early; the immediate question goes last where attention is high. Static content first also means the cached prefix stays identical across requests.
2. What is "giving the model an out"?

Telling Claude to say "I cannot make a confident determination" when evidence is ambiguous is one of the most effective hallucination preventers.
3. In the accident-report demo, what fixed Claude misidentifying the form itself?

One sentence of role and task context anchored the interpretation of everything that followed.

3. Controlling the output

The last two blocks shape how the answer comes back:

Think step by step (9). For analysis tasks, instruct Claude to reason inside <thinking> tags before answering inside <answer> tags — first establish what happened, then who is at fault. Ordering the reasoning prevents the model from anchoring on a premature verdict. (With modern models you can enable native extended thinking instead, but the principle — reason first, conclude second — is identical.)

Output format + prefill (10). Describe the exact shape you want (headings, JSON schema, tags). Then go one step further: prefill the assistant turn. If you start Claude’s response with { it skips the “Certainly! Here’s the analysis…” preamble and emits pure JSON. A prefill is the strongest formatting lever in the API — the model has no choice but to continue what’s already there.

With the full structure in place, the demo’s final run reads the checkboxes correctly, interprets the sketch, reasons step by step, and produces a confident, correct fault determination with the agreed format — the same inputs that made the naive prompt hallucinate.

Build it yourself

Follow these exact steps to reproduce it yourself · estimated time: ~30 minutes

Prerequisites

Python 3.10+
An Anthropic API key (console.anthropic.com)

You’ll rebuild the talk’s pattern on a task you can run immediately: a structured support-ticket analyst (same anatomy, text-only so you don’t need image inputs).

Step 1 — Set up

mkdir prompt-anatomy && cd prompt-anatomy
python3 -m venv .venv && source .venv/bin/activate
pip install anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

Step 2 — Write the naive version first

Create naive.py — observe the baseline before structuring (this is the empirical loop):

import anthropic

client = anthropic.Anthropic()
ticket = "App crashed AGAIN during checkout?! Third time this week. I'm done."

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=500,
    messages=[{"role": "user", "content": f"Analyze this ticket: {ticket}"}],
)
print(response.content[0].text)

Run it a few times: helpful-ish prose, different shape every time, guesses about facts you never provided.

Step 3 — Apply the full anatomy

Create structured.py:

import anthropic

client = anthropic.Anthropic()

# Blocks 1-5: static, cacheable system prompt
SYSTEM = """You are an AI assistant helping a support team triage tickets
for a mobile e-commerce app.

Stay factual and concise. Never invent details that are not in the ticket.

<background>
Severity levels: P1 = blocks purchases, P2 = degrades experience, P3 = cosmetic.
Known issue KB-114: checkout crash on Android when the cart holds 10+ items.
</background>

Rules:
1. Classify severity using the definitions above.
2. If the ticket may match a known issue, reference its KB id.
3. If the ticket lacks the information to decide, set severity to "unknown"
   and say what is missing. Do not guess.
"""

ticket = "App crashed AGAIN during checkout?! Third time this week. I'm done."

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=500,
    system=SYSTEM,
    messages=[
        {   # Blocks 7-9: dynamic data in tags, immediate task, format
            "role": "user",
            "content": f"""<ticket>{ticket}</ticket>

Analyze the ticket. Think step by step inside <thinking> tags, then output
JSON with keys: severity, sentiment, possible_known_issue, summary.""",
        },
        # Block 10: prefill — forces Claude straight into the thinking tag
        {"role": "assistant", "content": "<thinking>"},
    ],
)
print("<thinking>" + response.content[0].text)

Step 4 — Compare and iterate

Run both scripts on the same ticket, then invent two harder tickets (one ambiguous, one matching KB-114). Expected result: the structured version returns the same JSON shape every run, references KB-114 when appropriate, and answers "severity": "unknown" for the ambiguous ticket instead of guessing — exactly the behaviors the structure was added to produce.

Step 5 — Make it production-ready

Add "cache_control": {"type": "ephemeral"} to the system block to cache blocks 1–5, and build a small list of test tickets you re-run after every prompt change. Congratulations: you now have a prompt and the harness to keep improving it.

Where to go next

Watch the original session to see the failures live — they teach more than the fixes.
Continue with Prompting for Agents: how these fundamentals change when Claude can use tools in a loop.
The Anthropic prompt engineering docs formalize every technique here.

Prompting 101: The Anatomy of a Production-Grade Prompt

1. Prompting is engineering, not magic words

The running example

2. The anatomy of a production prompt

Check your understanding

3. Controlling the output

Build it yourself

Step 1 — Set up

Step 2 — Write the naive version first

Step 3 — Apply the full anatomy

Step 4 — Compare and iterate

Step 5 — Make it production-ready

Where to go next

Related lessons

Building Effective Agents: Workflows, Agents and the Patterns Between

Prompting for Agents: Steering Models That Act

1. Prompting is engineering, not magic words

The running example

2. The anatomy of a production prompt

🧠 Check your understanding

3. Controlling the output

🛠️ Build it yourself

Step 1 — Set up

Step 2 — Write the naive version first

Step 3 — Apply the full anatomy

Step 4 — Compare and iterate

Step 5 — Make it production-ready

Where to go next

Related lessons

Building Effective Agents: Workflows, Agents and the Patterns Between

Prompting for Agents: Steering Models That Act

Check your understanding

Build it yourself