Prompting 101: The Anatomy of a Production-Grade Prompt
Anthropic's Applied AI team shows how to evolve a one-line prompt into a reliable, production-quality prompt — structure, XML tags, examples, giving the model an out, and prefills.
This lesson is original educational writing based on this video by Anthropic (published May 22, 2025). All credit for the original content goes to the creators.
1. Prompting is engineering, not magic words
In this Code w/ Claude session, Anthropic’s Applied AI team (Hannah Moran and Christian Ryan) build a real prompt live, the way they do with customers: start with a naive one-liner, watch how it fails, and fix each failure with structure. The framing matters:
- Prompting is clear communication. You’re briefing a very capable new employee who has no context about your business. Everything they need must be in the briefing.
- Prompting is empirical. You don’t get it right on attempt one. You run it, study the failure, add the missing context or constraint, and run it again. Keep a set of test cases.
The running example
The demo scenario: an insurance company processes Swedish car accident reports. Each claim has two attachments:
- A standardized accident report form with 17 rows of checkboxes — one column of facts for Vehicle A, one for Vehicle B (“was turning”, “was changing lanes”, “ignored a red light”…).
- A messy hand-drawn sketch of the accident.
The task: determine what happened and who is at fault. A bare prompt like “review this accident report and determine who’s at fault” fails in instructive ways — in early runs Claude even misread what the form itself was, confidently treating checkboxes as filled when they weren’t. Every failure below maps to a missing piece of structure.
2. The anatomy of a production prompt
This is the ordering the team recommends after hundreds of customer engagements. Not every prompt needs every block, but the order is deliberate — models pay strong attention to the beginning and end, and caching favors static content up front.
Three of these blocks fix the demo’s failures directly:
Task and role context (1). “You are an AI assistant helping a claims adjuster review car accident report forms from Swedish insurance claims.” One sentence eliminated the misidentification problem — Claude now knows what the form is before looking at it.
Background data (3). Instead of letting Claude guess how the form works, the final prompt describes it: 17 rows, what each checkbox means, that a mark can be a cross, a circle, or scribble. The system prompt holds everything true for every claim — which also makes it cacheable with prompt caching, cutting cost and latency.
Rules with an out (4). The single most effective anti-hallucination instruction is telling the model what to do when it can’t answer: “If the form is illegible or the sketch is ambiguous, say that you cannot make a confident determination.” Without an out, the model picks the most plausible answer; with one, it tells you the truth.
Check your understanding
3 questions · your answers are saved in this browser only
-
1. Why does the recommended structure put background data BEFORE the detailed instructions and final question?
-
2. What is "giving the model an out"?
-
3. In the accident-report demo, what fixed Claude misidentifying the form itself?
3. Controlling the output
The last two blocks shape how the answer comes back:
Think step by step (9). For analysis tasks, instruct Claude to reason inside
<thinking> tags before answering inside <answer> tags — first establish what happened,
then who is at fault. Ordering the reasoning prevents the model from anchoring on a premature
verdict. (With modern models you can enable native extended thinking instead, but the principle
— reason first, conclude second — is identical.)
Output format + prefill (10). Describe the exact shape you want (headings, JSON schema,
tags). Then go one step further: prefill the assistant turn. If you start Claude’s response
with { it skips the “Certainly! Here’s the analysis…” preamble and emits pure JSON. A prefill
is the strongest formatting lever in the API — the model has no choice but to continue what’s
already there.
With the full structure in place, the demo’s final run reads the checkboxes correctly, interprets the sketch, reasons step by step, and produces a confident, correct fault determination with the agreed format — the same inputs that made the naive prompt hallucinate.
Build it yourself
Follow these exact steps to reproduce it yourself · estimated time: ~30 minutes
Prerequisites
- Python 3.10+
- An Anthropic API key (console.anthropic.com)
You’ll rebuild the talk’s pattern on a task you can run immediately: a structured support-ticket analyst (same anatomy, text-only so you don’t need image inputs).
Step 1 — Set up
mkdir prompt-anatomy && cd prompt-anatomy
python3 -m venv .venv && source .venv/bin/activate
pip install anthropic
export ANTHROPIC_API_KEY="sk-ant-..."Step 2 — Write the naive version first
Create naive.py — observe the baseline before structuring (this is the empirical loop):
import anthropic
client = anthropic.Anthropic()
ticket = "App crashed AGAIN during checkout?! Third time this week. I'm done."
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=500,
messages=[{"role": "user", "content": f"Analyze this ticket: {ticket}"}],
)
print(response.content[0].text)Run it a few times: helpful-ish prose, different shape every time, guesses about facts you never provided.
Step 3 — Apply the full anatomy
Create structured.py:
import anthropic
client = anthropic.Anthropic()
# Blocks 1-5: static, cacheable system prompt
SYSTEM = """You are an AI assistant helping a support team triage tickets
for a mobile e-commerce app.
Stay factual and concise. Never invent details that are not in the ticket.
<background>
Severity levels: P1 = blocks purchases, P2 = degrades experience, P3 = cosmetic.
Known issue KB-114: checkout crash on Android when the cart holds 10+ items.
</background>
Rules:
1. Classify severity using the definitions above.
2. If the ticket may match a known issue, reference its KB id.
3. If the ticket lacks the information to decide, set severity to "unknown"
and say what is missing. Do not guess.
"""
ticket = "App crashed AGAIN during checkout?! Third time this week. I'm done."
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=500,
system=SYSTEM,
messages=[
{ # Blocks 7-9: dynamic data in tags, immediate task, format
"role": "user",
"content": f"""<ticket>{ticket}</ticket>
Analyze the ticket. Think step by step inside <thinking> tags, then output
JSON with keys: severity, sentiment, possible_known_issue, summary.""",
},
# Block 10: prefill — forces Claude straight into the thinking tag
{"role": "assistant", "content": "<thinking>"},
],
)
print("<thinking>" + response.content[0].text)Step 4 — Compare and iterate
Run both scripts on the same ticket, then invent two harder tickets (one ambiguous, one matching
KB-114). Expected result: the structured version returns the same JSON shape every run,
references KB-114 when appropriate, and answers "severity": "unknown" for the ambiguous ticket
instead of guessing — exactly the behaviors the structure was added to produce.
Step 5 — Make it production-ready
Add "cache_control": {"type": "ephemeral"} to the system block to cache blocks 1–5, and build a
small list of test tickets you re-run after every prompt change. Congratulations: you now have a
prompt and the harness to keep improving it.
Where to go next
- Watch the original session to see the failures live — they teach more than the fixes.
- Continue with Prompting for Agents: how these fundamentals change when Claude can use tools in a loop.
- The Anthropic prompt engineering docs formalize every technique here.