How We Claude Code: HTML Specs, Agent Interviews, and Verification-Native Artifacts
Anthropic's Applied AI team shares three practices that get more out of longer-running agents: letting Claude interview you instead of writing specs yourself, using HTML over Markdown for richer specs, and embedding verification directly into your artifacts so agents can validate their own work.
This lesson is original educational writing based on this video by Claude (published May 23, 2026). All credit for the original content goes to the creators.
1. The bitter lesson applied to specs
Richard Sutton’s “bitter lesson” in machine learning goes like this: researchers spend years encoding human knowledge into systems, but simply pouring in more data and compute ends up beating every handcrafted approach. The lesson keeps repeating.
Arno, from Anthropic’s Applied AI team, applies the same idea to spec-writing: “The models are becoming more capable. You should accept that the model is probably better at extracting requirements from you than you are at defining your requirements.”
Your requirements are latent. You know what you want when you see it, but you’re often not great at articulating it upfront. Claude can be a better interviewer than you are a spec-writer — if you prompt it correctly.
2. Letting Claude interview you
Bad prompt: “Just make it better.”
Good prompt structure:
- Name the domain (not the outcome)
- Ask Claude to iterate turn-by-turn
- Explicitly instruct Claude to use the ask_user_question tool to drive the interview
Example starter:
I want to build a bill-splitting app for going out with friends.
Don't build anything yet.
Interview me to understand my requirements.
Focus on: audience, core use cases, edge cases, and what "good" looks like.
Use the ask_user_question tool to ask one question at a time.
Claude will then ask progressively more specific questions — primary audience, secondary audience, how items get added, how splitting rules work — without you having to anticipate those dimensions yourself. Each answer unlocks the next question.
The resulting spec is better than what you’d write from scratch, because the model has seen thousands of similar apps and knows which questions to ask.
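If you start interviews like this often, you can keep the starter prompt as a small template, so only the domain and focus areas change between projects. Below is a minimal JavaScript sketch. buildInterviewPrompt and joinWithAnd are hypothetical helpers, not part of Claude Code. The wording reproduces the example starter above.

```javascript
// Hypothetical helpers: rebuild the interview-style starter prompt from a
// domain (not an outcome) and a list of focus areas.
function joinWithAnd(items) {
  if (items.length <= 1) return items.join("");
  if (items.length === 2) return `${items[0]} and ${items[1]}`;
  return `${items.slice(0, -1).join(", ")}, and ${items[items.length - 1]}`;
}

function buildInterviewPrompt({ domain, focusAreas }) {
  if (!domain) throw new Error("Name the domain you are working in.");
  return [
    `I want to build ${domain}.`,
    "Don't build anything yet.",
    "Interview me to understand my requirements.",
    `Focus on: ${joinWithAnd(focusAreas)}.`,
    "Use the ask_user_question tool to ask one question at a time.",
  ].join("\n");
}

const starter = buildInterviewPrompt({
  domain: "a bill-splitting app for going out with friends",
  focusAreas: ["audience", "core use cases", "edge cases", 'what "good" looks like'],
});
console.log(starter);
```

The template fixes only the framing: domain, no building yet, one question at a time. The interview itself still decides which dimensions matter.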
3. HTML specs over Markdown
Once you have a spec, the workshop makes a surprising argument: write it as an HTML file, not a Markdown file.
The reason is density. Once a Markdown spec reaches 200 lines, it is unlikely that you or your colleagues will read it carefully. HTML can pack more information into less space and embed interactive examples. Crucially, it can show you what the thing will look like instead of describing it.
For a UI project, Arno had Claude generate four different design directions — brutalist, Tokyo fintech, minimalist, bold — as interactive HTML files. Each could be clicked through, compared side by side, and fed back to Claude with a screenshot. The feedback loop is much tighter than trying to describe visual preferences in words.
4. Verification-native artifacts
The third practice is the most technically concrete: embed verification into the artifact itself so both humans and agents can run it without external tooling.
The demo uses a to-do app with a key design decision: each component publishes its state to the DOM as data attributes.
<!-- Instead of reading React internal state... -->
<div
data-verify-unit="total-stats"
data-total="7"
data-done="3"
data-active="4"
>
7 items · 3 done · 4 active
</div>
When an agent needs to verify that the totals are correct, it doesn’t have to scrape the visual display or hook into React internals. It reads the data attributes directly — the same way an automated test would, but accessible to any agent with browser access.
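Here is a minimal sketch of the invariant that would catch a wrong total in the markup above. In a browser, the input would be element.dataset for the total-stats unit, where data-total, data-done and data-active appear as the string properties total, done and active. In this sketch a plain object stands in for it, which is an assumption that lets the check run without a browser. The function name checkTotalStats is hypothetical.

```javascript
// Hypothetical invariant over the "total-stats" unit's data attributes.
// `dataset` stands in for element.dataset: data-* values arrive as strings.
function checkTotalStats(dataset) {
  const total = Number(dataset.total);
  const done = Number(dataset.done);
  const active = Number(dataset.active);
  // Missing or non-numeric attributes parse to NaN and fail the check.
  const parsed = [total, done, active].every(Number.isInteger);
  return {
    pass: parsed && done + active === total,
    expected: done + active,
    actual: total,
  };
}

// Values from the markup above: 3 + 4 === 7, so this passes.
console.log(checkTotalStats({ total: "7", done: "3", active: "4" }));
// A planted bug like the workshop's (3 + 4 ≠ 10) fails and reports both values.
console.log(checkTotalStats({ total: "10", done: "3", active: "4" }));
```

The check returns the expected and actual values alongside pass. That lets a dashboard or an agent report exactly which numbers disagreed, not just that something failed.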
5. Three verification surfaces, one contract
Once you’ve embedded state in the DOM, verification can run three ways against the same contract:
Surface 1 — Human dashboard: A custom verification page that runs all invariants and displays pass/fail with the specific values that caused failures. The workshop plants a deliberate bug (3+4 ≠ 10) and the dashboard catches it immediately.
Surface 2 — Agent-driven browser: Playwright MCP connects to the running app. The agent reads the manifest of verification steps from the DOM, replays them, and can even record video clips of each step. The recording is the evidence — it can be shared with colleagues or stored in S3 as proof that verification ran.
Surface 3 — Headless CI: bun verify, or an equivalent command, runs the same invariants from the command line. No visible browser window is needed. This is what runs in CI on every push.
The key insight: you build the verification once (as DOM contracts), and it runs everywhere. The agent-recorded clips also serve a documentation function — when the Claude Code team ships at high velocity, the video clips prove what changed and that it still works.
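You can sketch the "build once, run everywhere" idea as invariants that are pure functions over a snapshot of the published data attributes. Each surface then differs only in how it collects that snapshot. This is a minimal sketch under stated assumptions, not the workshop's framework. The snapshot shape (a list of { unit, data } entries), runInvariants and the cart-item-count-matches invariant are all hypothetical.

```javascript
// Hypothetical snapshot shape: one entry per element carrying data-verify-unit.
// In a browser, a surface could collect it with:
//   [...document.querySelectorAll("[data-verify-unit]")]
//     .map((el) => ({ unit: el.dataset.verifyUnit, data: { ...el.dataset } }))
const invariants = [
  {
    name: "cart-item-count-matches",
    check: (snapshot) => {
      const summary = snapshot.find((e) => e.unit === "cart-summary");
      if (!summary) return { pass: false, error: "cart-summary unit not found" };
      const expected = snapshot.filter((e) => e.unit === "cart-item").length;
      const actual = Number(summary.data.itemCount);
      return { pass: expected === actual, expected, actual };
    },
  },
];

// Shared runner: the dashboard, the agent and CI all call this with a snapshot.
function runInvariants(invariantList, snapshot) {
  const results = invariantList.map((inv) => {
    try {
      return { name: inv.name, ...inv.check(snapshot) };
    } catch (err) {
      // A crashing check counts as a failure instead of aborting the run.
      return { name: inv.name, pass: false, error: String(err) };
    }
  });
  const failures = results.filter((r) => !r.pass);
  // An empty run counts as a failure, so a page that rendered nothing cannot pass.
  return { results, failures, ok: results.length > 0 && failures.length === 0 };
}

const snapshot = [
  { unit: "cart-summary", data: { itemCount: "2", totalCents: "1250" } },
  { unit: "cart-item", data: { priceCents: "500" } },
  { unit: "cart-item", data: { priceCents: "750" } },
];
const report = runInvariants(invariants, snapshot);
console.log(report.ok ? "all invariants passed" : report.failures);
// CI surface: a nonzero exit code fails the build.
if (!report.ok) process.exitCode = 1;
```

Because the checks never touch the DOM directly, the same list can run in a page, in an agent-driven browser, or in a headless script. Only the few lines that build the snapshot change.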
6. Practical guidance on models and modes
A few specific recommendations from the workshop:
- Use auto mode. If you’re not in auto mode, switch. It removes the constant approval interruptions that break long-running agent flows.
- Set effort to x-high. For complex specs and verification setup, run /effort x-high before the task.
- Use Opus 4.7 for vision-heavy work. The vision model in Opus 4.7 is significantly better than Sonnet for interpreting screenshots and comparing layouts. For spec generation that involves visual feedback, use Opus.
- Fast mode for spec iteration. Fast mode costs more per token but speeds output — worth it when you’re iterating quickly on design directions or spec content.
Build it yourself
Follow these steps to reproduce it yourself. Estimated time: ~60 minutes.
Prerequisites
- Claude Code with Playwright MCP configured
- A simple web app (React, vanilla JS, anything with a DOM)
- Node.js / bun for running headless verification
Add DOM verification contracts to an existing app and wire three surfaces to them.
Step 1 — Pick a component with verifiable state
Choose one component whose correctness you care about: a cart total, a filter count, a form field validation state, a pagination offset. Anything where you can express invariants like “total = sum of items” or “active = total - done.”
Step 2 — Publish state to the DOM
Add data attributes to the container element. Keep the names consistent and human-readable:
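// React example
function CartItem({ item }) {
  // Each line item publishes its own price in integer cents,
  // so the Step 3 invariant has something to sum.
  return (
    <div data-verify-unit="cart-item" data-price-cents={Math.round(item.price * 100)}>
      ${item.price.toFixed(2)}
    </div>
  );
}

function CartSummary({ items }) {
  // Sum in integer cents, the same way each CartItem publishes its price,
  // so the published total is directly comparable to the per-item values.
  const totalCents = items.reduce((sum, i) => sum + Math.round(i.price * 100), 0);
  const itemCount = items.length;
  return (
    <div
      data-verify-unit="cart-summary"
      data-item-count={itemCount}
      data-total-cents={totalCents}
    >
      {itemCount} items · ${(totalCents / 100).toFixed(2)}
    </div>
  );
}
Step 3 — Write the invariants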
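One detail matters before you write the invariants. The browser exposes each data-* attribute on element.dataset under a camelCased key. So data-total-cents from Step 2 is read as dataset.totalCents, and data-price-cents as dataset.priceCents. The sketch below shows that name mapping, following the HTML dataset rule: drop the data- prefix, then replace each hyphen followed by a lowercase ASCII letter with that letter uppercased. toDatasetKey is a hypothetical helper for illustration, not a browser API.

```javascript
// Hypothetical helper mirroring how browsers name element.dataset keys.
function toDatasetKey(attributeName) {
  const prefix = "data-";
  if (!attributeName.startsWith(prefix)) {
    throw new Error(`not a data-* attribute: ${attributeName}`);
  }
  return attributeName
    .slice(prefix.length)
    // "-c" becomes "C"; a hyphen not followed by a lowercase letter is kept.
    .replace(/-([a-z])/g, (_, letter) => letter.toUpperCase());
}

for (const name of ["data-verify-unit", "data-item-count", "data-total-cents", "data-price-cents"]) {
  console.log(`${name} -> dataset.${toDatasetKey(name)}`);
}
```

The selectors in the invariants below still use the hyphenated attribute names, such as [data-verify-unit="cart-summary"]. Only property access through dataset uses the camelCased form.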
Create a verify/invariants.js file:
export const invariants = [
  {
    name: "cart-total-matches-items",
    description: "Cart total equals sum of individual item prices",
    check: () => {
      const summary = document.querySelector('[data-verify-unit="cart-summary"]');
      const items = document.querySelectorAll('[data-verify-unit="cart-item"]');
      // Report a readable failure instead of throwing if the unit is missing.
      if (!summary) {
        return { pass: false, error: 'no [data-verify-unit="cart-summary"] element found' };
      }
      const expectedCents = [...items].reduce(
        (sum, el) => sum + Number.parseInt(el.dataset.priceCents, 10),
        0
      );
      const actualCents = Number.parseInt(summary.dataset.totalCents, 10);
      return {
        pass: expectedCents === actualCents,
        expected: expectedCents,
        actual: actualCents,
      };
    },
  },
];
Step 4 — Build the human dashboard
Create verify/dashboard.html — a standalone page that imports your app and runs the invariants:
<table id="results"></table>
<script type="module">
  // Assumption: the app's components are rendered on this same page
  // (e.g. by importing your app's entry module); otherwise the invariants
  // find no [data-verify-unit] elements and report failures.
  import { invariants } from './invariants.js';

  const results = invariants.map(inv => ({
    name: inv.name,
    description: inv.description,
    result: inv.check(),
  }));

  // Render a table of pass/fail with values
  const html = results.map(r => `
    <tr class="${r.result.pass ? 'pass' : 'fail'}">
      <td>${r.name}</td>
      <td>${r.result.pass ? '✅' : '❌'}</td>
      <td>${JSON.stringify(r.result)}</td>
    </tr>
  `).join('');
  document.getElementById('results').innerHTML = html;
</script>
Step 5 — Wire headless verification
// verify/run.mjs
import puppeteer from 'puppeteer';

const browser = await puppeteer.launch({ headless: true });
try {
  const page = await browser.newPage();
  await page.goto('http://localhost:3000/verify/dashboard.html');

  // Read the rows the dashboard rendered; each row's class is 'pass' or 'fail'.
  const results = await page.evaluate(() => {
    return [...document.querySelectorAll('tr[class]')].map(tr => ({
      name: tr.querySelector('td:first-child').textContent,
      pass: tr.classList.contains('pass'),
    }));
  });

  const failures = results.filter(r => !r.pass);
  console.log(`${results.length - failures.length}/${results.length} passed`);
  if (results.length === 0) {
    console.error('FAILED: no invariant results found on the dashboard');
    process.exitCode = 1;
  } else if (failures.length) {
    console.error('FAILED:', failures.map(f => f.name).join(', '));
    process.exitCode = 1;
  }
} finally {
  // Always close the browser, even when invariants fail.
  await browser.close();
}
Step 6 — Let Claude Code run it
With Playwright MCP connected:
/goal Verify the cart summary invariants. Navigate to localhost:3000/verify/dashboard.html,
read the verification manifest from the DOM, run all invariants, and report any failures.
Claude will navigate to the dashboard, read the data attributes, run the checks, and report exactly what failed and what the values were.
What to add next
- More invariants: add one per interaction you care about (filter counts, pagination, form validation state)
- Video recording: in CI, use Playwright’s recordVideo option to capture each verification run
- Agent auto-fix: when an invariant fails, have Claude Code read the failure details and propose a fix
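For the agent auto-fix idea, the useful part is giving Claude Code the specific values, not just a red row. The sketch below assumes results in the shape the Step 4 dashboard builds: { name, description, result: { pass, expected, actual } }. buildFixPrompt is a hypothetical helper, and the wording of the request is only an example.

```javascript
// Hypothetical helper: turn failing invariant results into a specific fix request.
// Assumes each entry looks like the dashboard's: { name, description, result }.
function buildFixPrompt(results) {
  const failures = results.filter((r) => !r.result.pass);
  if (failures.length === 0) return null; // nothing to fix

  const details = failures.map((r) => {
    const { expected, actual, error } = r.result;
    const values = error
      ? `error: ${error}`
      : `expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`;
    return `- ${r.name} (${r.description}): ${values}`;
  });

  return [
    `${failures.length} verification invariant(s) failed:`,
    ...details,
    "Find the component that publishes these data-verify-unit values, explain the cause, and propose a fix. Then rerun the invariants.",
  ].join("\n");
}

const fixPrompt = buildFixPrompt([
  {
    name: "cart-total-matches-items",
    description: "Cart total equals sum of individual item prices",
    result: { pass: false, expected: 1250, actual: 1300 },
  },
]);
console.log(fixPrompt);
```

When every invariant passes, the helper returns null. A CI job or hook could then hand a prompt to the agent only when there is something to fix.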
Where to go next
- Claude Code Best Practices — CLAUDE.md, skills, hooks, and the context setup that makes long-running agents reliable
- Evals for Agents — the grader patterns that extend these verification ideas to agent output quality
- Tariq’s repo (referenced in the talk) at the CWC Workshops GitHub org for the full verification framework code