How We Claude Code: HTML Specs, Agent Interviews, and Verification-Native Artifacts

1. The bitter lesson applied to specs

Richard Sutton’s “bitter lesson” in machine learning: researchers spend years encoding human knowledge into systems, but general methods that scale with more data and compute tend to beat handcrafted approaches in the long run. The lesson keeps repeating.

Arno, from Anthropic’s Applied AI team, applies the same idea to spec-writing: “The models are becoming more capable. You should accept that the model is probably better at extracting requirements from you than you are at defining your requirements.”

Your requirements are latent. You know what you want when you see it, but you’re often not great at articulating it upfront. Claude can be a better interviewer than you are a spec-writer — if you prompt it correctly.

2. Letting Claude interview you

Bad prompt: “Just make it better.”

Good prompt structure:

Name the domain (not the outcome)
Ask Claude to iterate turn-by-turn
Explicitly instruct Claude to use the ask_user_question tool to drive the interview

Example starter:

I want to build a bill-splitting app for going out with friends.
Don't build anything yet.
Interview me to understand my requirements.
Focus on: audience, core use cases, edge cases, and what "good" looks like.
Use the ask_user_question tool to ask one question at a time.

Claude will then ask progressively more specific questions — primary audience, secondary audience, how items get added, how splitting rules work — without you having to anticipate those dimensions yourself. Each answer unlocks the next question.

The resulting spec is often better than what you’d write from scratch, because the model has likely seen many similar apps and knows which questions tend to matter.
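
The starter prompt above follows a fixed shape: name the domain, hold off on building, ask for an interview, list focus areas, and name the tool. If you reuse it across projects, a small template helper is one way to keep that shape consistent. This is only a sketch; `buildInterviewPrompt` and its option names are made up for illustration and are not part of Claude Code.

```javascript
// Hypothetical helper: assembles an interview-style starter prompt.
// The function name and option names are illustrative assumptions.
function buildInterviewPrompt({ domain, focusAreas }) {
  return [
    `I want to build ${domain}.`,
    "Don't build anything yet.",
    "Interview me to understand my requirements.",
    `Focus on: ${focusAreas.join(", ")}.`,
    "Use the ask_user_question tool to ask one question at a time.",
  ].join("\n");
}

const prompt = buildInterviewPrompt({
  domain: "a bill-splitting app for going out with friends",
  focusAreas: ["audience", "core use cases", "edge cases", 'what "good" looks like'],
});
console.log(prompt);
```

The helper only names the domain and the focus areas. It deliberately leaves the outcome open, which is the point of the interview.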

3. HTML specs over Markdown

Once you have a spec, the workshop makes a surprising argument: write it as an HTML file, not a Markdown file.

The reason is density. A Markdown spec that reaches 200 lines is unlikely to be read carefully, whether by you or by your colleagues. HTML can pack more information into less space, embed interactive examples, and, crucially, show you what the thing will look like instead of describing it.

For a UI project, Arno had Claude generate four different design directions — brutalist, Tokyo fintech, minimalist, bold — as interactive HTML files. Each could be clicked through, compared side by side, and fed back to Claude with a screenshot. The feedback loop is much tighter than trying to describe visual preferences in words.

The spec-to-build-to-verify loop. HTML specs replace long Markdown files at the plan stage, making the spec richer for both humans and agents. Verification is then embedded into the artifact itself.

4. Verification-native artifacts

The third practice is the most technically concrete: embed verification into the artifact itself so both humans and agents can run it without external tooling.

The demo uses a to-do app with a key design decision: each component publishes its state to the DOM as data attributes.

<!-- Instead of reading React internal state... -->
<div 
  data-verify-unit="total-stats"
  data-total="7"
  data-done="3"
  data-active="4"
>
  7 items · 3 done · 4 active
</div>

When an agent needs to verify that the totals are correct, it doesn’t have to scrape the visual display or hook into React internals. It reads the data attributes directly — the same way an automated test would, but accessible to any agent with browser access.
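
To make the contract idea concrete, here is a minimal sketch of a check that reads only those attributes. It assumes the invariant for this unit is "done + active = total"; `checkTotalStats` is an illustrative name, and the plain object passed in stands in for `element.dataset`, whose values are always strings.

```javascript
// Sketch: validate the "total-stats" contract from its data attributes.
// `dataset` mimics element.dataset, where every value is a string.
function checkTotalStats(dataset) {
  const total = Number.parseInt(dataset.total, 10);
  const done = Number.parseInt(dataset.done, 10);
  const active = Number.parseInt(dataset.active, 10);
  return {
    // Non-numeric attributes fail the check instead of slipping through as NaN.
    pass: [total, done, active].every(Number.isInteger) && done + active === total,
    expected: done + active,
    actual: total,
  };
}

// In a browser the argument would come from the DOM, for example:
//   document.querySelector('[data-verify-unit="total-stats"]').dataset
console.log(checkTotalStats({ total: "7", done: "3", active: "4" }));
console.log(checkTotalStats({ total: "10", done: "3", active: "4" }));
```

The first call passes and the second fails with expected 7 and actual 10. Nothing in the check depends on how the numbers are rendered on screen.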

Check your understanding

1. Why should you use data attributes (like data-verify-unit) to publish component state to the DOM?

Publishing state to the DOM as data attributes creates a stable contract. You can change the visual presentation entirely and the verification still works, because it reads the contract, not the rendered output.

2. What is the primary argument for using HTML files over Markdown for specs?

Density and reviewability. A long Markdown spec is rarely read carefully. An HTML mockup can be clicked through, screenshotted, and compared visually, all feedback channels that improve spec quality.

3. The workshop shows verification running three ways. Which of these is NOT one of them?

The three surfaces are: human dashboard, agent-driven browser (Playwright MCP reading data attributes), and headless CLI verification. PR comments from CI are a deployment step, not one of the three verification surfaces shown.

5. Three verification surfaces, one contract

Once you’ve embedded state in the DOM, verification can run three ways against the same contract:

Surface 1 — Human dashboard: A custom verification page that runs all invariants and displays pass/fail with the specific values that caused failures. The workshop plants a deliberate bug (3+4 ≠ 10) and the dashboard catches it immediately.

Surface 2 — Agent-driven browser: Playwright MCP connects to the running app. The agent reads the manifest of verification steps from the DOM, replays them, and can even record video clips of each step. The recording is the evidence — it can be shared with colleagues or stored in S3 as proof that verification ran.

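Surface 3 — Headless CI: bun verify or equivalent runs the same invariants from the command line. The workshop describes this surface as needing no browser; note that the Step 5 script under "Build it yourself" does launch a headless browser through Puppeteer. This is what runs in CI on every push.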

The key insight: you build the verification once (as DOM contracts), and it runs everywhere. The agent-recorded clips also serve a documentation function — when the Claude Code team ships at high velocity, the video clips prove what changed and that it still works.
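
A rough sketch of "build once, run everywhere": the invariants are defined a single time, and each surface is just a different reporter over the same results. The reporter names (`toDashboardRows`, `toCiReport`) and the plain `state` object are assumptions for illustration; in the real setup the state would be read from data attributes.

```javascript
// Sketch: one set of invariants, two hypothetical reporters.
// `state` stands in for values read from data attributes.
const invariants = [
  {
    name: "done-plus-active-equals-total",
    check: (state) => ({
      pass: state.done + state.active === state.total,
      expected: state.done + state.active,
      actual: state.total,
    }),
  },
];

// Shared runner used by every surface.
function runAll(list, state) {
  return list.map((inv) => ({ name: inv.name, ...inv.check(state) }));
}

// Reporter for a human dashboard: one text row per invariant.
function toDashboardRows(results) {
  return results.map(
    (r) => `${r.pass ? "PASS" : "FAIL"} ${r.name} (expected ${r.expected}, actual ${r.actual})`
  );
}

// Reporter for CI: a summary line plus a process exit code.
function toCiReport(results) {
  const failed = results.filter((r) => !r.pass);
  return {
    summary: `${results.length - failed.length}/${results.length} passed`,
    exitCode: failed.length === 0 ? 0 : 1,
  };
}

// A planted inconsistency, similar in spirit to the workshop's deliberate bug.
const results = runAll(invariants, { total: 10, done: 3, active: 4 });
console.log(toDashboardRows(results).join("\n"));
console.log(toCiReport(results));
```

Adding an invariant to the list updates both reports without touching either reporter.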

6. Practical guidance on models and modes

A few specific recommendations from the workshop:

Use auto mode. If you’re not in auto mode, switch. It removes the constant approval interruptions that break long-running agent flows.
Set effort to x-high. For complex specs and verification setup: /effort x-high before the task.
Use Opus 4.7 for vision-heavy work. The vision model in Opus 4.7 is significantly better than Sonnet for interpreting screenshots and comparing layouts. For spec generation that involves visual feedback, use Opus.
Fast mode for spec iteration. Fast mode costs more per token but speeds output — worth it when you’re iterating quickly on design directions or spec content.

Build it yourself

Follow these exact steps to reproduce it yourself · estimated time: ~60 minutes

Prerequisites

Claude Code with Playwright MCP configured
A simple web app (React, vanilla JS, anything with a DOM)
Node.js / bun for running headless verification, plus the puppeteer package used in Step 5

Add DOM verification contracts to an existing app and wire three surfaces to them.

Step 1 — Pick a component with verifiable state

Choose one component whose correctness you care about: a cart total, a filter count, a form field validation state, a pagination offset. Anything where you can express invariants like “total = sum of items” or “active = total - done.”

Step 2 — Publish state to the DOM

Add data attributes to the container element. Keep the names consistent and human-readable:

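// React example
// CartItem is an assumed companion component; Step 3 reads its data-price-cents.
function CartItem({ item }) {
  return (
    <li data-verify-unit="cart-item" data-price-cents={Math.round(item.price * 100)}>
      {item.name} · ${item.price.toFixed(2)}
    </li>
  );
}

function CartSummary({ items }) {
  // Sum integer cents per item so the total matches what Step 3 recomputes.
  const totalCents = items.reduce((sum, i) => sum + Math.round(i.price * 100), 0);
  const itemCount = items.length;

  return (
    <div
      data-verify-unit="cart-summary"
      data-item-count={itemCount}
      data-total-cents={totalCents}
    >
      {itemCount} items · ${(totalCents / 100).toFixed(2)}
    </div>
  );
}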
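
Step 3 compares this published total against a sum of per-item cent values, so the order of rounding and summing matters. The sketch below shows the two orders disagreeing; the prices with sub-cent fractions are an assumption (think of a discount applied to each item), chosen only to make the difference visible.

```javascript
// Sketch: rounding each item to cents vs. rounding the summed total once.
// Prices with sub-cent fractions are an assumption for illustration.
const items = [{ price: 1.004 }, { price: 1.004 }, { price: 1.004 }];

// Round each item to integer cents first, then add the integers.
const sumOfRoundedCents = items.reduce(
  (sum, i) => sum + Math.round(i.price * 100), 0
);

// Add floating-point prices first, then round the total once.
const roundedSumCents = Math.round(
  items.reduce((sum, i) => sum + i.price, 0) * 100
);

console.log(sumOfRoundedCents, roundedSumCents); // prints: 300 301
```

Whichever order you choose, the component and the invariant need to use the same one, or the check fails on data that is actually fine.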

Step 3 — Write the invariants

Create a verify/invariants.js file:

export const invariants = [
  {
    name: "cart-total-matches-items",
    description: "Cart total equals sum of individual item prices",
    check: () => {
      const summary = document.querySelector('[data-verify-unit="cart-summary"]');
      const items = document.querySelectorAll('[data-verify-unit="cart-item"]');
      // Recompute the total from the per-item contract values.
      const expectedCents = [...items].reduce(
        (sum, el) => sum + parseInt(el.dataset.priceCents, 10), 0
      );
      // A missing summary element is reported as a failure, not thrown.
      const actualCents = summary ? parseInt(summary.dataset.totalCents, 10) : null;
      return {
        pass: expectedCents === actualCents,
        expected: expectedCents,
        actual: actualCents,
      };
    },
  },
];

Step 4 — Build the human dashboard

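Create verify/dashboard.html, a standalone page that loads your app and runs the invariants. The snippet assumes the app has already rendered into this page (for example through its own script tag) before the checks run; otherwise the selectors in invariants.js find nothing.

<table>
  <tbody id="results"></tbody>
</table>

<script type="module">
  import { invariants } from './invariants.js';

  // Run each invariant; a thrown error is recorded as a failure.
  const results = invariants.map(inv => {
    let result;
    try {
      result = inv.check();
    } catch (err) {
      result = { pass: false, error: String(err) };
    }
    return { name: inv.name, description: inv.description, result };
  });

  // Render a table of pass/fail with values
  const html = results.map(r => `
    <tr class="${r.result.pass ? 'pass' : 'fail'}">
      <td>${r.name}</td>
      <td>${r.result.pass ? '✅' : '❌'}</td>
      <td>${JSON.stringify(r.result)}</td>
    </tr>
  `).join('');

  document.getElementById('results').innerHTML = html;
</script>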

Step 5 — Wire headless verification

// verify/run.mjs
import puppeteer from 'puppeteer';

const browser = await puppeteer.launch({ headless: true });
let failures = [];
try {
  const page = await browser.newPage();
  // Assumes the app and dashboard are already being served on port 3000.
  await page.goto('http://localhost:3000/verify/dashboard.html');

  // Read each result row back out of the rendered dashboard table.
  const results = await page.evaluate(() => {
    return [...document.querySelectorAll('tr[class]')].map(tr => ({
      name: tr.querySelector('td:first-child').textContent,
      pass: tr.classList.contains('pass'),
    }));
  });

  failures = results.filter(r => !r.pass);
  console.log(`${results.length - failures.length}/${results.length} passed`);
} finally {
  // Close the browser even when navigation or evaluation throws.
  await browser.close();
}

if (failures.length) {
  console.error('FAILED:', failures.map(f => f.name).join(', '));
  process.exit(1);
}

Step 6 — Let Claude Code run it

With Playwright MCP connected:

/goal Verify the cart summary invariants. Navigate to localhost:3000/verify/dashboard.html,
read the verification manifest from the DOM, run all invariants, and report any failures.

Claude will navigate to the dashboard, read the data attributes, run the checks, and report exactly what failed and what the values were.
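
The prompt mentions a verification manifest, but the steps above never define one. One hypothetical shape is a JSON list of invariant names and descriptions that the dashboard publishes for the agent to read. Everything here (`buildManifest`, the field names, the element id in the comment) is an assumption, not the format used in the workshop.

```javascript
// Hypothetical manifest shape: the steps above do not define one.
function buildManifest(invariants) {
  return {
    version: 1,
    steps: invariants.map((inv, index) => ({
      order: index + 1,
      name: inv.name,
      description: inv.description,
    })),
  };
}

const manifest = buildManifest([
  {
    name: "cart-total-matches-items",
    description: "Cart total equals sum of individual item prices",
    check: () => ({ pass: true }),
  },
]);

// In the dashboard this JSON could be placed in an element such as
// <script type="application/json" id="verify-manifest"> (an assumed id).
console.log(JSON.stringify(manifest, null, 2));
```

Only the descriptive fields are serialized. The `check` functions stay in invariants.js, so the manifest tells the agent what should hold without shipping code in the DOM.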

What to add next

More invariants: add one per interaction you care about (filter counts, pagination, form validation state)
Video recording: in CI, use Playwright’s recordVideo option to capture each verification run
Agent auto-fix: when an invariant fails, have Claude Code read the failure details and propose a fix
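
As a sketch of the first item, here is what an additional invariant for a filter count might look like, in the same shape as the one from Step 3. The attribute names (`filter-summary`, `filter-row`, `data-visible-count`) are assumptions. The `root` parameter is also an addition: it defaults to the page document but lets you pass a stub, so the check can be exercised outside a browser.

```javascript
// Sketch of an additional invariant. All attribute names here are assumptions.
// `root` defaults to the page document but can be swapped for a stub in tests.
const filterCountInvariant = {
  name: "filter-count-matches-visible-rows",
  description: "Published filter count equals the number of visible rows",
  check: (root = document) => {
    const filter = root.querySelector('[data-verify-unit="filter-summary"]');
    const rows = root.querySelectorAll('[data-verify-unit="filter-row"]');
    if (!filter) {
      // A missing summary element is a failure, not an exception.
      return { pass: false, expected: rows.length, actual: null };
    }
    const actual = Number.parseInt(filter.dataset.visibleCount, 10);
    return { pass: actual === rows.length, expected: rows.length, actual };
  },
};

// Minimal stub that mimics the two DOM methods used above.
const stubRoot = {
  querySelector: () => ({ dataset: { visibleCount: "2" } }),
  querySelectorAll: () => [{}, {}],
};
console.log(filterCountInvariant.check(stubRoot));
```

In the browser you would append this object to the invariants array and call `check()` with no argument, exactly as the dashboard already does.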

Where to go next

Claude Code Best Practices — CLAUDE.md, skills, hooks, and the context setup that makes long-running agents reliable
Evals for Agents — the grader patterns that extend these verification ideas to agent output quality
Tariq’s repo (referenced in the talk) at the CWC Workshops GitHub org for the full verification framework code
