intermediate ⏱️ 10 min read · 🎬 ~18 min video

A Year of Claude Code: Auto Mode, Loops, and What Actually Surprised Us

Boris Cherny and Cat Wu reflect on Claude Code's first year — what changed about verification, why auto mode beat plan mode, how routines became the killer feature, and where engineering orgs are heading.

This lesson is original educational writing based on this video by Claude (published June 8, 2026). All credit for the original content goes to the creators.

#claude-code #best-practices #agentic-workflows

1. How far we’ve come in a year

When Claude Code shipped, it generated two emoji reactions on Slack. Boris Cherny (Head of Claude Code) posted the announcement and felt “it was pretty good for easy engineering tasks.” Cat Wu (Head of Product) puts it more bluntly: “That was a really nice way to say it wasn’t very good.”

A year later, Boris describes his daily workflow as “armies of agents, trees of thousands, where one agent is prompting agents that are prompting agents.” That’s not hyperbole — it’s the operational reality for the team that built the product. This lesson extracts the mental model shifts that made that leap possible.

2. Auto mode replaced plan mode — and why that matters

The early Claude Code required a human to approve almost every tool call. This was the right call in 2024: models were less aligned, classifiers didn’t exist, and the risk of unreviewed actions was real. So plan mode emerged: Claude proposes, human approves, Claude executes.

But human nature undermined it. When you accept 99% of requests, you stop reading them. Eyes glaze over. The permission prompt that was supposed to be a safety check became theater.

[Diagram: three generations of the interaction model. Permission Prompts (low-alignment era): every tool call requires approval; 99% are accepted, so eyes glaze over. Plan Mode (better for Opus 4–4.5): Claude plans first, a human reviews the plan, then Claude executes; still requires attention. Auto Mode (Sonnet 4.6+, red-teamed): a classifier routes each action to a model; suspicious actions are denied; the human sees only the most important decisions.]
The evolution of the Claude Code interaction model. Each generation handed more autonomy to the model — not by removing safety, but by making safety smarter.

Auto mode replaced plan mode for a counterintuitive reason: it’s safer. Instead of a human rubber-stamping 99% of requests, a classifier trained on thousands of transcripts evaluates every action. Red teamers tried to prompt-inject; those attacks became evals; the classifier was hardened against all of them.

Boris’s advice: use auto mode for everything from Sonnet 4.6 onward. The model no longer needs a planning step — it can work autonomously and only surfaces the decisions that actually need you.
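
To make the three outcomes concrete (run, surface to a human, block), here is a minimal sketch of classifier-gated routing. It is a toy under stated assumptions: the function names and the hard-coded patterns are hypothetical stand-ins, and the real auto mode uses a trained classifier rather than string matching.

```shell
#!/usr/bin/env bash
# Toy stand-in for an action classifier. The patterns below are illustrative
# assumptions only; auto mode itself relies on a trained model, not globs.
classify_action() {
  case "$1" in
    *"rm -rf"*|*"git push --force"*|*"curl "*"| sh"*) echo "deny" ;;
    *"git push"*|*"npm publish"*)                      echo "ask" ;;
    *)                                                 echo "allow" ;;
  esac
}

# Route one proposed command: run it, surface it to a human, or block it.
# This sketch only prints the decision; it never executes the command.
route_action() {
  case "$(classify_action "$1")" in
    allow) echo "RUN: $1" ;;
    ask)   echo "NEEDS HUMAN: $1" ;;
    deny)  echo "BLOCKED: $1" ;;
  esac
}
```

The point of the shape is that every action gets evaluated, while the human only sees the "ask" branch instead of all of them.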

3. Verification is not tests or linting

“Whenever we talk about verification, people think unit tests, lint, type check. These were already automated. That’s not what we mean.”

Real agent verification asks a different question: can the agent run the thing? Not “does the code pass CI” — but “can Claude launch the app, click through the new UX, hit the edge case, and report what it saw?”

On the Claude Code desktop app team, this looks like:

  1. A skill teaches Claude how to launch the local desktop app
  2. Claude uses computer use to navigate the new UI
  3. Claude tests edge cases and reports failures
  4. When it hits staging issues, it reads Slack to check whether staging is already known-down before wasting time debugging

The key difference: verification is runtime observation at the product’s actual surface, not a pass/fail signal from test infrastructure. Tests tell you the code does what you wrote. Verification tells you the product does what users experience.

Check your understanding

3 questions

  1. Why is auto mode considered MORE secure than manually reviewing every permission prompt?

  2. What does "verification" mean in the context of agentic coding, according to Boris and Cat?

  3. What is a "routine" in Claude Code?

4. Routines: the killer feature nobody expected

The most surprising moment in the video: Boris is working on a bug when Claude Code tells him another agent has already fixed it. He had never talked to that engineer about the feature. The engineer had set up a routine: a persistent loop that watches for bug reports left unanswered for more than 5 hours and opens fixes automatically.

Routines are the first killer application of the Agent SDK. They flip the model from reactive (you ask, Claude answers) to proactive (Claude monitors, Claude acts). Examples from the Anthropic team:

  • Voice mode routine: Every GitHub issue about voice mode → Claude drafts a fix → engineer gets pinged
  • Feedback digest: Every morning, Claude aggregates user feedback from Slack channels and surfaces themes
  • PR babysitter: Claude watches every open PR, responds to review comments, fixes CI, rebases automatically

Boris’s framing: “I used to have to respond to code review comments. I used to have to fix CI. I haven’t done that in a long time.”
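
The "unanswered for more than 5 hours" rule can be sketched as a small filter. This is not the engineer's actual routine: the function name, the tab-separated input format, and the wiring in the trailing comment are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Print the numbers of issues that have no comments and are older than a cutoff.
# Input (stdin): one issue per line as "<number>\t<created epoch secs>\t<comment count>".
# Args: $1 = current time in epoch seconds, $2 = max age in hours (default 5).
stale_issues() {
  local now="$1" max_hours="${2:-5}"
  local number created comments
  while IFS="$(printf '\t')" read -r number created comments; do
    # Skip issues that someone has already responded to.
    [ "$comments" -eq 0 ] || continue
    if [ $(( now - created )) -gt $(( max_hours * 3600 )) ]; then
      echo "$number"
    fi
  done
}

# Hypothetical wiring (not run here): feed the filter from the GitHub CLI and
# hand each stale issue to Claude Code in print mode. </dev/null keeps claude
# from consuming the remaining issue numbers on stdin.
#   gh issue list --label bug --json number,createdAt,comments \
#     --jq '.[] | [.number, (.createdAt | fromdate), (.comments | length)] | @tsv' |
#     stale_issues "$(date +%s)" 5 |
#     while read -r n; do
#       claude -p "Investigate issue #$n and open a draft fix" </dev/null
#     done
```

Keeping the age rule in a plain function makes it easy to check on its own before any agent is attached to it.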

5. Context minimalism

There’s a direct evolution in how teams give Claude context:

  • Sonnet 3.5 era: Prompt engineering — carefully craft every word of the system prompt
  • Opus 4 era: Context engineering — feed Claude everything it might need
  • Current models: Context minimalism — give the model only what it needs and let it figure out the rest

Cat Wu’s principle: “Tell the model only what it needs to know. When you give the model too much context, it’s like micromanaging. Sometimes the model knows a better way to get to the same outcome.” The Claude Code harness itself is getting leaner so you have more room for your own prompts.

The practical implication: don’t pre-specify every step. Give Claude the goal, the tools to pull in relevant context, and the verification method. Trust it to figure out the path.

6. Roles are merging — faster than expected

On the Claude Code team:

  • Designers fix button styling directly via PRs
  • PMs prototype features and merge them
  • The finance team runs projections in Claude Code
  • Data scientists have Claude Code open all day

Fiona Fung (Head of Claude Code Engineering) frames it as a bottleneck shift: for years, engineering bandwidth was the scarce resource, so every process was built to protect it. Now that coding bandwidth is abundant, the new bottlenecks are verification, review, and product taste.

Boris adds the sharper point: “What matters a little more is the idea you have. If you have the product context, the business context, and you’re thinking about design and the user — you’ll come up with better ideas.” The people with good judgment about what to build are the constraint.

Build it yourself

Follow these exact steps to reproduce it yourself · estimated time: ~45 minutes

Prerequisites

  • Claude Code installed and configured
  • A repo with at least one recurring operational task (code review, bug triage, dependency bumps)

Turn one of your repeating manual tasks into a routine that Claude Code runs autonomously.

Step 1 — Pick your target workflow

Choose something you do repeatedly that fits a pattern:

  • Reviewing PRs at the end of the day
  • Checking for unanswered bug reports each morning
  • Summarizing feedback from a Slack channel or issue tracker

Step 2 — Write it as a skill

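Create .claude/skills/morning-feedback-digest/SKILL.md describing the task in detail. Claude Code expects each skill in its own folder as a SKILL.md file with a short name and description in the front matter; the name here matches the one used in the later steps:

---
name: morning-feedback-digest
description: Summarize GitHub issues and PR comments from the last 24 hours by theme and flag likely regressions.
---

# Morning feedback digest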

## Purpose
Read all new GitHub issues and PR comments opened since yesterday.
Summarize by theme: bugs vs feature requests vs questions.
Flag anything that looks like a regression.

## Steps
1. Use gh to list issues opened in the last 24h
2. Read each issue body
3. Group by theme (bug / feature / question)
4. Note any that reference functionality that shipped recently
5. Write a summary to stdout

## Output format
- One bullet per theme with count
- Flag regressions with ⚠️
- Keep it under 200 words

Step 3 — Test it manually

Run it once with Claude Code to make sure the output is useful:

claude "Run the morning-feedback-digest skill"

Iterate on the skill file until the output is something you’d actually read.

Step 4 — Wire it to a trigger

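For a time-based trigger, add a scheduled GitHub Actions workflow. The schedule uses cron syntax, and GitHub evaluates it in UTC:

# .github/workflows/feedback-digest.yml
on:
  schedule:
    - cron: '0 8 * * 1-5'   # 08:00 UTC, Monday–Friday

jobs:
  digest:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      issues: read
      pull-requests: read
    steps:
      - uses: actions/checkout@v4
      # Claude Code is not preinstalled on the runner
      - run: npm install -g @anthropic-ai/claude-code
      # -p (print mode) runs a single prompt without the interactive UI.
      # In print mode the gh commands may need to be allowed explicitly;
      # check the Claude Code permission settings for your setup.
      - run: claude -p "Run the morning-feedback-digest skill" >> "$GITHUB_STEP_SUMMARY"
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GH_TOKEN: ${{ github.token }}   # lets the gh CLI authenticate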

For an event trigger (new issue, PR opened), use an issues or pull_request trigger instead of schedule.

Step 5 — Add a verification step

Routines are only trustworthy if they can self-check. Extend the prompt so Claude confirms it did what it was supposed to. The date expression below uses GNU date, the default on Linux runners; on macOS the equivalent is date -v-1d +%Y-%m-%d:

claude "Run morning-feedback-digest, then verify you read at least 1 issue opened after $(date -d yesterday +%Y-%m-%d)"

What to watch for

  • False positives: the routine flags things that aren’t actually problems → tighten the skill’s criteria
  • Silent failures: routine runs but produces nothing → add a check that errors when output is empty
  • Scope creep: the routine starts touching things outside its purpose → add explicit constraints in the skill file
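
The silent-failure check above can be a few lines of shell. A minimal sketch, assuming the routine writes its digest to stdout; the function name require_nonempty is hypothetical.

```shell
#!/usr/bin/env bash
# Pass stdin through unchanged, but fail if it holds no non-whitespace text.
require_nonempty() {
  local output
  output=$(cat)
  if [ -z "$(printf '%s' "$output" | tr -d '[:space:]')" ]; then
    echo "routine produced no output" >&2
    return 1
  fi
  printf '%s\n' "$output"
}

# Hypothetical usage in the workflow step (not run here). pipefail also
# surfaces a failure of the claude command itself, not just empty output.
#   set -o pipefail
#   claude -p "Run the morning-feedback-digest skill" | require_nonempty >> "$GITHUB_STEP_SUMMARY"
```

Because the function returns non-zero on empty input, the scheduled job fails visibly instead of writing a blank summary.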
