A Year of Claude Code: Auto Mode, Loops, and What Actually Surprised Us

1. How far we’ve come in a year

When Claude Code shipped, the announcement drew two emoji reactions on Slack. Boris Cherny (Head of Claude Code), who posted it, felt “it was pretty good for easy engineering tasks.” Cat Wu (Head of Product) puts it more bluntly: “That was a really nice way to say it wasn’t very good.”

A year later, Boris describes his daily workflow as “armies of agents, trees of thousands, where one agent is prompting agents that are prompting agents.” That’s not hyperbole — it’s the operational reality for the team that built the product. This lesson extracts the mental model shifts that made that leap possible.

2. Auto mode replaced plan mode — and why that matters

The early Claude Code required a human to approve almost every tool call. This was the right call in 2024: models were less aligned, classifiers didn’t exist, and the risk of unreviewed actions was real. So plan mode emerged: Claude proposes, human approves, Claude executes.

But human nature undermined it. When you accept 99% of requests, you stop reading them. Eyes glaze over. The permission prompt that was supposed to be a safety check became theater.

Figure: The evolution of the Claude Code interaction model. Each generation handed more autonomy to the model, not by removing safety but by making safety smarter.

Auto mode replaced plan mode for a counterintuitive reason: it’s safer. Instead of a human rubber-stamping 99% of requests, a classifier trained on thousands of transcripts evaluates every action. Red teamers tried prompt-injection attacks; those attacks became evals, and the classifier was hardened against them.

Boris’s advice: use auto mode for everything from Sonnet 4.6 onward. The model no longer needs a planning step — it can work autonomously and only surfaces the decisions that actually need you.

3. Verification is not tests or linting

“Whenever we talk about verification, people think unit tests, lint, type check. These were already automated. That’s not what we mean.”

Real agent verification asks a different question: can the agent run the thing? Not “does the code pass CI” — but “can Claude launch the app, click through the new UX, hit the edge case, and report what it saw?”

On the Claude Code desktop app team, this looks like:

A skill teaches Claude how to launch the local desktop app
Claude uses computer use to navigate the new UI
Claude tests edge cases and reports failures
When it hits staging issues, it reads Slack to check whether staging is already known-down before wasting time debugging
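
A rough command-line analogue of that idea, offered as a sketch rather than anything the team described: instead of collapsing a run into pass/fail, run the real thing and report what was observed. The function name observe and the report layout are hypothetical, and the real workflow above uses computer use against a desktop UI, which this does not attempt.

```shell
#!/bin/sh
# observe CMD [ARGS...]
# Runs the real command and reports what was observed (exit status plus
# combined stdout/stderr) instead of collapsing the run into pass/fail.
# The function name and report layout are hypothetical.
observe() {
  obs_status=0
  # Capture stdout and stderr together; remember a non-zero exit status.
  obs_output="$("$@" 2>&1)" || obs_status=$?
  printf 'command: %s\n' "$*"
  printf 'exit status: %s\n' "$obs_status"
  printf 'observed output:\n%s\n' "$obs_output"
  return "$obs_status"
}
```

An agent that reads this report sees the actual output and exit status, which is closer to "what happened when I ran it" than a green or red check mark.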

The key difference: verification is runtime observation at the product’s actual surface, not a pass/fail signal from test infrastructure. Tests tell you the code does what you wrote. Verification tells you the product does what users experience.

Check your understanding

1. Why is auto mode considered MORE secure than manually reviewing every permission prompt?

Human attention is finite. Rubber-stamping 99% of prompts means you miss the 1% that are actually dangerous. A classifier trained on red-team attacks and real trajectories maintains constant vigilance without fatigue.

2. What does "verification" mean in the context of agentic coding, according to Boris and Cat?

Verification is runtime observation at the product surface: clicking around the UI, hitting the API endpoint, running the simulator. Tests and lint were already automated before agents existed.

3. What is a "routine" in Claude Code?

Routines are always-on autonomous loops. The voice mode engineer's routine listens for every issue filed about voice mode and proactively puts up fixes. Another routine opens fixes for bug reports that have gone unanswered for more than 5 hours.

4. Routines: the killer feature nobody expected

The most surprising moment in the video: Boris is working on a bug when Claude Code tells him another agent has already fixed it. He had never talked to the engineer responsible about the feature. That engineer had set up a routine: a persistent loop that watches for bug reports left unanswered for more than 5 hours and opens fixes automatically.

Routines are the first killer application of the Agent SDK. They flip the model from reactive (you ask, Claude answers) to proactive (Claude monitors, Claude acts). Examples from the Anthropic team:

Voice mode routine: Every GitHub issue about voice mode → Claude drafts a fix → engineer gets pinged
Feedback digest: Every morning, Claude aggregates user feedback from Slack channels and surfaces themes
PR babysitter: Claude watches every open PR, responds to review comments, fixes CI, rebases automatically
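
To make the "watch, then act" shape concrete, here is a minimal sketch of only the filter such a routine might apply: is this issue unanswered and older than the threshold? The function name is_stale and its arguments are hypothetical. How the team's routines actually fetch issues or invoke Claude is not described in the source, so that part is left out.

```shell
#!/bin/sh
# is_stale CREATED_EPOCH COMMENT_COUNT NOW_EPOCH [THRESHOLD_HOURS]
# Succeeds (exit 0) when an issue has no comments and is older than the
# threshold, which defaults to the 5 hours mentioned above.
# Hypothetical helper: timestamps are Unix epoch seconds.
is_stale() {
  st_created="$1"
  st_comments="$2"
  st_now="$3"
  st_hours="${4:-5}"
  st_age=$(( $st_now - $st_created ))
  [ "$st_comments" -eq 0 ] && [ "$st_age" -gt $(( $st_hours * 3600 )) ]
}
```

A routine could run this check over each open issue on a schedule and hand only the stale ones to Claude, which is what turns a reactive tool into a proactive one.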

Boris’s framing: “I used to have to respond to code review comments. I used to have to fix CI. I haven’t done that in a long time.”

5. Context minimalism

There’s a direct evolution in how teams give Claude context:

Sonnet 3.5 era: Prompt engineering — carefully craft every word of the system prompt
Opus 4 era: Context engineering — feed Claude everything it might need
Current models: Context minimalism — give the model only what it needs and let it figure out the rest

Cat Wu’s principle: “Tell the model only what it needs to know. When you give the model too much context, it’s like micromanaging. Sometimes the model knows a better way to get to the same outcome.” The Claude Code harness itself is getting leaner so you have more room for your own prompts.

The practical implication: don’t pre-specify every step. Give Claude the goal, the tools to pull in relevant context, and the verification method. Trust it to figure out the path.
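
One way to hold yourself to that, sketched under assumptions: build prompts from exactly those three parts and nothing else. The function name minimal_prompt and the three labels are illustrative, not a Claude Code convention.

```shell
#!/bin/sh
# minimal_prompt GOAL TOOLS VERIFY
# Prints a three-part prompt: what to achieve, what the agent may use to
# pull in context, and how it should check its own work.
# Deliberately has no slot for a step-by-step plan.
minimal_prompt() {
  printf 'Goal: %s\n' "$1"
  printf 'Tools you can use: %s\n' "$2"
  printf 'Verify by: %s\n' "$3"
}
```

You could then pass the result as the initial prompt, for example claude "$(minimal_prompt 'Fix the flaky login test' 'gh, the test runner' 'run the login flow end to end')". If you feel the urge to add a fourth argument listing steps, that is usually the micromanagement Cat is warning about.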

6. Roles are merging — faster than expected

On the Claude Code team:

Designers fix button styling directly via PRs
PMs prototype features and merge them
The finance team runs projections in Claude Code
Data scientists have Claude Code open all day

Fiona Fung (Head of Claude Code Engineering) frames it as a bottleneck shift: for years, engineering bandwidth was the scarce resource, so every process was built to protect it. Now that coding bandwidth is abundant, the new bottlenecks are verification, review, and product taste.

Boris adds the sharper point: “What matters a little more is the idea you have. If you have the product context, the business context, and you’re thinking about design and the user — you’ll come up with better ideas.” The people with good judgment about what to build are the constraint.

Build it yourself

Follow these exact steps to reproduce it yourself · estimated time: ~45 minutes

Prerequisites

Claude Code installed and configured
A repo with at least one recurring operational task (code review, bug triage, dependency bumps)

Turn one of your repeating manual tasks into a routine that Claude Code runs autonomously.

Step 1 — Pick your target workflow

Choose something you do repeatedly that fits a pattern:

Reviewing PRs at the end of the day
Checking for unanswered bug reports each morning
Summarizing feedback from a Slack channel or issue tracker

Step 2 — Write it as a skill

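Create .claude/skills/morning-feedback-digest/SKILL.md describing the task in detail. Claude Code loads each project skill from a SKILL.md file in its own directory, with a name and description in the frontmatter, and the name should match what you ask for in Step 3:

---
name: morning-feedback-digest
description: Summarize new GitHub issues and PR comments by theme and flag likely regressions. Use for the morning feedback digest.
---

# Morning feedback digest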

## Purpose
Read all new GitHub issues and PR comments opened since yesterday.
Summarize by theme: bugs vs feature requests vs questions.
Flag anything that looks like a regression.

## Steps
1. Use gh to list issues opened in the last 24h
2. Read each issue body
3. Group by theme (bug / feature / question)
4. Note any that reference functionality that shipped recently
5. Write a summary to stdout

## Output format
- One bullet per theme with count
- Flag regressions with ⚠️
- Keep it under 200 words

Step 3 — Test it manually

Run it once with Claude Code to make sure the output is useful:

claude "Run the morning-feedback-digest skill"

Iterate on the skill file until the output is something you’d actually read.

Step 4 — Wire it to a trigger

For a time-based trigger, add a scheduled GitHub Actions workflow (the schedule uses cron syntax):

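# .github/workflows/feedback-digest.yml
name: Morning feedback digest
on:
  schedule:
    - cron: '0 8 * * 1-5'   # 08:00 UTC, Monday–Friday (GitHub schedules run in UTC)

jobs:
  digest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # The runner does not ship with Claude Code, so install the CLI first.
      - run: npm install -g @anthropic-ai/claude-code
      # -p (--print) runs one prompt without the interactive UI.
      # Depending on your permission settings, headless runs may need gh commands pre-approved.
      - run: claude -p "Run the morning-feedback-digest skill" >> "$GITHUB_STEP_SUMMARY"
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GH_TOKEN: ${{ github.token }}   # lets the gh CLI read issues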

For an event trigger (new issue, PR opened), use an issues or pull_request trigger instead of schedule. A push trigger fires on commits, not when an issue or PR is opened.

Step 5 — Add a verification step

Routines are only trustworthy if they can self-check. In the same prompt that runs the routine, ask Claude to confirm it did what it was supposed to. The date -d syntax here is GNU date, as found on Linux runners; on macOS use date -v-1d instead:

claude "Run morning-feedback-digest, then verify you read at least 1 issue opened after $(date -d yesterday +%Y-%m-%d)"
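
Asking the model to confirm its own work is one layer. A cheap deterministic check on the output is a useful second one. The sketch below checks a digest against two rules from the skill file's output format: at least one bullet, and under 200 words. The function name check_digest is hypothetical, and it assumes bullets start with "- " as in the skill file.

```shell
#!/bin/sh
# check_digest: read a digest on stdin and succeed only if it has at least
# one "- " bullet line and stays under the 200-word limit from the skill file.
# Hypothetical helper; the bullet marker is an assumption about the output.
check_digest() {
  dg_text="$(cat)"
  dg_words="$(printf '%s\n' "$dg_text" | wc -w | tr -d '[:space:]')"
  # grep -c prints 0 and exits non-zero when nothing matches, hence || true.
  dg_bullets="$(printf '%s\n' "$dg_text" | grep -c '^- ')" || true
  if [ "$dg_bullets" -eq 0 ]; then
    echo "digest has no bullet lines" >&2
    return 1
  fi
  if [ "$dg_words" -ge 200 ]; then
    echo "digest is $dg_words words (must be under 200)" >&2
    return 1
  fi
}
```

Saving the routine's output to a file and running check_digest < digest.txt as the next step makes a malformed digest fail the job instead of landing quietly in the summary.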

What to watch for

False positives: the routine flags things that aren’t actually problems → tighten the skill’s criteria
Silent failures: routine runs but produces nothing → add a check that errors when output is empty
Scope creep: the routine starts touching things outside its purpose → add explicit constraints in the skill file
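
For the silent-failure case, a minimal sketch of such a check: a filter that passes the routine's output through unchanged but exits non-zero when there is nothing in it. The function name require_nonempty is hypothetical.

```shell
#!/bin/sh
# require_nonempty: pass stdin through unchanged, but fail if it contains
# nothing except whitespace. Hypothetical helper name.
require_nonempty() {
  # Capture everything on stdin (command substitution trims trailing newlines).
  rn_output="$(cat)"
  # Strip all whitespace; if nothing is left, treat the run as a failure.
  if [ -z "$(printf '%s' "$rn_output" | tr -d '[:space:]')" ]; then
    echo "routine produced no output" >&2
    return 1
  fi
  printf '%s\n' "$rn_output"
}
```

Placed at the end of the pipeline, for example claude -p "Run the morning-feedback-digest skill" | require_nonempty >> "$GITHUB_STEP_SUMMARY", an empty run fails the workflow step instead of passing quietly.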

Where to go next

Watch the original conversation — Boris and Cat’s back-and-forth is worth experiencing unedited
Claude Code Best Practices — the technical setup that makes autonomous work reliable
Running an AI-native Engineering Org — Fiona Fung on what changes when your whole team runs this way

Related lessons

Running an AI-Native Engineering Org: What Changes When Coding Isn't the Bottleneck

Fable 5 and the AI-Native Company

Agent Harness Engineering: Chasing Friction
