Stop Babysitting Your Agents: From Approval Mode to Orchestration

1. What “babysitting” actually means

When most people complain about Claude Code being slow, what they mean is that they are slow. The model isn’t blocking them — they are blocking the model. Every “Allow this command?” dialog, every “Does this look right?” pause, every task abandoned mid-run because it took a wrong turn and nobody was watching: these are symptoms of the same root problem.

You’re treating an agent like an intern who needs your eyes on every move.

An intern needs babysitting because you don’t trust their judgment and can’t verify their work without inspecting it yourself. An agent should graduate past that. Not because agents are infallible — they aren’t — but because most of the things you’re approving are already settled: the test command is always safe, prettier is always safe, a read is always safe. Approving those by hand is not risk management. It’s theater.

The shift the talk is about is architectural. You stop giving your agent a raw capability and demanding permission each time, and you start engineering a harness where:

Safe operations are pre-approved at the settings layer — Claude never has to ask.
Risky operations are blocked by hooks — you don’t rely on Claude’s judgment for those.
Verification is embedded in the task — Claude can check its own work and keep going.
Parallelism is the default — independent work streams run concurrently, not sequentially.

The rest of this lesson is how to build each of those four properties.

2. Permission architecture: approve once, not every time

Claude Code’s permission model has three layers, and understanding all three is the foundation of autonomous operation.

Layer 1: settings.json — the policy file

.claude/settings.json (project-level) and ~/.claude/settings.json (user-level) hold permanent permission rules. Anything in allow never prompts. Anything in deny is blocked outright.

{
  "permissions": {
    "allow": [
      "Bash(npm run *)",
      "Bash(git diff:*)",
      "Bash(git log:*)",
      "Bash(git status)",
      "Bash(gh pr view:*)",
      "Edit"
    ],
    "deny": [
      "Bash(git push --force*)",
      "Bash(rm -rf *)",
      "Bash(curl * | bash)"
    ]
  }
}

The pattern syntax matters: Bash(npm run *) allows any npm run subcommand; Bash(npm run build) allows only that exact command. Prefer the wildcard for your test and build runners — they’re always safe, their arguments vary, and you don’t want to add new script names to the allow list manually.

Commit the project settings file. It’s not personal preference; it’s the team’s agreed policy. Personal overrides go in the user-level file, which stays on your machine.

Layer 2: auto mode — eliminate the dialog loop

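The single highest-leverage setting for reducing interruptions is not in the settings file; it's the mode flag. Running Claude Code in auto mode (claude --dangerously-skip-permissions in older docs; now claude --auto) tells it to stop prompting for anything the settings file has not already decided. Flag names have changed between releases, so confirm the current spelling with claude --help. Because --dangerously-skip-permissions suppresses permission prompts entirely, turn it on only after the deny list and hooks cover everything that must never run.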

In practice, the sequence is: write an allow/deny policy, enable auto mode, and monitor the first few runs to confirm the policy is right. After a week you’ll have far more confidence in what to trust than you’d get from approving each action individually.

Layer 3: hooks — enforcement, not advice

Where the settings file handles known-good and known-bad patterns, hooks handle the gray zone. A pre-tool-use hook receives the tool call before it executes and can block it — with a message back to Claude about why.

#!/bin/bash
# .claude/hooks/pre-tool-use.sh
# Block any Bash command that mentions a production environment name
TOOL_INPUT=$(cat)   # the hook receives the tool call as JSON on stdin
COMMAND=$(echo "$TOOL_INPUT" | jq -r '.tool_input.command // ""')

if echo "$COMMAND" | grep -Eq "(prod|production|main-db)"; then
  echo "BLOCK: Commands referencing production environments require manual approval." >&2
  exit 2   # exit code 2 = block the action and tell Claude
fi
exit 0     # exit 0 = allow

The critical distinction: a settings-file rule is a static pattern matched against the tool name and, for Bash, the command string. A hook is executable code that can inspect the full tool call before it runs. You can block based on arguments, file paths, environment variables, or anything else you can check in a shell script.
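One caveat with a substring pattern like the one in the hook above: prod also matches product or reproduce. As a minimal sketch, assuming you want to match only whole tokens, the check could be factored into a function that returns the same exit codes the hook uses. The name check_command is hypothetical and not part of Claude Code.

```bash
# check_command is a hypothetical helper, not part of Claude Code.
# It returns 2 (block) or 0 (allow), mirroring the hook exit codes above.
check_command() {
  local command_text="$1"
  # Match prod, production, or main-db only when the name is not embedded
  # in a longer word, so "product" and "reproduce" pass through.
  if printf '%s\n' "$command_text" |
    grep -Eqi '(^|[^[:alnum:]_])(prod|production|main-db)([^[:alnum:]_]|$)'; then
    echo "BLOCK: command references a production environment." >&2
    return 2
  fi
  return 0
}
```

Inside a hook script you would call check_command "$COMMAND" and exit with its return value. Keeping the logic in a function makes it easy to try against sample commands before wiring it in.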

Three-layer permission architecture. Settings files handle pattern-matched allow/deny. Hooks enforce content-aware rules. The result: Claude acts freely within the zone, and dangerous operations never execute unreviewed.

3. Verification-first task design

Permission architecture answers the question “what can Claude do without asking?” but it doesn’t answer “how does Claude know when it’s done?” Those are separate problems, and the second one drives most babysitting behavior.

If Claude can’t verify its own work, you become the verifier. You watch the output, read the diff, run the tests yourself, and give the thumbs up. That’s exactly babysitting.

The lever is designing tasks so verification is built in.

The test signal

Tests are the canonical example because their pass/fail state is unambiguous. When you frame a task as “implement X until the tests pass”, Claude can run the tests itself, see the failures, fix its work, and re-run — without any input from you. The verification signal is built into the task definition.

Three practices amplify this:

Write failing tests first. When you provide the test suite before the implementation, Claude has a target to run against from step one. The task doesn’t complete until green. You don’t need to be there.

Include the verification command in the task. Don’t just say “add auth”. Say “add JWT auth to the /api/ routes. Run npm test after each file change. The task is done when all tests pass and npm run typecheck exits clean.” Now Claude has an objective pass condition.

Name the build artifacts. “Build the production bundle and verify the output is under 200KB” gives Claude a checkable exit criterion. “Build it” does not.
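An artifact criterion such as "under 200KB" only helps if it is a command Claude can run. A minimal sketch, assuming the bundle is a single file; the function name check_max_bytes and the path dist/bundle.js are hypothetical:

```bash
# check_max_bytes FILE MAX_BYTES
# Succeeds only if FILE exists and is at most MAX_BYTES bytes long.
check_max_bytes() {
  local file="$1" max="$2" size
  [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
  size=$(( $(wc -c < "$file") ))   # arithmetic expansion strips wc's padding
  if [ "$size" -gt "$max" ]; then
    echo "too large: $file is $size bytes (limit $max)" >&2
    return 1
  fi
  echo "ok: $file is $size bytes"
}

# Example, assuming a hypothetical bundle path:
# check_max_bytes dist/bundle.js $((200 * 1024))
```

The task spec can then say "the task is done when check_max_bytes exits 0", which is the same kind of pass condition as a green test run.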

Beyond tests: other verification signals

Not all work has tests. But most work has something checkable:

Type checks: tsc --noEmit exits 0 when the project type-checks and non-zero otherwise. Zero is done.
Lint: eslint src/ --max-warnings 0 exits non-zero on any error or warning, so the result is deterministic and matches what CI would report.
Screenshot comparison: for UI work, Playwright can take before/after screenshots and compare them. Claude can use the Playwright MCP to run the browser itself.
File size / line count: crude but useful constraints (“the refactor should not increase the line count of auth.ts”).
Custom scripts: the npm test slot in your project can be anything. A verification script that checks domain-specific invariants is just as good as a unit test suite.
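The signals above can be combined into a single pass condition. A minimal sketch, assuming each check is a shell command that exits 0 on success; the function name run_checks is hypothetical, and the commands in the example are whatever your project actually uses:

```bash
# run_checks CMD...
# Runs each command string in order and stops at the first failure.
run_checks() {
  local check
  for check in "$@"; do
    echo "==> $check"
    if ! bash -c "$check"; then
      echo "FAILED: $check" >&2
      return 1
    fi
  done
  echo "All checks passed"
}

# Example, assuming these tools are set up in your project:
# run_checks "npm test" "npx tsc --noEmit" "npx eslint src/ --max-warnings 0"
```

Pointing the task at one script ("done when scripts/verify.sh exits 0") keeps the exit condition in one place, so adding a new check later does not mean rewriting every prompt.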

Check your understanding

1. Which task framing is most likely to let Claude finish without human input?

Option B gives Claude a machine-checkable exit condition. Claude can run the commands, see the results, and iterate without asking. The other options either lack a verifiable goal or explicitly require human involvement.

2. Why are pre-approved Bash commands in settings.json better than approving them one by one?

Every manual approval is a context switch for you and a pause in the agent's work. Pre-approving safe operations in the settings file eliminates that overhead: the policy is decided once, not on every invocation.

3. What is the key difference between a settings.json deny rule and a pre-tool-use hook?

A deny rule like `Bash(rm -rf *)` does pattern matching on the command string. A hook runs a shell script that receives the full tool input and can block based on argument values, environment context, or any logic you write.

4. Parallel sessions and git worktrees

Once agents can run unsupervised on a single task, the next bottleneck is running them one at a time. Most real projects have independent work streams — separate features, separate modules, separate branches — that could run concurrently. Running them sequentially means you multiply the wait.

Multiple terminal sessions

The simplest parallel setup is multiple terminal windows. Claude Code sessions are independent; there’s no conflict in running two or three at once against the same codebase, as long as they don’t edit the same files. If you’re backfilling tests for module A in one session and writing a new feature in module B in another, those are genuinely parallel.

The constraint is file-level conflicts. Two sessions editing the same file will produce a mess.
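One way to catch that before it happens is to compare the file sets two sessions are expected to touch. A minimal sketch, assuming you keep each session's planned paths in a text file, one per line; that convention and the name overlapping_files are hypothetical:

```bash
# overlapping_files LIST_A LIST_B
# Prints every path that appears in both lists, one per line.
overlapping_files() {
  # -F fixed strings, -x whole-line match, -f read patterns from LIST_A
  grep -Fxf "$1" "$2" | sort -u
}

# Example, with hypothetical list files:
# overlapping_files session-a.txt session-b.txt
```

Empty output means the two sessions can share a checkout. Any path printed is a reason to run them one after the other or to move one into its own worktree.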

Git worktrees: true isolation

For work that might touch overlapping files, git worktrees are the right tool. A worktree is a separate filesystem checkout of your repo on a separate branch, sharing the same git history but isolated at the working-directory level.

# Create a worktree on a new feature branch (-b creates the branch)
git worktree add -b feature/auth-refresh ../myproject-feature

# Terminal A: work on the feature
cd ../myproject-feature && claude

# Terminal B: keep working on main
cd ../myproject && claude

Both sessions have Claude Code running. Both have their own working directory, their own branch, their own ability to run tests and build tools. They cannot conflict at the filesystem level.
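If you create worktrees often, the naming and the new-versus-existing branch decision can be wrapped in a helper. A minimal sketch, assuming you want sibling directories named after the repo and branch; the function name new_worktree and the naming scheme are hypothetical:

```bash
# new_worktree BRANCH
# Creates a sibling checkout named <repo>-<branch> (slashes become dashes)
# and prints its path. Reuses BRANCH if it already exists locally.
new_worktree() {
  local branch="$1" root dir
  root=$(git rev-parse --show-toplevel) || return 1
  dir="$(dirname "$root")/$(basename "$root")-$(printf '%s' "$branch" | tr '/' '-')"
  if git show-ref --verify --quiet "refs/heads/$branch"; then
    git worktree add "$dir" "$branch" >&2 || return 1
  else
    git worktree add -b "$branch" "$dir" >&2 || return 1
  fi
  echo "$dir"
}

# Example: cd "$(new_worktree feature/auth-refresh)" && claude
```

git worktree list shows every checkout attached to the repo, and git worktree remove cleans one up once its branch is merged.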

Two Claude Code sessions using git worktrees — same repo, same git history, fully isolated working directories. Each session can build, test, and commit independently.

The adversarial reviewer pattern

A particularly effective worktree configuration is what Anthropic engineers call the adversarial reviewer: one Claude session implements a change and commits it; a second session in a review-only worktree reads the diff and critiques it — looking for edge cases, missing tests, or simpler alternatives. The reviewer has no attachment to the author’s choices, which is exactly what you want from a code review.

# In the review worktree:
Review the latest commit on feature/auth-refresh.
Act as a skeptical senior engineer. Find: correctness issues, missing edge cases,
violations of our CLAUDE.md conventions, and simpler alternatives.
Be specific — cite line numbers and file names.

This is a fundamentally different way of using agent parallelism: not just for throughput, but for adversarial quality.

5. Headless mode: agents without humans

The purest form of not babysitting is removing yourself from the loop entirely. Claude Code’s headless mode — claude -p "<prompt>" — runs a one-shot task non-interactively. It reads its input, uses its tools, produces output, and exits. No UI. No approvals. No waiting.

Basic headless use

# One-shot analysis
claude -p "Read the last 10 error logs in /var/log/app.log and summarize the root cause" \
  --output-format text

# Piping data through Claude
cat failing-tests.txt | claude -p "Classify each test failure: flaky, logic error, or environment issue"

# Using it as part of a pipeline
git diff HEAD~1 | claude -p "Write a one-paragraph summary of what changed" >> CHANGELOG.md

The --output-format flag controls how the response is emitted. text gives you the raw response; json gives you a structured response you can pipe into jq.

Pre-commit hooks

One of the most powerful headless integrations is the git pre-commit hook. A pre-commit runs synchronously before every commit — if it exits non-zero, the commit is blocked.

#!/bin/bash
# .git/hooks/pre-commit (or via husky)

STAGED_FILES=$(git diff --cached --name-only --diff-filter=ACM | grep -E '\.(ts|tsx|js)$')

if [ -z "$STAGED_FILES" ]; then
  exit 0
fi

echo "Running Claude Code security review on staged files..."
REVIEW=$(echo "$STAGED_FILES" | claude -p "
  Review these staged files for security issues only: hardcoded secrets,
  SQL injection vectors, XSS sinks. List any findings with file:line.
  If none found, output exactly: LGTM
" --output-format text --allowedTools "Read")

if [ "$REVIEW" != "LGTM" ]; then
  echo "Security review flagged:"
  echo "$REVIEW"
  exit 1
fi

This is the permission architecture concept applied to the code pipeline: Claude checks its own work (or the work it’s about to commit), using a narrowly-scoped tool list (--allowedTools "Read"), without any human approval step.
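The exact comparison [ "$REVIEW" != "LGTM" ] in the hook above is strict: a leading blank line or a trailing space in the model's reply is enough to block the commit. A minimal sketch of a more tolerant check, assuming surrounding whitespace should be ignored and anything else still counts as a finding; the function name is_lgtm is hypothetical:

```bash
# is_lgtm TEXT
# Succeeds only if TEXT is exactly "LGTM" after trimming surrounding whitespace.
is_lgtm() {
  local text="$1"
  text="${text#"${text%%[![:space:]]*}"}"   # drop leading whitespace
  text="${text%"${text##*[![:space:]]}"}"   # drop trailing whitespace
  [ "$text" = "LGTM" ]
}

# In the pre-commit hook:
# if ! is_lgtm "$REVIEW"; then echo "$REVIEW"; exit 1; fi
```

The check still fails closed: an empty reply, an error message, or "LGTM" followed by commentary all block the commit, which is the safer default for a security gate.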

CI and scheduled runs

Headless mode works equally well in CI. A GitHub Actions step can run Claude Code the same way any other shell command runs:

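- name: Claude Code triage
  run: |
    claude -p "
      Read the new issue at $ISSUE_URL using the gh tool.
      Add one of these labels: bug, feature, question, duplicate.
      Add a one-sentence comment explaining the classification.
    " \
    --allowedTools "Bash(gh:*)" \
    --output-format text
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}          # gh needs a token inside Actions
    ISSUE_URL: ${{ github.event.issue.html_url }}  # set when the workflow is triggered by an issue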

For even less human involvement, cron-based runs handle recurring work — nightly dependency update checks, weekly dead-code scans, daily changelog generation from merged PRs.

Check your understanding

1. What is the primary purpose of passing --allowedTools in a headless claude -p call?

In headless/CI contexts there is no human to catch a runaway tool call. Scoping --allowedTools to exactly what the task needs limits what can go wrong if the prompt misbehaves or is injected with unexpected input.

2. Which of these is a valid use of headless mode (claude -p)?

The pre-commit hook runs Claude non-interactively, produces structured output (LGTM or a list of issues), and blocks the commit if issues are found — exactly what headless mode is for.

6. Putting it together: the orchestration mindset

The four patterns in this lesson — permission architecture, verification-first task design, parallel sessions, and headless automation — are each useful individually. But they compound when you apply them together, because they address the same underlying shift: from treating Claude as a tool you supervise to treating it as a team member you’ve equipped and trusted.

That shift requires a change in what you invest time in. You spend less time approving and watching, and more time on:

Writing good task specs — precise exit conditions, verification commands, scoped tool access
Maintaining the policy files — the allow/deny rules that codify your team’s risk tolerance
Reviewing outcomes, not steps — reading diffs and PR descriptions, not approving each action
Routing work — deciding what tasks run in which sessions, in what order, in parallel or sequence

This is what the talk means by “orchestrating” rather than “babysitting”. An orchestrator designs the system, monitors the outcomes, and intervenes at decision points. They don’t watch every tool call.

Babysitting mode vs orchestration mode. In babysitting mode, every agent action requires human approval — the human is in the critical path. In orchestration mode, policies, hooks, and verification signals handle the routine; humans intervene only at decision points.

Check your understanding

1. The adversarial reviewer pattern runs two Claude Code sessions against the same repo. What does the second session do?

The adversarial reviewer session has no attachment to the first session's code and no knowledge of why it was written that way — which makes it a better reviewer than the author session. It critiques, the author session revises.

Build it yourself

Follow these exact steps to reproduce it yourself · estimated time: ~25 min

Prerequisites

Claude Code installed and authenticated
A project with a test command
Basic familiarity with git

Step 1 — Audit your current approval patterns

Start a Claude Code session and run a typical task. Note every time you’re prompted to approve a command. After the task, answer:

Which of these would you always approve without reading? Those go in the allow list.
Which require you to actually look at what Claude is doing? Those stay as-is or get hooks.
Were any dangerous enough that you want them blocked unconditionally? Those go in the deny list.

This audit typically reveals that 80%+ of approvals are reflexive yes-clicks.
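To make the audit concrete, write down each approved command as you go and count the repeats afterwards. A minimal sketch, assuming a hand-kept notes file with one approved command per line; the file and the function name tally_approvals are hypothetical, and Claude Code does not produce this file for you:

```bash
# tally_approvals FILE
# Prints "count<TAB>command" for each distinct line, most frequent first.
tally_approvals() {
  awk 'NF { seen[$0]++ } END { for (cmd in seen) print seen[cmd] "\t" cmd }' "$1" |
    sort -rn
}

# Example, with a hypothetical notes file:
# tally_approvals approvals.txt
```

The commands at the top of the output are the first candidates for the allow list in step 2.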

Step 2 — Write your settings.json policy

Create .claude/settings.json in your project:

{
  "permissions": {
    "allow": [
      "Bash(npm run *)",
      "Bash(npx tsc *)",
      "Bash(git diff*)",
      "Bash(git log*)",
      "Bash(git status)",
      "Bash(git add *)",
      "Bash(git commit *)",
      "Edit",
      "Read"
    ],
    "deny": [
      "Bash(git push --force*)",
      "Bash(rm -rf *)",
      "Bash(curl * | sh)",
      "Bash(sudo *)"
    ]
  }
}

Adjust the lists based on your step 1 audit. Commit this file.

Step 3 — Write one hook for content-aware blocking

Create .claude/hooks/pre-tool-use.sh and make it executable:

#!/bin/bash
# Block any command that references production database names

TOOL_INPUT=$(cat)
TOOL_NAME=$(echo "$TOOL_INPUT" | jq -r '.tool_name // ""')
COMMAND=$(echo "$TOOL_INPUT" | jq -r '.tool_input.command // ""')

# Only check Bash tool calls
if [ "$TOOL_NAME" != "Bash" ]; then
  exit 0
fi

# Block production references
if echo "$COMMAND" | grep -qiE "(prod-db|production\.db|prod_database)"; then
  echo "Hook blocked: command references production database. Use staging-db instead." >&2
  exit 2
fi

exit 0

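Wire it in .claude/settings.json by adding a hooks key next to permissions. Each PreToolUse entry pairs a matcher (the tool name) with the commands to run:

{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Bash",
      "hooks": [{"type": "command", "command": "bash .claude/hooks/pre-tool-use.sh"}]
    }]
  }
}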

Run a test: ask Claude to run a query against prod-db. The hook should block it.
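You can also exercise the hook without starting a session by piping a hand-built payload into it. A minimal sketch, assuming the payload only needs the tool_name and tool_input.command fields the script reads and that the command contains no newlines; the function name try_hook is hypothetical:

```bash
# try_hook HOOK_SCRIPT COMMAND
# Feeds a minimal Bash tool-call payload to HOOK_SCRIPT on stdin and
# returns the hook's exit code (0 = allow, 2 = block).
try_hook() {
  local hook="$1" cmd="$2" escaped
  # Escape backslashes and double quotes so the payload stays valid JSON.
  escaped=$(printf '%s' "$cmd" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"tool_name":"Bash","tool_input":{"command":"%s"}}\n' "$escaped" |
    bash "$hook"
}

# Example:
# try_hook .claude/hooks/pre-tool-use.sh "psql -h prod-db -c 'select 1'"; echo "exit: $?"
```

A real payload carries more fields than this, so treat a passing try_hook run as a quick check of the matching logic and not as proof that the wiring in settings.json is correct.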

Step 4 — Frame a task with an explicit verification exit condition

Give Claude a task with observable pass criteria:

Add input validation to the signup form in src/components/SignupForm.tsx.
Validation rules: email must match RFC 5321, password must be at least 12
characters and contain one symbol.
After each change, run: npm test -- --testPathPattern=SignupForm
The task is complete when all tests pass and npx tsc --noEmit exits 0.
Don't ask me for input — run autonomously until both commands pass.

Do not watch the session. Come back when it’s done. Review the diff.

Step 5 — Set up a parallel worktree session

While the session from step 4 is running, open a new terminal:

git worktree add "../$(basename "$PWD")-review" -b "review-$(date +%Y%m%d)"
cd "../$(basename "$PWD")-review" && claude

Give this session a completely different task — something independent. Run both in parallel. When the step 4 session finishes and commits, give the review session:

Review the latest commit from the main worktree. Act as a skeptical senior
engineer. Look for: missing edge cases, test gaps, and convention violations
from our CLAUDE.md. Be specific with file names and line numbers.

Step 6 — Write one headless automation

Pick something you do manually on a recurrence: reviewing new issues, checking for console errors in logs, checking if dependencies have updates. Write it as a headless command:

#!/bin/bash
# Save as scripts/daily-triage.sh
claude -p "
  Use the gh tool to list all GitHub issues opened in the last 24 hours in
  this repo. For each: classify it as bug, feature, or question. Add the
  appropriate label. If it's a bug, add a comment asking for reproduction steps.
" \
--allowedTools "Bash(gh:*)" \
--output-format text

Run it once manually to verify it works. Then add it to a cron job or a GitHub Actions scheduled workflow. You’ve just removed yourself from a recurring task entirely.

Where to go next

Watch the original talk for live demos of the workflows described here.
Hooks in Claude Code covers the hook lifecycle in detail — the five events, what input each receives, and how to return structured feedback to Claude.
How We Claude Code from Anthropic’s Applied AI team covers verification-native design in depth, including DOM contracts and Playwright MCP.
Claude Code Best Practices for the context management habits that complement autonomous operation.
