Stop Babysitting Your Agents: From Approval Mode to Orchestration
The workflows Claude Code engineers use to stop hand-holding their AI and start orchestrating it — permission architecture, verification-first design, parallel fanout, and headless automation.
This lesson is original educational writing based on this video by Anthropic (published May 20, 2026). All credit for the original content goes to the creators.
1. What “babysitting” actually means
When most people complain about Claude Code being slow, what they mean is that they are slow. The model isn’t blocking them — they are blocking the model. Every “Allow this command?” dialog, every “Does this look right?” pause, every task abandoned mid-run because it took a wrong turn and nobody was watching: these are symptoms of the same root problem.
You’re treating an agent like an intern who needs your eyes on every move.
An intern needs babysitting because you don’t trust their judgment and can’t verify their work without inspecting it yourself. An agent should graduate past that. Not because agents are infallible — they aren’t — but because most of the things you’re approving are already settled: the test command is always safe, prettier is always safe, a read is always safe. Approving those by hand is not risk management. It’s theater.
The shift the talk is about is architectural. You stop giving your agent a raw capability and demanding permission each time, and you start engineering a harness where:
- Safe operations are pre-approved at the settings layer — Claude never has to ask.
- Risky operations are blocked by hooks — you don’t rely on Claude’s judgment for those.
- Verification is embedded in the task — Claude can check its own work and keep going.
- Parallelism is the default — independent work streams run concurrently, not sequentially.
The rest of this lesson is how to build each of those four properties.
2. Permission architecture: approve once, not every time
Claude Code’s permission model has three layers, and understanding all three is the foundation of autonomous operation.
Layer 1: settings.json — the policy file
.claude/settings.json (project-level) and ~/.claude/settings.json (user-level) hold permanent
permission rules. Anything in allow never prompts. Anything in deny is blocked outright.
{
  "permissions": {
    "allow": [
      "Bash(npm run *)",
      "Bash(git diff:*)",
      "Bash(git log:*)",
      "Bash(git status)",
      "Bash(gh pr view:*)",
      "Edit"
    ],
    "deny": [
      "Bash(git push --force*)",
      "Bash(rm -rf *)",
      "Bash(curl * | bash)"
    ]
  }
}
The pattern syntax matters: Bash(npm run *) allows any npm run subcommand; Bash(npm run build)
allows only that exact command. Prefer the wildcard for your test and build runners — they’re
always safe, their arguments vary, and you don’t want to add new script names to the allow list
manually.
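To build intuition for the difference between a wildcard rule and an exact rule, the sketch below mimics both with ordinary shell glob matching. It is only an analogy: `rule_matches` is a hypothetical helper, and Claude Code's real matcher may treat edge cases (such as chained commands) differently.

```shell
#!/bin/sh
# Illustration only: approximates an allow-rule pattern with a shell glob.
# rule_matches is a hypothetical helper, not part of Claude Code.
rule_matches() {
  pattern=$1
  command=$2
  case "$command" in
    $pattern) return 0 ;;   # left unquoted so that * acts as a wildcard
    *)        return 1 ;;
  esac
}

for cmd in "npm run build" "npm run test:unit" "npm install left-pad"; do
  if rule_matches "npm run *" "$cmd"; then
    echo "wildcard rule matches: $cmd"
  fi
  if rule_matches "npm run build" "$cmd"; then
    echo "exact rule matches:    $cmd"
  fi
done
```

The wildcard form covers every script name you add later, while the exact form has to be extended by hand each time.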
Commit the project settings file. It’s not personal preference; it’s the team’s agreed policy. Personal overrides go in the user-level file, which stays on your machine.
Layer 2: auto mode — eliminate the dialog loop
The single highest-leverage setting for reducing interruptions is not in the settings file: it is
the mode flag. Running Claude Code in auto mode (claude --dangerously-skip-permissions in
older docs; now claude --auto) tells it to use the settings file as the complete policy without
prompting for anything not already decided. Flag names have changed between releases, so confirm
the spelling with claude --help before relying on it.
In practice, the sequence is: write an allow/deny policy, enable auto mode, and monitor the first few runs to confirm the policy is right. After a week you’ll have far more confidence in what to trust than you’d get from approving each action individually.
Layer 3: hooks — enforcement, not advice
Where the settings file handles known-good and known-bad patterns, hooks handle the gray zone. A pre-tool-use hook receives the tool call before it executes and can block it — with a message back to Claude about why.
#!/bin/bash
# .claude/hooks/pre-tool-use.sh
# Block any bash command that includes production environment names
TOOL_INPUT=$(cat)
# The hook payload nests the command under tool_input
COMMAND=$(echo "$TOOL_INPUT" | jq -r '.tool_input.command // ""')
if echo "$COMMAND" | grep -Eq "(prod|production|main-db)"; then
  echo "BLOCK: Commands referencing production environments require manual approval." >&2
  exit 2 # exit code 2 = block the action and tell Claude
fi
exit 0 # exit 0 = allow
The critical distinction: a settings-file rule is a static pattern match on the tool name and, for Bash, the command text. A hook is executable code that can inspect the full content of what Claude is about to do and apply arbitrary logic to it. You can block based on parsed arguments, file paths, environment variables, or anything else you can inspect in a shell script.
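One thing to watch with content-aware hooks is over-matching: a bare `prod` also matches `product` or `reproduce`. Below is a minimal sketch of a stricter check, assuming the same three environment names as the hook above. `references_production` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: match environment names only as whole tokens, so that words such as
# "product" or "reproduce" do not trigger the block.
# references_production is a hypothetical helper name.
references_production() {
  printf '%s\n' "$1" |
    grep -Eq '(^|[^[:alnum:]_])(prod|production|main-db)([^[:alnum:]_]|$)'
}

for cmd in "psql -h prod -c 'select 1'" "npm run build:product-page"; do
  if references_production "$cmd"; then
    echo "would block: $cmd"
  else
    echo "would allow: $cmd"
  fi
done
```

Inside a real hook, the "would block" branch would write its message to stderr and exit with code 2.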
3. Verification-first task design
Permission architecture answers the question “what can Claude do without asking?” but it doesn’t answer “how does Claude know when it’s done?” Those are separate problems, and the second one drives most babysitting behavior.
If Claude can’t verify its own work, you become the verifier. You watch the output, read the diff, run the tests yourself, and give the thumbs up. That’s exactly babysitting.
The lever is designing tasks so verification is built in.
The test signal
Tests are the canonical example because their pass/fail state is unambiguous. When you frame a task as “implement X until the tests pass”, Claude can run the tests itself, see the failures, fix its work, and re-run — without any input from you. The verification signal is built into the task definition.
Three practices amplify this:
Write failing tests first. When you provide the test suite before the implementation, Claude has a target to run against from step one. The task doesn’t complete until green. You don’t need to be there.
Include the verification command in the task. Don’t just say “add auth”. Say “add JWT auth
to the /api/ routes. Run npm test after each file change. The task is done when all tests
pass and npm run typecheck exits clean.” Now Claude has an objective pass condition.
Name the build artifacts. “Build the production bundle and verify the output is under 200KB” gives Claude a checkable exit criterion. “Build it” does not.
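One way to give Claude a single, objective pass condition is to bundle the checks into one script. The sketch below runs each check in order and stops at the first failure. `run_checks` is a hypothetical helper, and the demo uses stand-in commands rather than a real test suite.

```shell
#!/bin/sh
# Sketch: run verification commands in order and stop at the first failure.
# Each argument is one complete command line, executed via `sh -c`.
# run_checks is a hypothetical helper name.
run_checks() {
  for check in "$@"; do
    echo "running: $check"
    if ! sh -c "$check"; then
      echo "FAILED:  $check" >&2
      return 1
    fi
  done
  echo "all checks passed"
}

# Demo with stand-in commands so the sketch runs anywhere.
run_checks "true" "test 1 -eq 1"
```

In a project like the one described above, the call might be `run_checks "npm test" "npm run typecheck"`, and the task prompt can then name that one script as its exit condition.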
Beyond tests: other verification signals
Not all work has tests. But most work has something checkable:
- Type checks: tsc --noEmit exits 0 on success and non-zero on errors. Zero is done.
- Lint: eslint src/ --max-warnings 0. Deterministic and silent in CI.
- Screenshot comparison: for UI work, Playwright can take before/after screenshots and compare them. Claude can use the Playwright MCP to run the browser itself.
- File size / line count: crude but useful constraints (“the refactor should not increase the line count of auth.ts”).
- Custom scripts: the npm test slot in your project can be anything. A verification script that checks domain-specific invariants is just as good as a unit test suite.
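The cruder signals are easy to script. As a sketch, the check below fails when a file exceeds a byte budget, which is the kind of exit criterion the 200KB bundle example relies on. `check_size` is a hypothetical helper, and the demo runs against a throwaway file.

```shell
#!/bin/sh
# Sketch: exit non-zero when a file is larger than a byte budget.
# check_size is a hypothetical helper name.
check_size() {
  file=$1
  max_bytes=$2
  actual=$(wc -c < "$file" | tr -d ' ')
  if [ "$actual" -gt "$max_bytes" ]; then
    echo "FAIL: $file is $actual bytes (budget $max_bytes)" >&2
    return 1
  fi
  echo "OK: $file is $actual bytes (budget $max_bytes)"
}

# Demo on a throwaway file so the sketch runs anywhere.
tmp=$(mktemp)
printf 'hello' > "$tmp"
check_size "$tmp" 200000
rm -f "$tmp"
```

Pointed at a real build output (the path depends on your bundler), this gives Claude a command whose exit code says whether the size constraint holds.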
Check your understanding
1. Which task framing is most likely to let Claude finish without human input?
2. Why are pre-approved Bash commands in settings.json better than approving them one by one?
3. What is the key difference between a settings.json deny rule and a pre-tool-use hook?
4. Parallel sessions and git worktrees
Once agents can run unsupervised on a single task, the next bottleneck is running them one at a time. Most real projects have independent work streams — separate features, separate modules, separate branches — that could run concurrently. Running them sequentially means you multiply the wait.
Multiple terminal sessions
The simplest parallel setup is multiple terminal windows. Claude Code sessions are independent; there’s no conflict in running two or three at once against the same codebase, as long as they don’t edit the same files. If you’re backfilling tests for module A in one session and writing a new feature in module B in another, those are genuinely parallel.
The constraint is file-level conflicts. Two sessions editing the same file will produce a mess.
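Before starting two sessions against the same checkout, it can help to compare the files each task is expected to touch. Below is a minimal sketch, assuming you keep one path per line in two plain-text lists. The helper name and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: print paths that appear in both of two file lists.
# Usage: overlapping_files list-a.txt list-b.txt
overlapping_files() {
  # De-duplicate each list first so that only cross-list repeats survive uniq -d.
  { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}

# Demo with throwaway lists.
dir=$(mktemp -d)
printf '%s\n' src/auth.ts src/auth.test.ts > "$dir/session-a.txt"
printf '%s\n' src/billing.ts src/auth.ts   > "$dir/session-b.txt"
overlapping_files "$dir/session-a.txt" "$dir/session-b.txt"   # prints src/auth.ts
rm -rf "$dir"
```

An empty result suggests the two tasks can share a checkout. Any output is a hint to give one of them its own worktree.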
Git worktrees: true isolation
For work that might touch overlapping files, git worktrees are the right tool. A worktree is a separate filesystem checkout of your repo on a separate branch, sharing the same git history but isolated at the working-directory level.
# Create a worktree on a new feature branch (-b creates the branch)
git worktree add -b feature/auth-refresh ../myproject-feature
# Terminal A: work on the feature
cd ../myproject-feature && claude
# Terminal B: keep working on main
cd ../myproject && claude
Both sessions have Claude Code running. Both have their own working directory, their own branch, their own ability to run tests and build tools. They cannot conflict at the filesystem level.
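To see the isolation for yourself, the sketch below builds a throwaway repository, adds a worktree, and checks that a file created in one checkout is absent from the other. The directory and branch names mirror the example above, and the identity passed to git is a placeholder.

```shell
#!/bin/sh
# Sketch: show that a worktree has its own branch and working directory.
# Everything happens in a throwaway directory.
demo=$(mktemp -d)
git init -q "$demo/myproject"
cd "$demo/myproject" || exit 1
git -c user.name=demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "initial commit"

# -b creates the branch and checks it out in the new worktree.
git worktree add -b feature/auth-refresh ../myproject-feature >/dev/null 2>&1

# A file written in the feature worktree does not appear in the original one.
echo "draft" > ../myproject-feature/notes.txt
feature_branch=$(git -C ../myproject-feature rev-parse --abbrev-ref HEAD)
echo "feature worktree is on: $feature_branch"
if [ -e notes.txt ]; then
  echo "unexpected: notes.txt leaked into the main checkout"
else
  echo "main checkout is untouched"
fi
```

Both directories share one object store, so a commit made in either worktree is visible to `git log` from the other even though the working files never mix.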
The adversarial reviewer pattern
A particularly effective worktree configuration is what Anthropic engineers call the adversarial reviewer: one Claude session implements a change and commits it; a second session in a review-only worktree reads the diff and critiques it — looking for edge cases, missing tests, or simpler alternatives. The reviewer has no attachment to the author’s choices, which is exactly what you want from a code review.
# In the review worktree:
Review the latest commit on feature/auth-refresh.
Act as a skeptical senior engineer. Find: correctness issues, missing edge cases,
violations of our CLAUDE.md conventions, and simpler alternatives.
Be specific — cite line numbers and file names.
This is a fundamentally different way of using agent parallelism: not just for throughput, but for adversarial quality.
5. Headless mode: agents without humans
The purest form of not babysitting is removing yourself from the loop entirely. Claude Code’s
headless mode — claude -p "<prompt>" — runs a one-shot task non-interactively. It reads its
input, uses its tools, produces output, and exits. No UI. No approvals. No waiting.
Basic headless use
# One-shot analysis
claude -p "Read the last 10 error logs in /var/log/app.log and summarize the root cause" \
--output-format text
# Piping data through Claude
cat failing-tests.txt | claude -p "Classify each test failure: flaky, logic error, or environment issue"
# Using it as part of a pipeline
git diff HEAD~1 | claude -p "Write a one-paragraph summary of what changed" >> CHANGELOG.md
The --output-format flag controls how the response is emitted. text gives you the raw
response; json gives you a structured response you can pipe into jq.
Pre-commit hooks
One of the most powerful headless integrations is the git pre-commit hook. A pre-commit hook runs synchronously before every commit; if it exits non-zero, the commit is blocked.
#!/bin/bash
# .git/hooks/pre-commit (or via husky)
STAGED_FILES=$(git diff --cached --name-only --diff-filter=ACM | grep -E '\.(ts|tsx|js)$')
if [ -z "$STAGED_FILES" ]; then
  exit 0
fi
echo "Running Claude Code security review on staged files..."
REVIEW=$(echo "$STAGED_FILES" | claude -p "
Review these staged files for security issues only: hardcoded secrets,
SQL injection vectors, XSS sinks. List any findings with file:line.
If none found, output exactly: LGTM
" --output-format text --allowedTools "Read")
if [ "$REVIEW" != "LGTM" ]; then
  echo "Security review flagged:"
  echo "$REVIEW"
  exit 1
fi
This is the permission architecture concept applied to the code pipeline: Claude checks its own
work (or the work it’s about to commit), using a narrowly-scoped tool list (--allowedTools "Read"), without any human approval step.
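The exact string comparison in the hook is brittle: a reply with stray whitespace or a trailing carriage return would block the commit. Below is a minimal sketch of a more forgiving check, under the assumption that the model was told to reply with a bare LGTM when it finds nothing. `is_clean_review` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: treat the review as clean only if it is LGTM once whitespace is ignored.
# is_clean_review is a hypothetical helper name.
is_clean_review() {
  verdict=$(printf '%s' "$1" | tr -d '[:space:]')
  [ "$verdict" = "LGTM" ]
}

if is_clean_review "  LGTM
"; then
  echo "commit allowed"
fi
if ! is_clean_review "src/db.ts:14 possible SQL injection"; then
  echo "commit blocked"
fi
```

Any reply that contains findings still fails the check, because removing whitespace from it cannot produce the bare verdict.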
CI and scheduled runs
Headless mode works equally well in CI. A GitHub Actions step can run Claude Code the same way any other shell command runs:
- name: Claude Code triage
  run: |
    claude -p "
    Read the new issue at $ISSUE_URL using the gh tool.
    Add one of these labels: bug, feature, question, duplicate.
    Add a one-sentence comment explaining the classification.
    " \
    --allowedTools "Bash(gh:*)" \
    --output-format text
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    # Assumed additions: the prompt reads ISSUE_URL and the gh CLI needs a token
    ISSUE_URL: ${{ github.event.issue.html_url }}
    GH_TOKEN: ${{ github.token }}
For even less human involvement, cron-based runs handle recurring work — nightly dependency update checks, weekly dead-code scans, daily changelog generation from merged PRs.
Check your understanding
1. What is the primary purpose of passing --allowedTools in a headless claude -p call?
2. Which of these is a valid use of headless mode (claude -p)?
6. Putting it together: the orchestration mindset
The four patterns in this lesson — permission architecture, verification-first task design, parallel sessions, and headless automation — are each useful individually. But they compound when you apply them together, because they address the same underlying shift: from treating Claude as a tool you supervise to treating it as a team member you’ve equipped and trusted.
That shift requires a change in what you invest time in. You spend less time approving and watching, and more time on:
- Writing good task specs — precise exit conditions, verification commands, scoped tool access
- Maintaining the policy files — the allow/deny rules that codify your team’s risk tolerance
- Reviewing outcomes, not steps — reading diffs and PR descriptions, not approving each action
- Routing work — deciding what tasks run in which sessions, in what order, in parallel or sequence
This is what the talk means by “orchestrating” rather than “babysitting”. An orchestrator designs the system, monitors the outcomes, and intervenes at decision points. They don’t watch every tool call.
Check your understanding
1. The adversarial reviewer pattern runs two Claude Code sessions against the same repo. What does the second session do?
Build it yourself
Follow these exact steps to reproduce it yourself · estimated time: ~25 min
Prerequisites
- Claude Code installed and authenticated
- A project with a test command
- Basic familiarity with git
Step 1 — Audit your current approval patterns
Start a Claude Code session and run a typical task. Note every time you’re prompted to approve a command. After the task, answer:
- Which of these would you always approve without reading? Those go in the allow list.
- Which require you to actually look at what Claude is doing? Those stay as-is or get hooks.
- Were any dangerous enough that you want them blocked unconditionally? Those go in the deny list.
This audit typically reveals that 80%+ of approvals are reflexive yes-clicks.
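If you jot down each prompted command during the audit, a short pipeline can show which prefixes dominate. This is a sketch under the assumption that your notes hold one command per line. `tally_prefixes` is a hypothetical helper and the demo input is invented.

```shell
#!/bin/sh
# Sketch: count how often each command prefix (first two words) was approved.
# Reads one command per line on stdin; prints "count prefix", most frequent first.
# tally_prefixes is a hypothetical helper name.
tally_prefixes() {
  awk 'NF { print (NF > 1 ? $1 " " $2 : $1) }' |
    sort | uniq -c | sort -rn |
    awk '{ $1 = $1; print }'
}

# Demo input standing in for your own notes from the audit.
printf '%s\n' \
  "npm run test" "git status" "npm run lint" \
  "npm run build" "git status" "rm -rf build" |
  tally_prefixes
```

Prefixes at the top of the list that you always approve without reading are the natural candidates for the allow list.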
Step 2 — Write your settings.json policy
Create .claude/settings.json in your project:
{
  "permissions": {
    "allow": [
      "Bash(npm run *)",
      "Bash(npx tsc *)",
      "Bash(git diff*)",
      "Bash(git log*)",
      "Bash(git status)",
      "Bash(git add *)",
      "Bash(git commit *)",
      "Edit",
      "Read"
    ],
    "deny": [
      "Bash(git push --force*)",
      "Bash(rm -rf *)",
      "Bash(curl * | sh)",
      "Bash(sudo *)"
    ]
  }
}
Adjust the lists based on your step 1 audit. Commit this file.
Step 3 — Write one hook for content-aware blocking
Create .claude/hooks/pre-tool-use.sh and make it executable:
#!/bin/bash
# Block any command that references production database names
TOOL_INPUT=$(cat)
TOOL_NAME=$(echo "$TOOL_INPUT" | jq -r '.tool_name // ""')
COMMAND=$(echo "$TOOL_INPUT" | jq -r '.tool_input.command // ""')
# Only check Bash tool calls
if [ "$TOOL_NAME" != "Bash" ]; then
  exit 0
fi
# Block production references
if echo "$COMMAND" | grep -qiE "(prod-db|production\.db|prod_database)"; then
  echo "Hook blocked: command references production database. Use staging-db instead." >&2
  exit 2
fi
exit 0
Wire it in .claude/settings.json:
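{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [{"type": "command", "command": "bash .claude/hooks/pre-tool-use.sh"}]
      }
    ]
  }
}
Add the hooks key next to the permissions key from step 2 rather than replacing the file. Run a test: ask Claude to run a query against prod-db. The hook should block it.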
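You can also exercise a hook without starting a session by piping a synthetic payload into it. The sketch below assumes the payload shape read by the hook in this step (`tool_name` plus `tool_input.command`). `hook_exit_code` is a hypothetical helper, and it does no JSON escaping, so keep test commands free of double quotes and backslashes.

```shell
#!/bin/sh
# Sketch: feed a synthetic PreToolUse payload to a hook script and print its exit code.
# Assumes the payload shape {"tool_name": ..., "tool_input": {"command": ...}}.
# No JSON escaping is done: avoid double quotes and backslashes in the command.
# hook_exit_code is a hypothetical helper name.
hook_exit_code() {
  hook=$1
  command=$2
  printf '{"tool_name":"Bash","tool_input":{"command":"%s"}}' "$command" |
    bash "$hook" >/dev/null 2>&1
  echo $?
}

# Example calls (run from the project root once the hook file exists):
#   hook_exit_code .claude/hooks/pre-tool-use.sh "psql -h prod-db -c select"
#   hook_exit_code .claude/hooks/pre-tool-use.sh "psql -h staging-db -c select"
```

With the hook from this step, the first call should print 2 and the second 0. Anything else points at the script rather than at Claude.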
Step 4 — Frame a task with an explicit verification exit condition
Give Claude a task with observable pass criteria:
Add input validation to the signup form in src/components/SignupForm.tsx.
Validation rules: email must match RFC 5321, password must be at least 12
characters and contain one symbol.
After each change, run: npm test -- --testPathPattern=SignupForm
The task is complete when all tests pass and npx tsc --noEmit exits 0.
Don't ask me for input. Run autonomously until both commands pass.
Do not watch the session. Come back when it’s done. Review the diff.
Step 5 — Set up a parallel worktree session
While the session from step 4 is running, open a new terminal:
git worktree add -b "review-$(date +%Y%m%d)" "../$(basename "$PWD")-review"
cd "../$(basename "$PWD")-review" && claude
Give this session a completely different task, something independent. Run both in parallel. When the step 4 session finishes and commits, give the review session:
Review the latest commit from the main worktree. Act as a skeptical senior
engineer. Look for: missing edge cases, test gaps, and convention violations
from our CLAUDE.md. Be specific with file names and line numbers.
Step 6 — Write one headless automation
Pick something you do manually on a recurring basis: reviewing new issues, checking for console errors in logs, checking if dependencies have updates. Write it as a headless command:
#!/bin/bash
# Save as scripts/daily-triage.sh
claude -p "
Use the gh tool to list all GitHub issues opened in the last 24 hours in
this repo. For each: classify it as bug, feature, or question. Add the
appropriate label. If it's a bug, add a comment asking for reproduction steps.
" \
  --allowedTools "Bash(gh:*)" \
  --output-format text
Run it once manually to verify it works. Then add it to a cron job or a GitHub Actions scheduled workflow. You’ve just removed yourself from a recurring task entirely.
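Once the script runs unattended, nobody sees its output, so it is worth capturing. Below is a minimal sketch of a logging wrapper that preserves the exit status. `run_logged` and the log path in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch: run a command, append its output to a log file, and keep its exit status.
# run_logged is a hypothetical helper name.
run_logged() {
  log_file=$1
  shift
  printf '=== %s: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$log_file"
  "$@" >> "$log_file" 2>&1
  rc=$?
  printf '=== exit %s\n' "$rc" >> "$log_file"
  return "$rc"
}

# Demo with a stand-in command so the sketch runs anywhere.
log=$(mktemp)
run_logged "$log" echo "triage finished"
cat "$log"
rm -f "$log"
```

A scheduled job could then call something like `run_logged logs/triage.log scripts/daily-triage.sh`, which leaves a timestamped record to review when a run misbehaves.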
Where to go next
- Watch the original talk for live demos of the workflows described here.
- Hooks in Claude Code covers the hook lifecycle in detail — the five events, what input each receives, and how to return structured feedback to Claude.
- How We Claude Code from Anthropic’s Applied AI team covers verification-native design in depth, including DOM contracts and Playwright MCP.
- Claude Code Best Practices for the context management habits that complement autonomous operation.