Giving Agents Their Own Computers
How Cursor gave cloud agents onboarding, dev environments, and the ability to self-report problems — and what the 'agent experience' means for teams shipping parallel agents at scale.
This lesson is original educational writing based on this video by Anthropic (published May 8, 2026). All credit for the original content goes to the creators.
1. The onboarding gap
When a new developer joins Cursor, the company makes sure they have everything they need to be productive: a computer, a working dev environment, credentials, and documentation. It sounds obvious. But Cursor’s engineering team realised they were not doing the equivalent for their cloud agents — and the consequences were stark.
Their agents were:
- Sight-reading code with no ability to run it. They could read files but couldn’t start services, observe outputs, or verify their own changes.
- Spinning up fresh environments on every run. No persistence meant repeating expensive setup steps rather than building on prior work.
- Unable to test their own changes. Every change shipped blind, leaving human engineers to do the verification that agents should have done themselves.
- Frustrating the developers meant to collaborate with them. When agents fail at basics, people stop reaching for them.
The insight is simple but easy to miss: agents are junior colleagues, not tools. You wouldn’t hand a new hire a laptop with no OS, no logins, and no documentation, then blame them for being slow.
2. Stage 1: Give agents tools and context
Cursor’s first move was to build an onboarding agent — available at cursor.com/onboard — whose sole job is to explore a codebase and figure out how to run it, before any feature work happens.
The onboarding agent does not make changes. Instead it:
- Reads the codebase to understand its structure.
- Discovers what environment variables are needed.
- Figures out which services to start, and in what order.
- Works interactively with the developer to confirm everything is running correctly.
This is the equivalent of giving an agent its own computer and sitting with it through the first day. The output is a living document — a machine-readable spec of how to boot the project — that every subsequent cloud agent can use instead of rediscovering the same setup steps from scratch.
The principle generalises beyond Cursor: the onboarding artifact is infrastructure, not a one-time convenience. It compounds. Every new agent that runs against your codebase benefits from every discovery the onboarding agent made.
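One piece of such a spec, the list of required environment variables, lends itself to a mechanical check. The sketch below is an illustration only, not Cursor's onboarding agent: the function name `missing_env_vars` is hypothetical, and it assumes the project keeps a template file of plain `KEY=value` lines (for example `.env.example`).

```shell
# Hypothetical helper: print every variable named in an env template
# that is unset or empty in the current environment.
# Assumes plain KEY=value lines; blank lines and # comments are skipped.
missing_env_vars() {
  _tpl="$1"
  while IFS= read -r _line || [ -n "$_line" ]; do
    case "$_line" in
      ''|'#'*) continue ;;            # skip blank lines and comments
    esac
    _line="${_line#export }"          # tolerate an "export " prefix
    case "$_line" in
      *=*) _name="${_line%%=*}" ;;    # text before the first "="
      *) continue ;;
    esac
    # Accept only plain identifiers so the eval below stays safe.
    case "$_name" in
      ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) continue ;;
    esac
    eval "_value=\${$_name:-}"        # look up the variable by name
    [ -n "$_value" ] || printf '%s\n' "$_name"
  done < "$_tpl"
}
```

Running `missing_env_vars .env.example` before booting the app gives an agent a concrete list to resolve or ask about, rather than discovering each missing variable through a crash.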
3. Stage 2: Learn to leverage more capable models
Once agents could boot the codebase, the next bottleneck was workflow: agents were writing sleep 5 and hoping a service had started, creating test accounts by hand on every run, and making API calls that assumed third-party services were already signed in.
Cursor built anydev — a CLI they describe as a “Swiss Army knife for cloud agents.” Key capabilities:
- Start services and wait for them to be ready. Not `sleep 5`, but a real readiness check: the CLI polls the service and unblocks the agent only when the healthcheck passes.
- Check service status. Agents can inspect the state of the system they’re working in rather than guessing.
- Create test accounts and sign in to third-party services. Bookkeeping that used to fall to human developers is now handled automatically, with appropriate security constraints.
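The difference between a fixed sleep and a readiness check can be sketched in a few lines of shell. This is not anydev's implementation, which the talk does not show; `wait_until` is a hypothetical helper that re-runs any command until it succeeds or a timeout elapses.

```shell
# Hypothetical helper, not anydev itself.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command...>
# Both numbers must be whole seconds. Only time spent sleeping is counted,
# so a slow check command can make the real wait longer than the timeout.
wait_until() {
  _timeout="$1"; _interval="$2"; shift 2
  _elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$_elapsed" -ge "$_timeout" ]; then
      echo "not ready after ${_elapsed}s: $*" >&2
      return 1
    fi
    sleep "$_interval"
    _elapsed=$((_elapsed + _interval))
  done
  echo "ready after ${_elapsed}s"
}
```

An agent would call it as `wait_until 60 2 curl -sf http://localhost:3000/health`, where the URL is a placeholder for whatever health endpoint the service exposes. Because the check is an arbitrary command, the same helper can wait on a port, a file, or a database ping.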
The effect was a positive feedback loop: as the agent development experience improved, more developers ran more cloud agents. More agents meant more edge cases discovered. More edge cases meant more improvements to the tooling. Repeat.
4. Stage 3: Build the system that builds the system
The third stage is the most important shift in mindset. Stages 1 and 2 solved specific, known problems. Stage 3 asks: what is the infrastructure that solves unknown problems as they arise?
Cursor’s formulation: instead of hand-holding agents from task A to task D, build the system that handles A through Z.
This maps to three principles they derived from running many agents in production:
Principle 1: Give agents eyes
Agents need to see what they are doing. If an agent is running an application, it should be able to observe that application — its UI, its logs, its state. If the developer changes something in the app, the agent should see the change. If multiple agents are collaborating, their conversations should be visible to each other.
“Eyes” is not a metaphor for documentation. It means live, structured observability over the system the agent is operating in.
Principle 2: Give agents the same tools humans have
Agents should be able to run the applications developers run and access the services developers can access — subject to appropriate security boundaries. If a developer debugs by opening a browser and clicking through a flow, the agent should be able to do the same.
This sounds obvious until you check your own setup: how many things can your developers do that your agents cannot?
Principle 3: Computer use as a foundational primitive
For GUI workflows, computer use goes beyond “click in the right place.” The comparison Cursor uses is instructive: chess versus a video game. Chess is fully observable — you always know the complete game state. A video game is partially observable — you see only a slice at any time, there are irreversible actions (one-way doors), and there are game-over states.
Agents navigating GUIs operate in video-game mode. This demands metacognition: the agent must model its own uncertainty, plan for irreversibility, and know when to pause and backtrack rather than charging forward.
5. The “Work on the Factory” skill
The single most important capability Cursor built into their cloud agents is called WCF: Work on the Factory (or “Work on the Factory Floor,” depending on context). Every cloud agent at Cursor has it.
The metaphor comes from manufacturing: a factory worker who only runs the machines will eventually be blocked by a broken machine. A factory worker who also improves the factory compounds their productivity over time.
How WCF works
When an agent encounters something annoying, broken, or confusing during a task — rather than grinding through it silently — it pauses and reports the issue before continuing.
Reports are written to a shared system of record: structured entries describing the problem, the context in which it was encountered, and the agent’s best guess at the cause.
Humans (and increasingly agents) then triage the incoming reports into three categories:
- Technical problem — something in the codebase or infrastructure is broken and needs a fix.
- Permission issue — the agent lacked access to something it needed.
- Knowledge gap — the agent didn’t know how to do something it needed to do.
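A tally by category is the simplest view of a triage queue. The sketch below assumes a hypothetical log format, one report per line with the category and a summary separated by a tab; the talk does not describe how Cursor stores its reports, so both the format and the name `tally_reports` are illustrative.

```shell
# Hypothetical triage summary.
# Input file: one report per line, "<category><TAB><summary>".
# Prints a count for each of the three categories, in a fixed order.
tally_reports() {
  for _cat in "technical problem" "permission issue" "knowledge gap"; do
    # Count lines whose first tab-separated field is exactly this category.
    _n="$(awk -F '\t' -v c="$_cat" '$1 == c { n++ } END { print n + 0 }' "$1")"
    printf '%s: %s\n' "$_cat" "$_n"
  done
}
```

A category that dominates the tally points at where to invest: many knowledge gaps suggest better onboarding documentation, while many permission issues suggest reviewing what agents are allowed to access.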
Once categorised, fixes are assigned. What makes this powerful is the validation step: when a fix PR comes back, the solving agent doesn’t just review it — it spins up multiple test agents to validate the fix across different scenarios before the PR is reviewed by a human. The human reviewer receives a high-trust, pre-validated solution.
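The fan-out pattern behind that validation step can be approximated with background jobs. In this sketch each scenario is an ordinary shell command standing in for a test agent; `validate_fix` is a hypothetical name, and how Cursor actually launches its test agents is not described in the talk.

```shell
# Hypothetical sketch of fan-out validation.
# Usage: validate_fix "<scenario command>" ["<scenario command>" ...]
# Runs every scenario in parallel and succeeds only if all of them pass.
validate_fix() {
  _pids=""
  for _scenario in "$@"; do
    sh -c "$_scenario" >/dev/null 2>&1 &   # one background job per scenario
    _pids="$_pids $!"
  done
  _failed=0
  for _pid in $_pids; do
    wait "$_pid" || _failed=$((_failed + 1))
  done
  echo "scenarios=$# failed=$_failed"
  [ "$_failed" -eq 0 ]
}
```

The exit status is what matters to a caller: a non-zero result means at least one scenario failed, so the fix should not yet be handed to a human reviewer.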
Why this matters at scale
WCF turns every agent run into a source of compound improvement rather than a source of noise. Without it, failures accumulate silently: developers notice agents are unreliable and trust erodes. With it, every failure surfaces as a structured ticket, gets fixed, and the next agent that encounters the same situation succeeds where the last one failed.
The validation mechanism — fix agents spawning test agents — also means the quality bar rises over time. Early on, humans must review most fixes carefully. Later, agent-validated PRs arrive with a rich test suite, and the human’s job becomes oversight rather than verification.
Check your understanding
1. Cursor's onboarding agent's primary job is to:
2. Why did Cursor build the `anydev` CLI rather than just using `sleep N` in agent scripts?
3. Which analogy does Cursor use to explain why computer use requires metacognition?
4. In the WCF loop, what happens after a fix PR is created?
5. What is the core claim about "agent experience" in the talk?
6. Agent experience as product
The talk closes with a claim that should land differently for engineering leaders than it might for individual developers: you need to care just as much, if not more, about the agent experience as about the developer experience.
This is a product argument, not a technical one.
When an agent fails at a task a developer was counting on, two things happen:
- The developer does the task themselves (no net gain).
- The developer’s mental model of agents updates: “agents can’t be trusted with this kind of work.”
That second effect is the dangerous one. Trust is a threshold — once developers conclude that agents are unreliable for a class of task, they stop reaching for agents on that class of task. The potential leverage disappears.
The inverse is equally powerful. When agents handle a hard task well, developers expand what they are willing to delegate. They try harder tasks. They build richer tooling for agents to use. The trust compounds.
This means that every agent failure is a product failure — not just a reliability number to track in a dashboard. And it means that investing in agent experience (onboarding, environment tooling, observability, WCF) is not just engineering housekeeping. It is the foundation of how much leverage your team gets from AI.
Build it yourself
Follow these exact steps to reproduce it yourself · estimated time: ~45 minutes
Prerequisites
- A codebase you want agents to work in (any language)
- Claude Code or another agent CLI installed
- An Anthropic API key or Pro/Max subscription
Step 1 — Audit your current agent onboarding
Before building, assess what you have. Ask: if a new cloud agent started in your repo right now with no human help, could it:
- Install dependencies?
- Set required environment variables?
- Start the application?
- Run the test suite?
- Verify a change it made actually works?
Write down every “no” as a gap. These are your onboarding tasks.
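The audit can also be run as a script so the answers come from real commands rather than memory. This is a minimal sketch: `audit_step` and `AUDIT_GAPS` are hypothetical names, and the commands in the example comments are placeholders to replace with whatever your stack uses.

```shell
# Hypothetical audit helper: run one check and record whether it passed.
# Usage: audit_step "<description>" <command...>
AUDIT_GAPS=0
audit_step() {
  _desc="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "yes  $_desc"
  else
    echo "NO   $_desc"
    AUDIT_GAPS=$((AUDIT_GAPS + 1))   # every failed check is a gap
  fi
}

# Example calls (placeholder commands, adjust for your project):
#   audit_step "Install dependencies" npm ci
#   audit_step "Run the test suite"   npm test
#   echo "gaps found: $AUDIT_GAPS"
```

Run it from a clean checkout, ideally in a fresh container, since that is closer to what a cloud agent actually sees than a developer's long-lived machine.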
Step 2 — Write an onboarding prompt
Create a file at .claude/commands/onboard.md:
    You are an onboarding agent. Your job is NOT to make changes to the codebase.
    Your job is to figure out how to run it.
    Work through the following, asking me for confirmation at each step:
    1. Read the repo structure and identify the main entry points.
    2. Find all environment variables required to run the app (check .env.example, README, config files).
    3. Determine which services need to be started, and in what order.
    4. Write a step-by-step boot sequence that another agent could follow without human help.
    5. Run the boot sequence and confirm the app is healthy.
    Output a file at .claude/agent-onboarding.md with the full, machine-readable setup spec.
Step 3 — Run the onboarding agent
    claude /onboard
Work through it interactively. When it asks for confirmation, verify manually. At the end you should have .claude/agent-onboarding.md. Commit this file.
Step 4 — Add a readiness wrapper
If your stack has services that need to start before agents can work, add a shell script at .claude/wait-for-services.sh:
    #!/bin/bash
    # Wait for the dev server to be ready before proceeding.
    # Polls the health endpoint every 2 seconds and gives up after MAX_WAIT seconds.
    MAX_WAIT=60
    ELAPSED=0
    until curl -sf http://localhost:3000/health > /dev/null 2>&1; do
      sleep 2
      ELAPSED=$((ELAPSED + 2))
      if [ "$ELAPSED" -ge "$MAX_WAIT" ]; then
        # Report the failure on stderr and exit non-zero so callers can detect it.
        echo "Service did not start within ${MAX_WAIT}s" >&2
        exit 1
      fi
    done
    echo "Service ready after ${ELAPSED}s"
Reference this in your agent system prompt or CLAUDE.md: Before making changes, run .claude/wait-for-services.sh.
Step 5 — Set up a WCF issues file
Create .claude/agent-issues.md and add this instruction to your CLAUDE.md:
    ## When you hit friction
    If you encounter something broken, confusing, or missing during a task,
    add an entry to .claude/agent-issues.md before continuing:
    - **What:** one-line description of the problem
    - **When:** what task you were doing when it happened
    - **Type:** technical problem / permission issue / knowledge gap
    - **Impact:** how much it slowed you down (low / medium / high)
Step 6 — Review the issues log weekly
At the end of each week, read .claude/agent-issues.md, pick the highest-impact item, and fix it. After four weeks you will have resolved four of the biggest blockers your agents logged, a compounding improvement with minimal overhead.
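Both halves of this loop, writing entries in the What / When / Type / Impact format and pulling out the high-impact ones for the weekly review, can be sketched in shell. The function names `log_issue` and `high_impact_issues` are hypothetical, and the sketch assumes every entry follows the four-field format exactly.

```shell
# Hypothetical helpers for the issues log. The path can be overridden
# by setting ISSUES_FILE before calling either function.
ISSUES_FILE="${ISSUES_FILE:-.claude/agent-issues.md}"

# Append one entry. Usage: log_issue <what> <when> <type> <impact>
log_issue() {
  {
    printf '%s\n' "- **What:** $1"
    printf '%s\n' "- **When:** $2"
    printf '%s\n' "- **Type:** $3"
    printf '%s\n\n' "- **Impact:** $4"
  } >> "$ISSUES_FILE"
}

# Print the description of every entry marked high impact.
high_impact_issues() {
  awk '
    /^- \*\*What:\*\* /       { what = substr($0, 13) }  # text after the label
    /^- \*\*Impact:\*\* high/ { print what }
  ' "$ISSUES_FILE"
}
```

Agents will usually write entries by editing the file directly, so in practice the reader function is the more useful half: `high_impact_issues` gives a short list to choose the week's fix from.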
Expected result: agents that can boot your project without human help, that wait for services to be ready rather than guessing, and that surface their own blockers rather than silently failing. The onboarding document and issues log are infrastructure that pays forward to every future agent run.
Where to go next
- Watch the original talk: the live demos of anydev and the WCF system are worth seeing directly.
- Read Mastering Claude Code for the foundational agentic loop that underpins everything described here.
- Explore the Claude computer use documentation for the metacognition and backtracking patterns the talk references.