Giving Agents Their Own Computers

1. The onboarding gap

When a new developer joins Cursor, the company makes sure they have everything they need to be productive: a computer, a working dev environment, credentials, and documentation. It sounds obvious. But Cursor’s engineering team realised they were not doing the equivalent for their cloud agents — and the consequences were stark.

Their agents were:

Sight-reading code with no ability to run it. They could read files but couldn’t start services, observe outputs, or verify their own changes.
Spinning up fresh environments on every run. No persistence meant repeating expensive setup steps rather than building on prior work.
Unable to test their own changes. Every change shipped blind, leaving human engineers to do the verification that agents should have done themselves.
Frustrating the developers meant to collaborate with them. When agents fail at basics, people stop reaching for them.

The insight is simple but easy to miss: agents are junior colleagues, not tools. You wouldn’t hand a new hire a laptop with no OS, no logins, and no documentation, then blame them for being slow.

2. Stage 1: Give agents tools and context

Cursor’s first move was to build an onboarding agent — available at cursor.com/onboard — whose sole job is to explore a codebase and figure out how to run it, before any feature work happens.

The onboarding agent does not make changes. Instead it:

Reads the codebase to understand its structure.
Discovers what environment variables are needed.
Figures out which services to start, and in what order.
Works interactively with the developer to confirm everything is running correctly.

This is the equivalent of giving an agent its own computer and sitting with it through the first day. The output is a living document — a machine-readable spec of how to boot the project — that every subsequent cloud agent can use instead of rediscovering the same setup steps from scratch.
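To make "machine-readable spec" concrete, here is a minimal sketch of what such a spec and its consumer might look like. The field names (env, services, depends_on) and the example services are assumptions made for illustration; the talk does not describe the format Cursor actually uses.

```python
from graphlib import TopologicalSorter

# Hypothetical onboarding spec. The structure and names are illustrative only.
SPEC = {
    "env": ["DATABASE_URL", "REDIS_URL"],
    "services": {
        "postgres": {"depends_on": []},
        "redis": {"depends_on": []},
        "api": {"depends_on": ["postgres", "redis"]},
        "web": {"depends_on": ["api"]},
    },
}


def boot_order(spec: dict) -> list[str]:
    """Return service names so that every dependency starts before its dependents."""
    graph = {name: set(cfg["depends_on"]) for name, cfg in spec["services"].items()}
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(graph).static_order())


def missing_env(spec: dict, environ: dict) -> list[str]:
    """List required variables that are absent from the given environment mapping."""
    return [name for name in spec["env"] if name not in environ]
```

A later agent could call boot_order(SPEC) to learn the start sequence and missing_env(SPEC, os.environ) to find out what it still needs, rather than rediscovering both by trial and error.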

The principle generalises beyond Cursor: the onboarding artifact is infrastructure, not a one-time convenience. It compounds. Every new agent that runs against your codebase benefits from every discovery the onboarding agent made.

3. Stage 2: Learn to leverage more capable models

Once agents could boot the codebase, the next bottleneck was workflow: agents were writing sleep 5 and hoping a service had started, creating test accounts by hand on every run, and making API calls that assumed third-party services were already signed in.

Cursor built anydev — a CLI they describe as a “Swiss Army knife for cloud agents.” Key capabilities:

Start services and wait for them to be ready. Not sleep 5, but a real readiness check — the CLI polls the service and unblocks the agent only when the healthcheck passes.
Check service status. Agents can inspect the state of the system they’re working in rather than guessing.
Create test accounts and sign in to third-party services. Bookkeeping that used to fall to human developers is now handled automatically, with appropriate security constraints.
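The difference between sleep 5 and a readiness check can be sketched in a few lines. This is not anydev's implementation, which the talk does not show; the function names, defaults, and the injectable clock and sleep parameters are assumptions chosen to keep the sketch testable.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a 2xx status, False on connection or HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def wait_until_ready(
    check: Callable[[], bool],
    max_wait: float = 60.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll check() until it passes. Return seconds waited, or raise TimeoutError."""
    start = clock()
    while True:
        if check():
            return clock() - start
        if clock() - start >= max_wait:
            raise TimeoutError(f"service not ready within {max_wait}s")
        sleep(interval)
```

A caller would write something like wait_until_ready(lambda: http_ok("http://localhost:3000/health")), where the URL is a placeholder for your own healthcheck endpoint. The agent is unblocked as soon as the service is healthy, and it gets an explicit error instead of a silent race when the service never comes up.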

The effect was a positive feedback loop: as the agent development experience improved, more developers ran more cloud agents. More agents meant more edge cases discovered. More edge cases meant more improvements to the tooling. Repeat.

4. Stage 3: Build the system that builds the system

The third stage is the most important shift in mindset. Stages 1 and 2 solved specific, known problems. Stage 3 asks: what is the infrastructure that solves unknown problems as they arise?

Cursor’s formulation: instead of hand-holding agents from task A to task D, build the system that handles A through Z.

This maps to three principles they derived from running many agents in production:

Principle 1: Give agents eyes

Agents need to see what they are doing. If an agent is running an application, it should be able to observe that application — its UI, its logs, its state. If the developer changes something in the app, the agent should see the change. If multiple agents are collaborating, their conversations should be visible to each other.

“Eyes” is not a metaphor for documentation. It means live, structured observability over the system the agent is operating in.

Principle 2: Give agents the same tools humans have

Agents should be able to run the applications developers run and access the services developers can access — subject to appropriate security boundaries. If a developer debugs by opening a browser and clicking through a flow, the agent should be able to do the same.

This sounds obvious until you check your own setup: how many things can your developers do that your agents cannot?

Principle 3: Computer use as a foundational primitive

For GUI workflows, computer use goes beyond “click in the right place.” The comparison Cursor uses is instructive: chess versus a video game. Chess is fully observable — you always know the complete game state. A video game is partially observable — you see only a slice at any time, there are irreversible actions (one-way doors), and there are game-over states.

Agents navigating GUIs operate in video-game mode. This demands metacognition: the agent must model its own uncertainty, plan for irreversibility, and know when to pause and backtrack rather than charging forward.
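One way to picture "plan for irreversibility" is a gate that every planned GUI action passes through before it is executed. The sketch below is purely illustrative: the types, the confidence score, and both thresholds are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    CHECKPOINT_THEN_PROCEED = "checkpoint_then_proceed"
    PAUSE = "pause"


@dataclass(frozen=True)
class PlannedAction:
    description: str
    reversible: bool   # False for one-way doors, such as deleting data
    confidence: float  # the agent's own estimate (0.0 to 1.0) that this is the right step


def decide(
    action: PlannedAction,
    min_confidence: float = 0.6,           # hypothetical threshold
    irreversible_confidence: float = 0.9,  # hypothetical, stricter threshold
) -> Decision:
    """Gate an action on the agent's uncertainty and on whether it can be undone."""
    if action.confidence < min_confidence:
        # Too uncertain about the current state: stop, re-observe, maybe backtrack.
        return Decision.PAUSE
    if not action.reversible:
        if action.confidence < irreversible_confidence:
            return Decision.PAUSE
        # Confident enough, but record state first because there is no undo.
        return Decision.CHECKPOINT_THEN_PROCEED
    return Decision.PROCEED
```

The design choice worth noting is the asymmetry: a reversible click only needs moderate confidence, while a one-way door demands both higher confidence and a checkpoint.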

Figure: The three stages of agent maturity, from context-only to full environment ownership to a self-improving system.

5. The “Work on the Factory” skill

The single most important capability Cursor built into their cloud agents is called WCF: Work on the Factory (or “Work on the Factory Floor,” depending on context). Every cloud agent at Cursor has it.

The metaphor comes from manufacturing: a factory worker who only runs the machines will eventually be blocked by a broken machine. A factory worker who also improves the factory compounds their productivity over time.

How WCF works

When an agent encounters something annoying, broken, or confusing during a task — rather than grinding through it silently — it pauses and reports the issue before continuing.

Reports are written to a shared system of record: structured entries describing the problem, the context in which it was encountered, and the agent’s best guess at the cause.

Humans (and increasingly agents) then triage the incoming reports into three categories:

Technical problem — something in the codebase or infrastructure is broken and needs a fix.
Permission issue — the agent lacked access to something it needed.
Knowledge gap — the agent didn’t know how to do something it needed to do.

Once categorised, fixes are assigned. What makes this powerful is the validation step: when a fix PR comes back, the solving agent doesn’t just review it — it spins up multiple test agents to validate the fix across different scenarios before the PR is reviewed by a human. The human reviewer receives a high-trust, pre-validated solution.
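The report, triage, and validation steps described above can be sketched as a small data model. The three categories come from the talk; the class names, field names, and the rule that a fix needs at least one passing scenario are assumptions for illustration, not Cursor's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Category(Enum):
    TECHNICAL = "technical problem"
    PERMISSION = "permission issue"
    KNOWLEDGE = "knowledge gap"


@dataclass
class Report:
    """A structured entry an agent files when it hits friction (hypothetical fields)."""
    problem: str
    context: str
    suspected_cause: str
    category: Optional[Category] = None  # filled in during triage


def triage(report: Report, category: Category) -> Report:
    """Record the triage decision on the report and return it."""
    report.category = category
    return report


@dataclass
class FixValidation:
    """Outcomes from the test agents spawned for one fix, keyed by scenario name."""
    results: dict[str, bool] = field(default_factory=dict)

    def record(self, scenario: str, passed: bool) -> None:
        self.results[scenario] = passed

    def ready_for_human_review(self) -> bool:
        # No results is not evidence: require at least one scenario, all passing.
        return bool(self.results) and all(self.results.values())
```

Under this sketch, a fix PR would reach a human only once ready_for_human_review() returns True, which is what lets the reviewer treat it as pre-validated.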

Figure: The WCF self-improvement loop, in which agent issues compound into a progressively better agent development environment.

Why this matters at scale

WCF turns every agent run into a source of compound improvement rather than a source of noise. Without it, failures accumulate silently: developers notice agents are unreliable and trust erodes. With it, every failure surfaces as a structured ticket, gets fixed, and the next agent that encounters the same situation succeeds where the last one failed.

The validation mechanism — fix agents spawning test agents — also means the quality bar rises over time. Early on, humans must review most fixes carefully. Later, agent-validated PRs arrive with a rich test suite, and the human’s job becomes oversight rather than verification.

Check your understanding


1. What is the primary job of Cursor's onboarding agent?

The onboarding agent does not make changes. It explores the codebase to discover env vars, service startup order, and required permissions, producing a spec that every subsequent agent can reuse.

2. Why did Cursor build the `anydev` CLI rather than just using `sleep N` in agent scripts?

`sleep N` is a guess at how long setup takes. A readiness check polls for actual service health, making the agent faster and more reliable simultaneously.

3. Which analogy does Cursor use to explain why computer use requires metacognition?

Chess has full board state. A video game shows only a slice at a time, has irreversible actions, and has failure states. GUIs are more like video games: agents need to model uncertainty and plan for backtracking.

4. In the WCF loop, what happens after a fix PR is created?

Fix agents spin up test agents to validate the solution. The human reviewer receives a pre-validated, high-trust PR rather than having to do the verification work themselves.

5. What is the core claim about "agent experience" in the talk?

If agents fail repeatedly, developers lose trust and stop using them. If you invest in the agent experience, developers tackle bigger and bigger tasks. The feedback loop amplifies in both directions.

6. Agent experience as product

The talk closes with a claim that should land differently for engineering leaders than it might for individual developers: you need to care just as much, if not more, about the agent experience as about the developer experience.

This is a product argument, not a technical one.

When an agent fails at a task a developer was counting on, two things happen:

The developer does the task themselves (no net gain).
The developer’s mental model of agents updates: “agents can’t be trusted with this kind of work.”

That second effect is the dangerous one. Trust is a threshold — once developers conclude that agents are unreliable for a class of task, they stop reaching for agents on that class of task. The potential leverage disappears.

The inverse is equally powerful. When agents handle a hard task well, developers expand what they are willing to delegate. They try harder tasks. They build richer tooling for agents to use. The trust compounds.

This means that every agent failure is a product failure — not just a reliability number to track in a dashboard. And it means that investing in agent experience (onboarding, environment tooling, observability, WCF) is not just engineering housekeeping. It is the foundation of how much leverage your team gets from AI.

Build it yourself

Follow these exact steps to reproduce it yourself · estimated time: ~45 minutes

Prerequisites

A codebase you want agents to work in (any language)
Claude Code or another agent CLI installed
An Anthropic API key or Pro/Max subscription

Step 1 — Audit your current agent onboarding

Before building, assess what you have. Ask: if a new cloud agent started in your repo right now with no human help, could it:

Install dependencies?
Set required environment variables?
Start the application?
Run the test suite?
Verify a change it made actually works?

Write down every “no” as a gap. These are your onboarding tasks.

Step 2 — Write an onboarding prompt

Create a file at .claude/commands/onboard.md:

You are an onboarding agent. Your job is NOT to make changes to the codebase.
Your job is to figure out how to run it.

Work through the following, asking me for confirmation at each step:

1. Read the repo structure and identify the main entry points.
2. Find all environment variables required to run the app (check .env.example, README, config files).
3. Determine which services need to be started, and in what order.
4. Write a step-by-step boot sequence that another agent could follow without human help.
5. Run the boot sequence and confirm the app is healthy.

Output a file at .claude/agent-onboarding.md with the full, machine-readable setup spec.

Step 3 — Run the onboarding agent

claude /onboard

Work through it interactively. When it asks for confirmation, verify manually. At the end you should have .claude/agent-onboarding.md — commit this file.

Step 4 — Add a readiness wrapper

If your stack has services that need to start before agents can work, add a shell script at .claude/wait-for-services.sh:

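#!/bin/bash
# Wait for the dev server's health endpoint before letting the agent proceed.
HEALTH_URL="http://localhost:3000/health"
MAX_WAIT=60   # seconds to wait before giving up
SECONDS=0     # bash built-in timer: counts wall-clock seconds from this reset
# --max-time stops a hung connection from blocking the loop indefinitely.
until curl -sf --max-time 2 "$HEALTH_URL" > /dev/null 2>&1; do
  if [ "$SECONDS" -ge "$MAX_WAIT" ]; then
    echo "Service did not start within ${MAX_WAIT}s" >&2
    exit 1
  fi
  sleep 2
done
echo "Service ready after ${SECONDS}s"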

Make the script executable (chmod +x .claude/wait-for-services.sh), then reference it in your agent system prompt or CLAUDE.md: Before making changes, run .claude/wait-for-services.sh.

Step 5 — Set up a WCF issues file

Create .claude/agent-issues.md and add this instruction to your CLAUDE.md:

## When you hit friction
If you encounter something broken, confusing, or missing during a task,
add an entry to .claude/agent-issues.md before continuing:

- **What:** one-line description of the problem
- **When:** what task you were doing when it happened
- **Type:** technical problem / permission issue / knowledge gap
- **Impact:** how much it slowed you down (low / medium / high)

Step 6 — Review the issues log weekly

At the end of each week, read .claude/agent-issues.md and pick the highest-impact item. Fix it. Over four weeks you will have resolved the four biggest blockers your agents face — compounding improvements with minimal overhead.
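If the log grows long, the weekly pick can be automated. The sketch below assumes entries follow the What / When / Type / Impact format from Step 5 exactly; the function names are hypothetical, and an entry whose Impact value is not low, medium, or high is ranked last.

```python
import re

IMPACT_RANK = {"low": 1, "medium": 2, "high": 3}
FIELD = re.compile(r"^- \*\*(What|When|Type|Impact):\*\*\s*(.*)$")


def parse_issues(text: str) -> list[dict[str, str]]:
    """Parse the issues log into dicts. Each What line starts a new entry."""
    entries: list[dict[str, str]] = []
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if not match:
            continue  # skip headings, blank lines, and free-form notes
        key, value = match.group(1).lower(), match.group(2).strip()
        if key == "what":
            entries.append({})
        if entries:
            entries[-1][key] = value
    return entries


def highest_impact(entries: list[dict[str, str]]):
    """Return the entry with the highest impact (earliest wins ties), or None if empty."""
    if not entries:
        return None
    return max(entries, key=lambda e: IMPACT_RANK.get(e.get("impact", "").lower(), 0))
```

Reading .claude/agent-issues.md and passing its text to parse_issues, then to highest_impact, yields the item to fix that week.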

Expected result: agents that can boot your project without human help, that wait for services to be ready rather than guessing, and that surface their own blockers rather than silently failing. The onboarding document and issues log are infrastructure that pays forward to every future agent run.

Where to go next

Watch the original talk — the live demos of anydev and the WCF system are worth seeing directly.
Read Mastering Claude Code for the foundational agentic loop that underpins everything described here.
Explore the Claude computer use documentation for the metacognition and backtracking patterns the talk references.
