intermediate ⏱️ 11 min read · 🎬 ~37 min video

Ship Your First Managed Agent: Agent, Environment, Session

Claude Managed Agents is the fastest path from prototype to production-ready agent. This lesson walks through the three core primitives — Agent (brain), Environment (hands), Session (the binding) — and shows how to wire them into a working incident-response agent.

This lesson is original educational writing based on this video by Claude (published May 26, 2026). All credit for the original content goes to the creators.

#managed-agents #agentic-workflows #production

1. The evolution from API to managed infrastructure

When Claude first launched, the only interface was the Messages API: tokens in, tokens out. Developers who wanted an agent loop had to build it themselves: context management, tool-call routing, compaction, caching, and retry logic. While agents could only handle simple tasks, this was manageable. As models became capable of multi-step, multi-tool tasks, the complexity of these hand-built loops exploded.

The Agent SDK gave developers a programmatic way to run Claude Code as an agent. But developers still had to manage hosting and container scaling themselves.

Claude Managed Agents is the first harness where Anthropic manages the infrastructure: sandboxing, compaction, caching, reliability, and scaling are handled for you. You configure the task and tools; the platform runs the loop.

The claim from the workshop: teams using Managed Agents reach production 10–15x faster than teams that roll their own agent harness.

2. The three primitives

Every Managed Agent is composed of three resources:

[Diagram: Agent (“The Brain”): model, system prompt, MCP servers, skills, tool definitions; thinking, planning, and tool calls. Session (“The Binding”): ties Agent and Environment together, streams events to the app. Environment (“The Hands”): sandbox container, tool execution, network access rules, filesystem mounts, BYO compute (new); where actions execute.]
The three Managed Agents primitives. The Agent (brain) defines what Claude knows and can do. The Environment (hands) is where tool execution happens. The Session binds them together and streams events to your application.

Agent — the brain

The Agent resource defines what Claude is and what it can do:

  • Model: which Claude model to use
  • System prompt: the agent’s persona, task, and constraints
  • Tool definitions: what tools the agent can call (described in JSON schema)
  • MCP servers: connected model context protocol servers
  • Skills: reusable task templates

The agent loop runs server-side on Anthropic’s infrastructure. This is the key architectural decision: when you close your laptop, the agent keeps running. You don’t manage durability yourself, and no process is left hanging on your machine.

Environment — the hands

The Environment resource is where tool execution actually happens. This is deliberately decoupled from the agent loop.

Why the decoupling? Two concrete benefits:

  1. Security: Credentials never touch the same container as the agent’s reasoning. The agent can call a “get_secret” tool; the environment handler fetches the secret from a vault and returns only what the agent needs — the agent process itself never has filesystem access to your secrets.

  2. Latency: Previously, spinning up the agent loop and the tool execution environment in the same container meant every new session waited for both. With the decoupled design, Anthropic saw >90% reduction in P95 time-to-first-token because the agent loop can start immediately.
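
The security benefit above can be sketched in a few lines. This is a minimal illustration, not the platform's implementation: `FAKE_VAULT`, `query_metrics_backend`, and `handle_get_metrics` are hypothetical names, and the vault is an in-memory dict standing in for a real secret store. The point it tries to show is that the credential is read and used on the environment side, and only the resulting data travels back to the agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolResult:
    """What is sent back to the agent loop: data only, never credentials."""
    content: str


# Hypothetical stand-in for a real vault client.
FAKE_VAULT = {"metrics_api_token": "s3cr3t-token"}


def query_metrics_backend(token: str, service: str) -> dict:
    """Placeholder for an authenticated call to a metrics backend."""
    if token != FAKE_VAULT["metrics_api_token"]:
        raise PermissionError("bad token")
    return {"service": service, "error_rate": 0.02}


def handle_get_metrics(tool_input: dict) -> ToolResult:
    """Environment-side handler: the secret is fetched and used here only."""
    token = FAKE_VAULT["metrics_api_token"]
    data = query_metrics_backend(token, tool_input["service"])
    # Only the derived data is returned; the token never enters the result.
    return ToolResult(content=f"{data['service']}: error_rate={data['error_rate']}")


result = handle_get_metrics({"service": "auth"})
print(result.content)
```

Because the handler returns a `ToolResult` rather than the raw backend response, there is one obvious place to audit what can reach the agent's context.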

Session — the binding

A Session ties an Agent to an Environment and starts the agent loop. Sessions:

  • Persist in the cloud (hard refresh = no data loss)
  • Stream events to your application
  • Have states: idle → running → rescheduling (on retry) → terminated
  • Can be resumed from any state via webhooks or direct API calls
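
One way an application might use these states is to drive its UI. The sketch below assumes the status arrives as one of the four strings listed above; the hint texts and the `ui_hint` helper are illustrative choices, not part of the platform.

```python
from enum import Enum


class SessionState(Enum):
    IDLE = "idle"
    RUNNING = "running"
    RESCHEDULING = "rescheduling"
    TERMINATED = "terminated"


# Assumed UI behavior per state; only the state names come from the lesson.
UI_HINTS = {
    SessionState.IDLE: "Waiting for input",
    SessionState.RUNNING: "Agent is working",
    SessionState.RESCHEDULING: "Retrying after a transient failure",
    SessionState.TERMINATED: "Session ended",
}


def ui_hint(status: str) -> str:
    """Map a raw status string to a user-facing hint; unknown values raise ValueError."""
    return UI_HINTS[SessionState(status)]


print(ui_hint("rescheduling"))
```

Parsing the string into an enum first means an unexpected status fails loudly instead of silently showing the wrong hint.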

3. Events, not responses

The Messages API gives you tokens in/tokens out. Managed Agents gives you events.

Every meaningful action in a session produces an event: user message received, tool called, tool result returned, agent response started, agent response completed. Events are appended to the session log and streamed to your application in real time.

Why this matters: if a container goes down mid-task, the session log is intact. The harness spins up a fresh container and continues from the last event. No lost work, no “start over.” This is what production-grade durability looks like.
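
To see why an append-only log makes recovery possible, here is a toy model. It assumes events carry a `tool_call_id` that links a call to its result (the lesson's own code uses that field name); the `Event` and `SessionLog` classes and the `unresolved_tool_calls` helper are hypothetical, and the real harness may recover differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    type: str      # e.g. "user_message", "tool_call", "tool_result"
    payload: dict


@dataclass
class SessionLog:
    events: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        """Events are only ever appended, never edited or removed."""
        self.events.append(event)


def unresolved_tool_calls(log: SessionLog) -> List[str]:
    """Return ids of tool calls that have no matching result, in call order."""
    resolved = {e.payload["tool_call_id"] for e in log.events if e.type == "tool_result"}
    return [
        e.payload["tool_call_id"]
        for e in log.events
        if e.type == "tool_call" and e.payload["tool_call_id"] not in resolved
    ]


log = SessionLog()
log.append(Event("user_message", {"text": "Why is auth slow?"}))
log.append(Event("tool_call", {"tool_call_id": "tc_1", "tool_name": "get_metrics"}))
log.append(Event("tool_result", {"tool_call_id": "tc_1", "result": "p95=840ms"}))
log.append(Event("tool_call", {"tool_call_id": "tc_2", "tool_name": "get_diff"}))
# Imagine the container dies here. A fresh worker reads the log and sees
# exactly which work is still outstanding.
print(unresolved_tool_calls(log))
```

Everything needed to continue is derivable from the log itself, which is why losing the container does not mean losing the task.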

Your application receives these events via streaming and can surface them to users — progress updates, intermediate results, tool outputs — as they happen instead of waiting for the full response.
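
As a rough sketch of surfacing events to users, the function below turns each event into a short status line. Events are modeled as plain dicts here, and the event type names mirror the ones used in this lesson's code; the real event objects and type names may differ.

```python
from typing import Optional


def progress_line(event: dict) -> Optional[str]:
    """Turn one session event into a short status line, or None to show nothing."""
    kind = event.get("type")
    if kind == "tool_call":
        return f"Calling {event['tool_name']}..."
    if kind == "tool_result":
        return f"{event['tool_name']} returned {len(event['result'])} characters"
    if kind == "agent_response":
        return event["content"]
    if kind == "session_completed":
        return "Done."
    return None  # unknown or internal events are skipped


events = [
    {"type": "tool_call", "tool_name": "get_metrics"},
    {"type": "tool_result", "tool_name": "get_metrics", "result": "p95=840ms"},
    {"type": "agent_response", "content": "Latency rose after the last deploy."},
    {"type": "session_completed"},
]
lines = [line for line in map(progress_line, events) if line is not None]
print("\n".join(lines))
```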

Check your understanding

  1. Why is the agent loop deliberately decoupled from tool execution in Claude Managed Agents?

  2. What happens to a Managed Agent session when a container goes down mid-task?

  3. What is the purpose of "local tool" functions in the workshop's architecture?

4. Local tools: connecting your infrastructure

In the workshop, the incident-response agent calls tools like get_metrics, get_recent_deploys, and get_diff. These execute locally. In the demo they are simulated from JSON files; in production they’d call Datadog, PagerDuty, or GitHub.

The pattern: you define the tools in the Agent (their JSON schema, what they do), and you handle their execution locally in your application. When the agent calls a tool, an event arrives in your stream; you execute the function, return the result; the harness continues the loop.

def handle_tool_call(event):
    tool_name = event.tool_name
    tool_input = event.tool_input

    if tool_name == "get_metrics":
        return fetch_from_datadog(tool_input["service"], tool_input["window"])
    elif tool_name == "get_recent_deploys":
        return fetch_from_github_releases(tool_input["repo"])
    elif tool_name == "get_diff":
        return fetch_git_diff(tool_input["commit_sha"])

    raise ValueError(f"Unknown tool: {tool_name}")

This is the migration path from prototype to production: replace JSON file reads with real client calls, one tool at a time. The agent’s system prompt and behavior don’t change; only the tool implementations do.
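
The one-tool-at-a-time migration could look like the sketch below. It assumes each tool starts out backed by a JSON fixture file and is later swapped for a real implementation by replacing a single entry in the handler table. `make_fixture_tool` and `real_get_metrics` are hypothetical helpers, and the "real" call is a placeholder rather than an actual Datadog client.

```python
import json
import tempfile
from pathlib import Path


def make_fixture_tool(path: Path):
    """Return a tool handler that serves a canned JSON response from disk."""
    def handler(tool_input: dict) -> str:
        return path.read_text()
    return handler


def real_get_metrics(tool_input: dict) -> str:
    """Placeholder for a real client call to your metrics provider."""
    return json.dumps({"service": tool_input["service"], "source": "live"})


# Prototype stage: every tool reads a fixture file.
fixture_dir = Path(tempfile.mkdtemp())
for name in ("get_metrics", "get_recent_deploys"):
    (fixture_dir / f"{name}.json").write_text(json.dumps({"source": "fixture"}))

handlers = {
    name: make_fixture_tool(fixture_dir / f"{name}.json")
    for name in ("get_metrics", "get_recent_deploys")
}

# Migration step: swap exactly one tool; the rest keep using fixtures.
handlers["get_metrics"] = real_get_metrics

print(handlers["get_metrics"]({"service": "auth"}))
print(handlers["get_recent_deploys"]({"repo": "example/repo"}))
```

Since the agent only sees tool names and results, swapping an entry in `handlers` needs no change to the agent definition.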

5. What the platform handles for you

The workshop emphasizes what you don’t have to build:

  • Compaction: when context gets long, the harness compacts it automatically — you don’t manage context windows
  • Caching: prompt caching is handled at the harness level for cost and latency
  • Retry logic: rescheduling state handles transient failures without your intervention
  • Session persistence: the conversation is in the cloud; hard refresh, laptop close, network blip — all transparent to the user
  • Observability: the Managed Agents console shows every event, tool call, and response — the full agent trajectory, inspectable at any time

What you control: the task definition, the system prompt, the tools the agent can call, and the custom tool logic that connects to your infrastructure.
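
To make the compaction bullet concrete, here is a toy illustration of the idea: when a transcript exceeds a budget, older messages are collapsed into a summary and recent ones are kept verbatim. This is not how the platform implements compaction (a real harness would presumably summarize with the model and count tokens, not characters); `compact` and its parameters are made up for this sketch.

```python
from typing import List


def compact(messages: List[str], max_chars: int, keep_recent: int = 2) -> List[str]:
    """Collapse older messages into one summary line when the transcript is too long."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    total = sum(len(m) for m in messages)
    if total <= max_chars or len(messages) <= keep_recent:
        return messages  # nothing to do
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    # A stand-in for a real summary of the older messages.
    summary = f"[summary of {len(older)} earlier messages]"
    return [summary] + recent


transcript = ["a" * 100, "b" * 100, "latest question", "latest answer"]
print(compact(transcript, max_chars=50))
```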

6. Beyond the basics

The workshop briefly covers what’s available beyond the core primitives:

  • Subagents: an orchestrator agent spins up subagents with their own context windows for parallelism
  • Memory: persistent memory stores that bridge information across sessions (see Agents That Remember)
  • Outcomes: define a rubric for what success looks like; the agent drives toward the outcome rather than just executing steps
  • Vaults: encrypted credential storage decoupled from agent access — credentials are fetched by the environment on demand, never persisted in the agent’s context
  • Webhooks: external events (a GitHub webhook, a PagerDuty alert) can wake a session or trigger a new one
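
The webhook bullet implies a routing decision in your application: does an incoming event belong to a session that already exists, or should it start a new one? A minimal sketch of that decision follows. The `incident_id` payload field, the `open_sessions` mapping, and the `route_webhook` helper are all assumptions for illustration; the actual webhook payloads and wake-up calls are defined by the platform and your alerting tools.

```python
from typing import Dict, Tuple


def route_webhook(payload: dict, open_sessions: Dict[str, str]) -> Tuple[str, str]:
    """Return ("wake", session_id) for a known incident, else ("create", incident_id)."""
    incident_id = payload["incident_id"]
    session_id = open_sessions.get(incident_id)
    if session_id is not None:
        return ("wake", session_id)
    return ("create", incident_id)


open_sessions = {"INC-1": "sess_abc"}  # incident id -> session id
print(route_webhook({"incident_id": "INC-1"}, open_sessions))
print(route_webhook({"incident_id": "INC-2"}, open_sessions))
```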

Build it yourself

Follow these exact steps to reproduce it yourself · estimated time: ~90 minutes

Prerequisites

  • An Anthropic API key with Claude Managed Agents access
  • Python 3.10+ and pip
  • One real data source you care about (could be a GitHub repo, a log file, a database)

Build a minimal agent that connects to one real data source and can answer questions about it.

Step 1 — Set up

mkdir my-managed-agent && cd my-managed-agent
python3 -m venv .venv && source .venv/bin/activate
pip install anthropic streamlit
export ANTHROPIC_API_KEY="your-api-key"   # anthropic.Anthropic() reads this environment variable

Step 2 — Define your agent

import anthropic

client = anthropic.Anthropic()

agent = client.beta.managed_agents.agents.create(
    name="my-data-agent",
    model="claude-opus-4-8",
    system_prompt="""You are an assistant that helps engineers understand system state.
    
You have access to tools that fetch real data. Always fetch data before answering;
do not speculate about current state. When you find issues, explain root cause 
and suggest remediation steps.""",
    tools=[
        {
            "name": "get_recent_logs",
            "description": "Fetch the last N log lines from the service",
            "input_schema": {
                "type": "object",
                "properties": {
                    "service": {"type": "string", "description": "Service name"},
                    "lines": {"type": "integer", "default": 50},
                },
                "required": ["service"],
            },
        }
    ],
)
print(f"Agent created: {agent.id}")
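
Because the model produces the tool input, it can help to check that input against the schema before running a handler. Note that in JSON Schema, `default` is an annotation; nothing guarantees the caller fills it in. The sketch below is a deliberately small validator covering only `required`, simple `type` checks, and `default`; `validate_input` is a hypothetical helper, and a full implementation would more likely use a JSON Schema library.

```python
LOGS_SCHEMA = {
    "type": "object",
    "properties": {
        "service": {"type": "string", "description": "Service name"},
        "lines": {"type": "integer", "default": 50},
    },
    "required": ["service"],
}

_JSON_TYPES = {"string": str, "integer": int, "boolean": bool, "object": dict, "array": list}


def validate_input(schema: dict, tool_input: dict) -> dict:
    """Check required fields and basic types, then fill in declared defaults."""
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in tool_input:
            raise ValueError(f"missing required field: {name}")
    cleaned = {}
    for name, value in tool_input.items():
        if name not in props:
            raise ValueError(f"unexpected field: {name}")
        expected = _JSON_TYPES.get(props[name].get("type"))
        # bool is a subclass of int in Python, so reject it explicitly for integers
        wrong_type = expected is not None and not isinstance(value, expected)
        if wrong_type or (expected is int and isinstance(value, bool)):
            raise ValueError(f"field {name} should be of type {props[name]['type']}")
        cleaned[name] = value
    for name, spec in props.items():
        if name not in cleaned and "default" in spec:
            cleaned[name] = spec["default"]
    return cleaned


print(validate_input(LOGS_SCHEMA, {"service": "auth"}))
```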

Step 3 — Define your environment

env = client.beta.managed_agents.environments.create(
    name="my-env",
    networking={"allowed_domains": ["*"]},  # restrict in production
)
print(f"Environment: {env.id}")
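
When it is time to replace the `"*"` wildcard, one approach is to derive the allowlist from the endpoints your tools actually call. The sketch below assumes `allowed_domains` accepts plain hostnames, which this lesson does not confirm; the URLs are placeholders and `allowed_domains_for` is a hypothetical helper.

```python
from urllib.parse import urlparse

# Hypothetical endpoints; substitute the URLs your tools really use.
TOOL_ENDPOINTS = {
    "get_metrics": "https://metrics.example.com/api/v1/query",
    "get_recent_deploys": "https://git.example.com/api/releases",
    "get_diff": "https://git.example.com/api/compare",
}


def allowed_domains_for(endpoints: dict) -> list:
    """Collect the unique hostnames behind a set of tool endpoints."""
    hosts = {urlparse(url).hostname for url in endpoints.values()}
    hosts.discard(None)  # urlparse returns None for URLs without a host
    return sorted(hosts)


print(allowed_domains_for(TOOL_ENDPOINTS))
```

Keeping the endpoint table next to the tool definitions makes it harder for the allowlist to drift as tools are added.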

Step 4 — Implement your local tool

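import re

def get_recent_logs(service: str, lines: int = 50) -> str:
    # Replace with your real data source:
    # - Read from a log file
    # - Query CloudWatch, Datadog, or Grafana
    # - Hit an internal API

    # The service name comes from the model, so validate it before building a path
    if not re.fullmatch(r"[A-Za-z0-9_-]+", service):
        return f"[Invalid service name: {service!r}]"
    # content[-0:] would return the whole file, so handle non-positive counts explicitly
    if lines <= 0:
        return ""

    # Minimal example: read a local log file
    log_path = f"/var/log/{service}.log"
    try:
        with open(log_path) as f:
            content = f.readlines()
        return "".join(content[-lines:])
    except FileNotFoundError:
        return f"[No log file found for {service}]"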

TOOL_HANDLERS = {
    "get_recent_logs": lambda inp: get_recent_logs(**inp),
}
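
A handler table like this is easier to run safely behind a small dispatch function that always returns a string, even when the tool is unknown or the handler raises. The sketch below is one possible shape; `dispatch` and the `[error] ...` message format are assumptions, chosen so the agent receives something readable instead of the stream loop crashing.

```python
def dispatch(handlers: dict, tool_name: str, tool_input: dict) -> str:
    """Run a tool handler and always return a string result, even on failure."""
    handler = handlers.get(tool_name)
    if handler is None:
        return f"[error] unknown tool: {tool_name}"
    try:
        return str(handler(tool_input))
    except Exception as exc:  # report the failure to the agent instead of crashing
        return f"[error] {tool_name} failed: {type(exc).__name__}: {exc}"


def flaky(tool_input: dict) -> str:
    raise TimeoutError("backend did not respond")


demo_handlers = {"echo": lambda inp: inp["text"], "flaky": flaky}
print(dispatch(demo_handlers, "echo", {"text": "hi"}))
print(dispatch(demo_handlers, "flaky", {}))
print(dispatch(demo_handlers, "missing", {}))
```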

Step 5 — Create a session and stream events

def run_query(agent_id: str, env_id: str, user_question: str):
    session = client.beta.managed_agents.sessions.create(
        agent_id=agent_id,
        environment_id=env_id,
        title=f"Query: {user_question[:50]}",
    )

    # Send the user's question
    client.beta.managed_agents.sessions.messages.create(
        session_id=session.id,
        content=user_question,
    )

    # Stream events
    for event in client.beta.managed_agents.sessions.stream(session.id):
        if event.type == "tool_call":
            handler = TOOL_HANDLERS.get(event.tool_name)
            # Always answer a tool call, even an unknown one; otherwise the
            # agent loop may be left waiting for a result that never arrives
            if handler:
                result = handler(event.tool_input)
            else:
                result = f"[Unknown tool: {event.tool_name}]"
            client.beta.managed_agents.sessions.tool_results.create(
                session_id=session.id,
                tool_call_id=event.tool_call_id,
                result=result,
            )
        elif event.type == "agent_response":
            yield event.content   # stream to the UI
        elif event.type == "session_completed":
            break

# Usage
for chunk in run_query(agent.id, env.id, "What's happening in the auth service?"):
    print(chunk, end="", flush=True)

Step 6 — Add session persistence

def list_sessions(agent_id: str):
    sessions = client.beta.managed_agents.sessions.list(agent_id=agent_id)
    return [(s.id, s.title, s.status) for s in sessions]

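def resume_session(session_id: str, new_message: str):
    client.beta.managed_agents.sessions.messages.create(
        session_id=session_id,
        content=new_message,
    )
    for event in client.beta.managed_agents.sessions.stream(session_id):
        if event.type == "tool_call":
            # A resumed session can still call tools, so answer them here too
            handler = TOOL_HANDLERS.get(event.tool_name)
            result = handler(event.tool_input) if handler else f"[Unknown tool: {event.tool_name}]"
            client.beta.managed_agents.sessions.tool_results.create(
                session_id=session_id,
                tool_call_id=event.tool_call_id,
                result=result,
            )
        elif event.type == "agent_response":
            yield event.content
        elif event.type == "session_completed":
            break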
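
The sessions themselves live in the cloud, but an application with multiple users still needs to know which session belongs to whom. One simple option, sketched here with a local JSON file, is to keep a per-user index of session ids. The file layout and the `remember_session` and `sessions_for` helpers are assumptions for illustration; a real app would more likely use its existing database.

```python
import json
import tempfile
from pathlib import Path


def remember_session(store: Path, user_id: str, session_id: str, title: str) -> None:
    """Append a session reference to the user's list in a JSON file."""
    data = json.loads(store.read_text()) if store.exists() else {}
    data.setdefault(user_id, []).append({"id": session_id, "title": title})
    store.write_text(json.dumps(data, indent=2))


def sessions_for(store: Path, user_id: str) -> list:
    """Return the stored session references for one user, oldest first."""
    if not store.exists():
        return []
    return json.loads(store.read_text()).get(user_id, [])


store = Path(tempfile.mkdtemp()) / "sessions.json"
remember_session(store, "alice", "sess_123", "Query: auth latency")
remember_session(store, "alice", "sess_456", "Query: deploy diff")
print([s["id"] for s in sessions_for(store, "alice")])
```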

Step 7 — Extend with a second tool

Add a second tool to the Agent definition and a matching handler. Common next tools:

  • search_deployments — query your deployment history
  • get_error_rate — fetch error rate from your APM
  • list_open_incidents — query PagerDuty or OpsGenie

Each new tool makes the agent more capable without changing the session or environment.
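
When adding tools, the definition list and the handler table can drift apart. A quick consistency check, run at startup, catches a tool the agent can call but your app cannot execute. The sketch below uses `get_error_rate` from the list above; `check_tools` and the stub handlers are hypothetical.

```python
def check_tools(tool_definitions: list, handlers: dict) -> list:
    """Return a list of mismatches between defined tools and local handlers."""
    defined = {tool["name"] for tool in tool_definitions}
    implemented = set(handlers)
    problems = [f"no handler for tool: {name}" for name in sorted(defined - implemented)]
    problems += [f"handler without definition: {name}" for name in sorted(implemented - defined)]
    return problems


tool_definitions = [
    {"name": "get_recent_logs", "description": "Fetch the last N log lines from the service"},
    {"name": "get_error_rate", "description": "Fetch error rate from your APM"},
]
handlers = {
    "get_recent_logs": lambda inp: "log lines...",
    # get_error_rate was added to the agent but not implemented yet
}

print(check_tools(tool_definitions, handlers))
```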

Production checklist

Before going live:

  • Restrict allowed_domains in the environment to only what your tools need
  • Store credentials in Vaults, not in tool handler code
  • Add session deletion on user logout: client.beta.managed_agents.sessions.delete(session_id)
  • Set up webhook triggers for event-driven activation (new incident → new session)
  • Review session logs via the console observability dashboard before go-live
