advanced ⏱️ 10 min read · 🎬 ~25 min video

Memory and Dreaming: Building Self-Improving Agents

Design production memory systems for multi-agent architectures using filesystem-based memory stores, optimistic concurrency, and the dreaming feedback loop.

This lesson is original educational writing based on this video by Anthropic (published May 21, 2026). All credit for the original content goes to the creators.

#managed-agents #memory #multi-agent

The Isolation Problem

Every time an agent starts a session without memory, it starts from the same blank slate. Performance on each new task mirrors the last: there is no learning curve, just repetition. Agents make the same mistakes independently, show the same inefficiencies, and repeat work that other agents have already done.

The goal is different: performance should improve from task to task, and from agent to agent. Memory is the mechanism that makes this possible. It lets agents carry forward learnings, avoid known pitfalls, and build on a shared understanding of the organisation they work in.

[Figure: two panels. Without memory: Agent 1 (session A), Agent 2 (session B), and Agent 3 (session C) work in isolation, giving flat performance with no learning. With memory: the same three agents and sessions connect to a shared memory store, giving cumulative improvement.]
Without memory, each agent starts fresh. With memory, learnings compound across sessions and agents.

Why a File System?

Earlier memory implementations focused on capability in the harness — custom tools, CLAUDE.md files, SDK-level memory primitives. These worked but required careful engineering to keep in sync.

The shift in Anthropic’s managed-agent memory is simpler: model memory as a file system. Claude already excels at navigating virtual environments, using bash and grep, reading, updating, and organising files. Rather than building a bespoke memory interface, the design leans into what Claude already does well. The memory store mounts as a filesystem the agent can read and write freely.
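To make the idea concrete, here is a minimal local sketch of memory treated as ordinary files. It does not use the managed-agent API: a temporary directory stands in for the mounted store, and the helper names `write_note` and `grep_memory`, the file paths, and the note contents are all hypothetical.

```python
import tempfile
from pathlib import Path


def write_note(root: Path, rel_path: str, text: str) -> None:
    """Create or overwrite a memory file, making parent folders as needed."""
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")


def grep_memory(root: Path, needle: str) -> list[str]:
    """Return relative paths of memory files whose text mentions `needle`."""
    hits = []
    for path in sorted(root.rglob("*.md")):
        if needle.lower() in path.read_text(encoding="utf-8").lower():
            hits.append(path.relative_to(root).as_posix())
    return hits


# Assumption: a temporary directory stands in for the mounted memory store.
memory_root = Path(tempfile.mkdtemp(prefix="memory-store-"))
write_note(memory_root, "incidents/cpu-spike.md",
           "CPU spike on api-gateway; restarting the worker pool fixed it.")
write_note(memory_root, "runbooks/deploy.md",
           "Deploys go through the staging cluster first.")

# A later session can search the store the way it would search any directory.
matches = grep_memory(memory_root, "cpu spike")
```

Nothing here is specific to memory: creating, searching, and reorganising files are skills the model already has, which is the point of the design.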

This “get out of Claude’s way” principle also applied to skills. A flexible, minimal format turned out to create endless possibilities precisely because the model already understood how to work with it.

Multi-Agent Memory Architecture

Single-agent memory is straightforward. Multi-agent memory introduces new requirements:

  • Multiple sessions reading and writing the same store simultaneously
  • Different scopes: org-wide knowledge vs. task-specific state
  • Write conflicts when two agents update the same file concurrently
  • Enterprise controls: version history, attribution, audit trails

The solution is a layered store hierarchy:

Scope | Access | Content
Organisation-wide | Read-only (for agents) | SLO policies, runbooks, on-call mappings (stable reference material)
Task-specific | Read-write | Findings, decisions, fix status, in-flight state
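The two scopes can be illustrated with a small in-memory sketch. The `MemoryStore` class, its `read_only` flag, and the store names and file contents are hypothetical stand-ins, not the actual store interface.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Toy store: a name, an access mode, and a path -> text mapping."""
    name: str
    read_only: bool = False
    files: dict[str, str] = field(default_factory=dict)

    def read(self, path: str) -> str:
        return self.files[path]

    def write(self, path: str, text: str) -> None:
        # Agents may not modify stores mounted read-only.
        if self.read_only:
            raise PermissionError(f"{self.name} is read-only for agents")
        self.files[path] = text


# Organisation-wide reference material: readable by agents, never writable.
org_store = MemoryStore(
    "org", read_only=True,
    files={"runbooks/paging.md": "Page the on-call lead for SEV1."},
)
# Task-specific working state: read-write.
task_store = MemoryStore("task-1234")

task_store.write("findings.md", "Root cause: exhausted connection pool.")
try:
    org_store.write("runbooks/paging.md", "overwritten")
    org_write_blocked = False
except PermissionError:
    org_write_blocked = True
```

Keeping stable reference material in a separate read-only store means a confused or misbehaving session can damage only its own task state.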

To prevent one agent from clobbering another’s writes, Anthropic uses optimistic concurrency control: an agent reads the current version, prepares its update, and commits it together with the version it expects to replace. If a conflicting write landed in between, the commit fails and the agent re-reads and retries. No locks are held across the read and the write and nothing blocks, which keeps throughput good under concurrent writes.
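The read, commit-with-expected-version, and retry cycle can be sketched as follows. This is a generic illustration of optimistic concurrency control, not Anthropic's implementation; `VersionedFile`, `VersionConflict`, and `append_with_retry` are hypothetical names.

```python
import threading


class VersionConflict(Exception):
    """Raised when a commit is based on a version that is no longer current."""


class VersionedFile:
    def __init__(self, text: str = "") -> None:
        self._text = text
        self._version = 0
        # Guards only the atomic compare-and-swap inside the store itself;
        # callers never hold a lock between their read and their commit.
        self._guard = threading.Lock()

    def read(self) -> tuple[str, int]:
        with self._guard:
            return self._text, self._version

    def commit(self, new_text: str, expected_version: int) -> int:
        with self._guard:
            if self._version != expected_version:
                raise VersionConflict(
                    f"expected v{expected_version}, store is at v{self._version}"
                )
            self._text = new_text
            self._version += 1
            return self._version


def append_with_retry(f: VersionedFile, line: str, max_attempts: int = 5) -> int:
    """Re-read and retry until the append lands on the latest version."""
    for _ in range(max_attempts):
        text, version = f.read()
        try:
            return f.commit(text + line + "\n", expected_version=version)
        except VersionConflict:
            continue
    raise RuntimeError("gave up after repeated conflicts")


shared = VersionedFile()
_, stale_version = shared.read()          # agents A and B both read version 0
shared.commit("agent A: disk is full\n", expected_version=stale_version)
try:
    shared.commit("agent B: overwrite\n", expected_version=stale_version)
    conflict_detected = False
except VersionConflict:
    conflict_detected = True              # B's stale commit is rejected
final_version = append_with_retry(shared, "agent B: logs rotated")
```

The retry re-reads the file before writing, so agent B's update is appended to agent A's rather than replacing it.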

Dreaming: The Feedback Loop

Agents writing to memory as they work is locally optimal, like taking notes while doing a task. But scaled across many sessions, locally optimal becomes globally fragmented: agents independently learn from the same mistakes, duplicate findings, and create overlapping entries.

Dreaming is Anthropic’s answer. It’s a batch process, completely decoupled from the agent loop, that:

  1. Reads session transcripts from past runs
  2. Inspects the current state of memory
  3. Proposes curated, consolidated updates
  4. Produces a verified new memory snapshot the next agents can adopt

[Figure: Active agents work on tasks and write findings to memory in real time. The memory store holds structured files with version history, attribution, and concurrency control. Dreaming reads transcripts, finds patterns, and consolidates memory globally; it reads and updates the store asynchronously, decoupled from the hot path.]
Dreaming runs out-of-band: agents write memory in real time; dreaming refines it asynchronously.
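The four steps can be mimicked with a toy batch pass. This is only a sketch of the shape of the loop: the real process presumably relies on a model to find patterns, whereas the hypothetical `dream` function below uses plain string normalisation, and it omits the verification step entirely. The transcript format (a list of lesson strings per session) is also an assumption.

```python
from collections import Counter


def normalise(lesson: str) -> str:
    """Collapse case, whitespace, and a trailing full stop so near-duplicates match."""
    return " ".join(lesson.lower().split()).rstrip(".")


def dream(transcripts: dict[str, list[str]], memory: list[str],
          min_sessions: int = 2) -> list[str]:
    # 1. Read transcripts: count how many distinct sessions mention each lesson.
    seen_in: Counter[str] = Counter()
    for lessons in transcripts.values():
        for key in {normalise(lesson) for lesson in lessons}:
            seen_in[key] += 1

    # 2. Inspect current memory, dropping entries that duplicate an earlier one.
    snapshot: list[str] = []
    known: set[str] = set()
    for entry in memory:
        key = normalise(entry)
        if key not in known:
            known.add(key)
            snapshot.append(entry)

    # 3. Propose updates: lessons that recur across sessions but are not stored yet.
    for key, count in sorted(seen_in.items()):
        if count >= min_sessions and key not in known:
            known.add(key)
            snapshot.append(f"{key} (seen in {count} sessions)")

    # 4. Return a new snapshot; the input memory list is left untouched.
    return snapshot


transcripts = {
    "session-a": ["Restart the worker pool after a CPU spike.", "Check disk usage first"],
    "session-b": ["restart the worker pool after a CPU spike"],
    "session-c": ["Rotate logs weekly."],
}
memory = ["Check disk usage first", "check disk usage first."]
new_snapshot = dream(transcripts, memory)
```

Because the pass returns a fresh snapshot instead of editing memory in place, it can run while agents keep working and be adopted only once it has been checked.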

Why out-of-band matters

Three benefits come from dreaming’s decoupled architecture:

  1. Cross-session pattern detection. A single agent can only see its own history. Dreaming analyses transcripts across all agents and sessions, which is where recurring mistakes and systemic inefficiencies become visible. In the SRE demo, dreaming discovered that a CPU spike was always followed 60 seconds later by an alert, a pattern no individual agent session could have noticed.

  2. No objective conflict. An agent running in production must balance improving its memory quality against completing its actual task. Dreaming runs independently, so it can focus entirely on memory quality without trading off against task performance.

  3. Zero latency added. Dreaming is completely off the hot path. It can run nightly, hourly, ad hoc, or triggered by events like end-of-session — all via API.

Production Patterns

Scheduling. Dreaming can be triggered via API on any cadence. Common patterns: nightly consolidation, end-of-sprint review, or event-triggered after a significant incident closes.

Memory API. Memory has a standalone CRUD API, so teams can manage it from anywhere — not just from within agents. This includes exports (for audits), redactions (for compliance), and diffs between versions.

Enterprise controls. Every write is version-controlled with attribution: which session wrote which part. Teams can inspect how memory evolved over time and roll back bad updates — critical for production trust.
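A small sketch shows how attribution, version diffs, and rollback fit together. It is an in-memory illustration using Python's standard `difflib`, not the Memory API; the `AuditedFile` and `Revision` names, the session IDs, and the note text are hypothetical.

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    version: int
    session_id: str   # attribution: which session wrote this revision
    text: str


class AuditedFile:
    def __init__(self) -> None:
        self.history: list[Revision] = []

    def write(self, session_id: str, text: str) -> Revision:
        rev = Revision(len(self.history) + 1, session_id, text)
        self.history.append(rev)
        return rev

    def diff(self, old: int, new: int) -> list[str]:
        """Unified diff between two stored versions (1-based)."""
        a = self.history[old - 1].text.splitlines()
        b = self.history[new - 1].text.splitlines()
        return list(difflib.unified_diff(a, b, f"v{old}", f"v{new}", lineterm=""))

    def rollback(self, to_version: int, session_id: str) -> Revision:
        # Rolling back appends a new revision, so the audit trail is never rewritten.
        return self.write(session_id, self.history[to_version - 1].text)


notes = AuditedFile()
notes.write("session-a", "Alert threshold: 80% CPU")
notes.write("session-b", "Alert threshold: 8% CPU")   # a bad update
changes = notes.diff(1, 2)
restored = notes.rollback(1, session_id="operator")
authors = [rev.session_id for rev in notes.history]
```

Recording the rollback as its own attributed revision keeps the full story visible: who introduced the bad value, and who reverted it.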

Harvey’s result. With dreaming enabled on their legal benchmark, Harvey saw a 6× increase in agent completion rates. Before dreaming, agents were independently learning from the same failures, each storing a fragmented lesson. Dreaming consolidated those into shared, high-quality guidance that every subsequent agent benefited from.

Check your understanding

  1. Why does Anthropic model memory as a file system rather than as a custom database?

  2. What is the purpose of the organisation-wide read-only memory store in a multi-agent system?

  3. Why is dreaming's out-of-band, decoupled design important?

  4. What mechanism prevents two agents from overwriting each other's memory writes simultaneously?

Build it yourself

Follow these exact steps to reproduce it yourself

Try it yourself: persistent memory with the Managed Agents API

  1. Provision a memory store via the Claude platform console or API. Start with a single read-write store.

  2. Attach it to a session. Pass the memory store ID when creating a managed agent session. The store mounts as a virtual filesystem the agent can read and write using standard file tools.

  3. Run two sessions sequentially on related tasks. In the second session, add to the system prompt: “Before starting, review the memory store for relevant prior findings.” Check whether session 2 references session 1’s work.

  4. Trigger a dream. After three or more sessions, kick off dreaming via the API (or the console Dreams tab). Inspect the diff in the memory store — look for consolidation, deduplication, and new cross-session insights.

  5. Add a read-only org store. Create a second store, populate it with a reference document (e.g. a coding style guide or an on-call runbook), and attach it read-only to your sessions. Confirm agents reference it without modifying it.
