Memory and Dreaming: Building Self-Improving Agents
Design production memory systems for multi-agent architectures using filesystem-based memory stores, optimistic concurrency, and the dreaming feedback loop.
This lesson is original educational writing based on this video by Anthropic (published May 21, 2026). All credit for the original content goes to the creators.
The Isolation Problem
Every time an agent starts a session without memory, it starts from the same blank slate. Performance on each new task mirrors the last: there is no learning curve, just repetition. Agents make the same mistakes independently, repeat the same inefficiencies, and redo work that other agents have already completed.
The goal is different: performance should improve from task to task, and from agent to agent. Memory is the mechanism that makes this possible. It lets agents carry forward learnings, avoid known pitfalls, and build on a shared understanding of the organisation they work in.
Why a File System?
Earlier memory implementations focused on capability in the harness — custom tools, CLAUDE.md files, SDK-level memory primitives. These worked but required careful engineering to keep in sync.
The shift in Anthropic’s managed-agent memory is simpler: model memory as a file system. Claude already excels at navigating virtual environments, using bash and grep, reading, updating, and organising files. Rather than building a bespoke memory interface, the design leans into what Claude already does well. The memory store mounts as a filesystem the agent can read and write freely.
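To make the idea concrete, here is a minimal sketch of what working with file-based memory can look like, using only the Python standard library. The directory layout, the file names, and the `search_memory` helper are hypothetical illustrations; they are not the layout or tooling of Anthropic's managed memory store.

```python
import tempfile
from pathlib import Path


def search_memory(root: Path, keyword: str) -> list[tuple[str, int, str]]:
    """Grep-style search: return (relative path, line number, line) per match."""
    hits = []
    for path in sorted(root.rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if keyword.lower() in line.lower():
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits


# Hypothetical memory layout, built in a temporary directory for the demo.
memory_root = Path(tempfile.mkdtemp())
(memory_root / "incidents").mkdir()
(memory_root / "incidents" / "checkout-latency.md").write_text(
    "# Checkout latency\nRoot cause: connection pool exhaustion.\n", encoding="utf-8"
)
(memory_root / "notes.md").write_text(
    "Check the connection pool size before restarting services.\n", encoding="utf-8"
)

hits = search_memory(memory_root, "connection pool")
for rel_path, lineno, line in hits:
    print(f"{rel_path}:{lineno}: {line}")
```

Nothing here is memory-specific: the same read, write, and search operations an agent already uses on any directory are enough, which is the point of the design.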
This “get out of Claude’s way” principle also applied to skills. A flexible, minimal format turned out to create endless possibilities precisely because the model already understood how to work with it.
Multi-Agent Memory Architecture
Single-agent memory is straightforward. Multi-agent memory introduces new requirements:
- Multiple sessions reading and writing the same store simultaneously
- Different scopes: org-wide knowledge vs. task-specific state
- Write conflicts when two agents update the same file concurrently
- Enterprise controls: version history, attribution, audit trails
The solution is a layered store hierarchy:
| Scope | Access | Content |
|---|---|---|
| Organisation-wide | Read-only (for agents) | SLO policies, runbooks, on-call mappings — stable reference material |
| Task-specific | Read-write | Findings, decisions, fix status, in-flight state |
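The two scopes in the table can be sketched as a toy in-memory model. The `ScopedStore` class, the store names, and the file contents are assumptions made for illustration; the real stores are provisioned and attached through the platform, not built like this.

```python
from dataclasses import dataclass, field


@dataclass
class ScopedStore:
    """Toy in-memory store; `read_only` models the agent-facing access level."""
    name: str
    read_only: bool
    files: dict[str, str] = field(default_factory=dict)

    def read(self, path: str) -> str:
        return self.files[path]

    def write(self, path: str, content: str) -> None:
        if self.read_only:
            raise PermissionError(f"{self.name} is read-only for agents")
        self.files[path] = content


# Organisation-wide reference material: populated up front, read-only for agents.
org_store = ScopedStore(
    "org",
    read_only=True,
    files={"runbooks/db-failover.md": "Promote the replica, then update DNS."},
)
# Task-specific state: agents read and write freely.
task_store = ScopedStore("task-1234", read_only=False)

runbook = org_store.read("runbooks/db-failover.md")
task_store.write("findings.md", f"Followed runbook: {runbook}")

try:
    org_store.write("runbooks/db-failover.md", "overwritten by an agent")
    org_write_blocked = False
except PermissionError:
    org_write_blocked = True  # stable reference material stays untouched
```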
To prevent one agent from clobbering another’s writes, Anthropic uses optimistic concurrency control: an agent reads the current version, makes its update, and commits with the expected version. If a conflicting write happened in between, the commit fails and the agent retries — no locks, no blocking, good throughput under concurrent writes.
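The read, update, commit-with-expected-version cycle described above can be sketched in a few lines. This is a single-process illustration with hypothetical names (`VersionedFile`, `VersionConflict`, `append_with_retry`); a real store would perform the version check and the write atomically on the server side.

```python
class VersionConflict(Exception):
    """Raised when the expected version no longer matches the stored version."""


class VersionedFile:
    def __init__(self, content: str = "") -> None:
        self.content = content
        self.version = 0

    def read(self) -> tuple[str, int]:
        return self.content, self.version

    def commit(self, new_content: str, expected_version: int) -> int:
        # Compare-and-swap: accept the write only if nobody committed in between.
        if expected_version != self.version:
            raise VersionConflict(
                f"expected v{expected_version}, found v{self.version}"
            )
        self.content = new_content
        self.version += 1
        return self.version


def append_with_retry(file: VersionedFile, line: str, max_attempts: int = 5) -> int:
    """Read, modify, commit; on conflict, re-read the latest state and try again."""
    for _ in range(max_attempts):
        content, version = file.read()
        try:
            return file.commit(content + line + "\n", expected_version=version)
        except VersionConflict:
            continue
    raise RuntimeError("gave up after repeated conflicts")


findings = VersionedFile()
_, stale_version = findings.read()  # agent B reads v0
append_with_retry(findings, "agent A: restarted the cache")  # agent A commits v1

try:
    # Agent B tries to commit against the version it read earlier.
    findings.commit("agent B: overwrote everything\n", expected_version=stale_version)
    conflict_detected = False
except VersionConflict:
    conflict_detected = True  # the stale write is rejected, A's note survives

# Agent B retries from the latest state, so both notes end up in the file.
final_version = append_with_retry(findings, "agent B: confirmed cache is healthy")
```

The trade-off is that a losing writer pays for a re-read and a retry, which is cheap when conflicts are rare and avoids holding a lock while an agent is thinking.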
Dreaming: The Feedback Loop
Agents writing to memory as they work is locally optimal — like taking notes while doing a task. But scaled across many sessions, locally optimal becomes globally fragmented: agents independently learn from the same mistakes, duplicate findings, create overlapping entries.
Dreaming is Anthropic’s answer. It’s a batch process, completely decoupled from the agent loop, that:
- Reads session transcripts from past runs
- Inspects the current state of memory
- Proposes curated, consolidated updates
- Produces a verified new memory snapshot the next agents can adopt
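The shape of that batch job can be sketched with a deliberately crude stand-in. The `dream` and `normalise` functions below are hypothetical and rely on plain string normalisation to merge duplicates; the actual dreaming process is not described at this level of detail in the source, and the verification step is not modelled here.

```python
from collections import Counter


def normalise(lesson: str) -> str:
    """Crude canonical form so near-identical notes collapse together."""
    return " ".join(lesson.lower().rstrip(".").split())


def dream(transcripts: dict[str, list[str]], memory: list[str]) -> list[tuple[str, int]]:
    """Return a new consolidated snapshot; the inputs are left untouched."""
    support = Counter()
    for lessons in transcripts.values():
        # Count each lesson once per session, however often the session repeated it.
        support.update({normalise(lesson) for lesson in lessons})
    for entry in memory:
        # Existing memory entries are carried over even if no transcript mentions them.
        support.setdefault(normalise(entry), 0)
    # Most widely observed lessons first; ties broken alphabetically for stable output.
    return sorted(support.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical lessons extracted from three past sessions.
transcripts = {
    "session-1": [
        "Check disk usage before restarting the service.",
        "Check disk usage before restarting the service.",
    ],
    "session-2": ["check disk usage before restarting the service"],
    "session-3": ["Rotate logs weekly."],
}
memory = ["Rotate logs weekly", "Page the on-call engineer for sev-1 incidents."]

snapshot = dream(transcripts, memory)
for lesson, sessions in snapshot:
    print(f"{sessions} session(s): {lesson}")
```

Because the function returns a fresh snapshot instead of editing memory in place, the result can be reviewed or diffed before any agent adopts it.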
Why out-of-band matters
Three benefits come from dreaming’s decoupled architecture:
- Cross-session pattern detection. A single agent can only see its own history. Dreaming analyzes transcripts across all agents and sessions, which is where recurring mistakes and systemic inefficiencies become visible. In the SRE demo, dreaming discovered that a CPU spike was always followed 60 seconds later by an alert, a pattern no individual agent session could have noticed.
- No objective conflict. An agent running in production must balance improving its memory quality against completing its actual task. Dreaming runs independently, so it can focus entirely on memory quality without trading off against task performance.
- Zero latency added. Dreaming is completely off the hot path. It can run nightly, hourly, ad hoc, or triggered by events like end-of-session, all via API.
Production Patterns
Scheduling. Dreaming can be triggered via API on any cadence. Common patterns: nightly consolidation, end-of-sprint review, or event-triggered after a significant incident closes.
Memory API. Memory has a standalone CRUD API, so teams can manage it from anywhere — not just from within agents. This includes exports (for audits), redactions (for compliance), and diffs between versions.
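As an illustration of what a diff between two memory versions conveys, the sketch below compares two hypothetical versions of one file with Python's standard `difflib` module. The file name, version labels, and contents are made up, and this is not the output format of the memory API itself.

```python
import difflib

# Hypothetical contents of the same memory file before and after consolidation.
before = (
    "Check disk usage before restarting.\n"
    "check disk usage before restart\n"
    "Rotate logs weekly.\n"
)
after = (
    "Check disk usage before restarting.\n"
    "Rotate logs weekly.\n"
)

diff_lines = list(
    difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="memory/findings.md@v1",
        tofile="memory/findings.md@v2",
        lineterm="",
    )
)
print("\n".join(diff_lines))
```

Lines prefixed with `-` were removed and lines prefixed with `+` were added, so a reviewer can see at a glance that the duplicate note was dropped and nothing else changed.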
Enterprise controls. Every write is version-controlled with attribution: which session wrote which part. Teams can inspect how memory evolved over time and roll back bad updates — critical for production trust.
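One simple way to get both attribution and rollback is an append-only revision log, sketched below. The `Revision` and `AuditedFile` names and the session IDs are hypothetical; the source does not say how the managed store implements its history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    version: int
    session_id: str  # attribution: which session produced this write
    content: str


class AuditedFile:
    """Append-only revision log: nothing is overwritten, so any version can be restored."""

    def __init__(self) -> None:
        self.history: list[Revision] = []

    def write(self, session_id: str, content: str) -> Revision:
        revision = Revision(len(self.history) + 1, session_id, content)
        self.history.append(revision)
        return revision

    @property
    def current(self) -> Revision:
        return self.history[-1]

    def rollback(self, to_version: int, session_id: str) -> Revision:
        # A rollback is recorded as a new revision, which keeps the audit trail intact.
        target = self.history[to_version - 1]
        return self.write(session_id, target.content)


runbook = AuditedFile()
runbook.write("session-a", "Drain traffic before restarting.")
runbook.write("session-b", "Restart immediately.")  # a bad update
runbook.rollback(to_version=1, session_id="operator")

authors = [revision.session_id for revision in runbook.history]
```

After the rollback, the current content matches version 1 again, while the history still shows that `session-b` made the bad update and who reverted it.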
Harvey’s result. With dreaming enabled on their legal benchmark, Harvey saw a 6× increase in agent completion rates. The cause: agents were independently learning from the same failures, each storing a fragmented lesson. Dreaming consolidated those into shared, high-quality guidance that every subsequent agent benefited from.
Check your understanding
1. Why does Anthropic model memory as a file system rather than as a custom database?
2. What is the purpose of the organisation-wide read-only memory store in a multi-agent system?
3. Why is dreaming's out-of-band, decoupled design important?
4. What mechanism prevents two agents from overwriting each other's memory writes simultaneously?
Build it yourself
Follow these exact steps to reproduce it yourself
Try it yourself: persistent memory with the Managed Agents API
- Provision a memory store via the Claude platform console or API. Start with a single read-write store.
- Attach it to a session. Pass the memory store ID when creating a managed agent session. The store mounts as a virtual filesystem the agent can read and write using standard file tools.
- Run two sessions sequentially on related tasks. In the second session, add to the system prompt: “Before starting, review the memory store for relevant prior findings.” Check whether session 2 references session 1’s work.
- Trigger a dream. After three or more sessions, kick off dreaming via the API (or the console Dreams tab). Inspect the diff in the memory store and look for consolidation, deduplication, and new cross-session insights.
- Add a read-only org store. Create a second store, populate it with a reference document (e.g. a coding style guide or an on-call runbook), and attach it read-only to your sessions. Confirm agents reference it without modifying it.