Claude Code's Memory Is Simpler Than You Think — And That's a Problem

I read the leaked source code. Claude Code's memory system is just Markdown files + an LLM picker. No vector search, no embeddings, no RAG. Here's why that matters.

April 1, 2026
Harrison Guo
6 min read

The Hype vs. The Source Code

After Claude Code’s source leak, one of the most talked-about discoveries was Kairos — a “permanent memory” system that consolidates your notes while you sleep. The AI community described it as a breakthrough in AI memory.

I read the actual code in memdir/. It’s not a breakthrough. It’s Markdown files and a Sonnet side query.

What Claude Code’s Memory Actually Is

The entire memory system lives in 8 TypeScript files totaling 1,736 lines. Here’s how it works:

Storage: Plain Markdown Files

Every memory is a .md file with YAML frontmatter:

---
name: user prefers terse responses
description: no trailing summaries after code changes
type: feedback
---

User wants terse responses with no trailing summaries.
**Why:** They can read the diff themselves.
**How to apply:** Skip recap at end of every response.

There are exactly four types: user, feedback, project, reference. That’s it.
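The file format above maps naturally onto a small TypeScript shape. As a sketch only: the names here (`MemoryType`, `Memory`, `parseMemory`) are illustrative, not taken from the leaked source, and a real implementation would use a proper YAML parser.

```typescript
type MemoryType = 'user' | 'feedback' | 'project' | 'reference';

interface Memory {
  name: string;
  description: string;
  type: MemoryType;
  body: string;
}

// Minimal frontmatter parser: split on the `---` delimiters and
// read simple `key: value` pairs; everything after the second
// delimiter is the memory body.
function parseMemory(raw: string): Memory {
  const parts = raw.split('---');
  const frontmatter = parts[1] || '';
  const body = parts.slice(2).join('---').trim();
  const fields: Record<string, string> = {};
  for (const line of frontmatter.trim().split('\n')) {
    const idx = line.indexOf(':');
    if (idx === -1) continue;
    fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return {
    name: fields['name'] || '',
    description: fields['description'] || '',
    type: (fields['type'] || 'reference') as MemoryType,
    body,
  };
}
```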

A single MEMORY.md file serves as the index — capped at 200 lines and 25KB. Exceed that and it gets truncated with a warning.
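A minimal sketch of what enforcing that cap could look like. The constants match the limits quoted here, but the function itself is hypothetical, and it uses character count as a rough byte proxy where real code would measure UTF-8 bytes.

```typescript
// Truncate the index at 200 lines / 25 KB, whichever is hit first.
function truncateIndex(
  index: string,
  maxLines = 200,
  maxBytes = 25 * 1024,
): { text: string; truncated: boolean } {
  let lines = index.split('\n');
  let truncated = false;
  if (lines.length > maxLines) {
    lines = lines.slice(0, maxLines);
    truncated = true;
  }
  let text = lines.join('\n');
  // Character count as a byte proxy; real code would measure UTF-8 bytes.
  if (text.length > maxBytes) {
    text = text.slice(0, maxBytes);
    truncated = true;
  }
  return { text, truncated };
}
```

The important consequence: everything past the cut simply vanishes from the index, which is the truncation problem discussed below.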

Retrieval: Ask Sonnet to Pick 5

When Claude Code needs to recall memories, it calls findRelevantMemories() in findRelevantMemories.ts. Here’s what it does:

  1. Scan the memory directory for all .md files
  2. Read each file’s filename and description (not the full content)
  3. Send this list to a Sonnet side query with the user’s current query
  4. Sonnet picks up to 5 files it thinks are relevant
  5. Those 5 files get loaded into context

From the source:

const result = await sideQuery({
  model: getDefaultSonnetModel(),
  system: SELECT_MEMORIES_SYSTEM_PROMPT,
  messages: [{
    role: 'user',
    content: `Query: ${query}\n\nAvailable memories:\n${manifest}`,
  }],
  max_tokens: 256,
  output_format: { type: 'json_schema', ... },
})

No vector database. No embeddings. No cosine similarity. No hybrid search. Just an LLM reading filenames and guessing which ones might be relevant.
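Steps 1 and 2 are the crux: the picker never sees memory bodies, only filenames and one-line descriptions. A rough sketch of that manifest-building step, with the helper name `buildManifest` and the list format invented for illustration:

```typescript
import * as fs from 'fs';
import * as path from 'path';

// Build the manifest the LLM picker sees: one line per memory file,
// containing only the filename and the frontmatter `description:`.
function buildManifest(memDir: string): string {
  return fs
    .readdirSync(memDir)
    .filter((f) => f.endsWith('.md'))
    .map((f) => {
      const text = fs.readFileSync(path.join(memDir, f), 'utf8');
      // Pull `description:` out of the frontmatter; the body is ignored.
      const match = /^description:\s*(.+)$/m.exec(text);
      const desc = match ? match[1] : '';
      return `- ${f}: ${desc}`;
    })
    .join('\n');
}
```

Everything the model knows about a memory at selection time has to fit in that one description line.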

Consolidation: Kairos = LLM Rewriting Markdown

The much-hyped Kairos “sleep memory” system (AutoDream) runs between sessions. What it actually does:

  • Reads all your memory files
  • Asks the LLM to find duplicates, outdated info, and conflicts
  • Rewrites the files

That’s it. It’s an LLM editing Markdown. There’s no algorithmic compression, no knowledge graph construction, no embedding-based deduplication.
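Stripped to its essentials, the pass looks something like this sketch. `consolidate` and `callModel` are stand-ins (the real code is asynchronous and calls an actual model), and the prompt wording is invented:

```typescript
// Concatenate every memory file into one prompt and ask the model
// to rewrite the whole set. No algorithmic dedup happens anywhere.
function consolidate(
  files: Map<string, string>,
  callModel: (prompt: string) => string,
): string {
  const corpus = Array.from(files.entries())
    .map(([name, text]) => `## ${name}\n${text}`)
    .join('\n\n');
  return callModel(
    'Merge duplicates, drop outdated entries, and resolve conflicts in ' +
      'these memory files. Return the rewritten set:\n\n' + corpus,
  );
}
```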

What’s Missing

Compared to modern memory architectures, Claude Code’s approach has significant gaps:

No semantic search. If you saved a memory about “React component testing patterns” and later ask about “frontend unit tests,” Claude Code’s filename-based picker might miss it entirely. There’s no embedding similarity to bridge the semantic gap.

No hybrid retrieval. Production memory systems use weighted fusion of vector similarity + keyword search (BM25/FTS5). Claude Code uses neither.

Hard cap of 5 memories per query. If you have 200 memory files, only 5 get loaded per turn. The other 195 are invisible, no matter how relevant.

Index truncation. MEMORY.md is hard-capped at 200 lines / 25KB. Once your memory grows beyond that, the index itself gets truncated — meaning the LLM picker can’t even see those entries.

How OpenClaw Does It Better

OpenClaw’s memory system uses a fundamentally different architecture:

| | Claude Code | OpenClaw |
|---|---|---|
| Storage | Markdown files only | Markdown + vector index |
| Retrieval | LLM reads filenames, picks 5 | Embedding cosine similarity + FTS5 full-text search, weighted fusion |
| Persistence | Session-based, needs Kairos to consolidate | Daemon mode, continuous context across months |
| Scalability | 200-line index cap | SQLite-vec scales to thousands of memories |
| Semantic search | None | Embedding-based similarity matching |

OpenClaw stores memories as plain Markdown (human-readable, editable) but also maintains a vector index using sqlite-vec. When you query, it runs both embedding similarity and full-text search, then fuses the scores with configurable weights.
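A minimal sketch of that weighted fusion, assuming both scores are already normalized to [0, 1]. The 0.7 default weight is illustrative, not OpenClaw's actual configuration:

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Fuse a vector-similarity score with a keyword (FTS) score using a
// configurable weight.
function fuse(vecScore: number, keywordScore: number, vecWeight = 0.7): number {
  return vecWeight * vecScore + (1 - vecWeight) * keywordScore;
}
```

A memory with zero keyword overlap can still rank highly on embedding similarity alone, which is exactly the case a filename-based picker misses.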

The result: OpenClaw can retrieve semantically related memories even when the keywords don’t match. Claude Code can’t.

Community extensions like Mem0, Cognee, and LanceDB plugins push this further — offering cross-agent memory sharing, memory graphs, and automatic knowledge extraction.

The Real-World Impact

This isn’t theoretical. Users are hitting these limitations daily:

  • “Claude repeatedly fails to apply its own memory” — GitHub Issue #37314 reports that memory files are correctly written but consistently ignored across sessions
  • Context loss after 20 messages — auto-compaction summarizes away critical details
  • Cross-project contamination — global memory causes suggestions from Project A to bleed into Project B
  • “Automatic memory is not learning” — storing facts is not the same as understanding patterns

The core architectural issue: Claude Code’s memory is write-optimized, not read-optimized. It’s easy to save a memory. It’s unreliable to retrieve the right one at the right time.

Why Anthropic Chose This Approach

To be fair, there are engineering reasons for the simplicity:

  1. Transparency — every memory is a readable .md file. No opaque embedding databases. Users can see, edit, and delete exactly what Claude remembers.
  2. No infrastructure — no vector database to install, no embeddings API to call, no index corruption to debug.
  3. Predictability — the system’s behavior is deterministic from the user’s perspective. You write a file, it’s in the index.
  4. Privacy — everything stays local. No embeddings sent to external services.

These are legitimate tradeoffs. But the result is a memory system that’s easy to understand and hard to rely on.

The Deeper Pattern: Program Walks

This connects back to the fundamental AI agent architecture: LLM talks, program walks.

The LLM’s job is to generate text. The program’s job is to manage state, retrieve context, and orchestrate execution. Memory is squarely on the program side — and Claude Code’s program side is weak here.

OpenClaw invests in the program layer: vector indices, hybrid retrieval, daemon persistence. Claude Code invests in the LLM layer: just ask Sonnet to figure it out.

When your memory system’s retrieval strategy is “ask another LLM to guess,” you’ve outsourced the hardest part of the problem to the part of the stack that’s least reliable at it.

What Would Make It Better

If Anthropic wanted to upgrade without abandoning their Markdown-first philosophy:

  1. Add local embeddings — run a small embedding model on each memory file, store vectors in SQLite-vec. Keep Markdown as source of truth.
  2. Hybrid retrieval — combine embedding similarity with FTS5 keyword search, like OpenClaw does.
  3. Raise the 5-file cap — or make it dynamic based on query complexity.
  4. Project-scoped memory — stop the cross-project contamination.
  5. Verification loop — after loading memories, verify they still apply to the current codebase state before acting on them.
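As an illustration of point 3, a dynamic cap could be as simple as scaling with query vocabulary. The thresholds here are made up:

```typescript
// Scale the retrieval cap with query complexity instead of
// hard-coding 5: roughly one memory slot per two distinct terms,
// clamped between a floor and a ceiling.
function retrievalCap(query: string, min = 5, max = 15): number {
  const terms = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  return Math.min(max, Math.max(min, Math.ceil(terms.size / 2)));
}
```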

The foundation is there. The type taxonomy (user/feedback/project/reference) and the drift caveats are well-designed. The storage layer just needs a real retrieval engine underneath it.


This analysis is based on the Claude Code source leaked via npm package v2.1.88. For the full leak breakdown, see: Claude Code Source Leaked: 5 Hidden Features

For the AI agent architecture that explains why memory sits on the “program” side: The AI Stack Explained — LLM Talks, Program Walks
