Agent Memory Is a Cache Coherence Problem

Agent Memory Is a Cache Coherence Problem

Lossless curated notes vs lossy auto-compression with vector recall: two AI-memory designs that fail differently. One fails like a cache — classical systems already mapped it.

May 28, 2026
Harrison Guo
20 min read
Runtime & Distributed Systems AI Agent Production Engineering

This post is one half of a pair. The other half — Agent Retrieval Is a Cost Curve Problem — argues that Claude Code’s within-session code retrieval avoids RAG because the cost curve says it should. This piece argues something parallel about cross-session memory: the lossy auto-capture systems being marketed as “AI memory” are, in classical distributed-systems vocabulary, caches. They inherit every problem caches have always had, and the hype around them is mostly arguing for one side of a write-back vs write-through trade as if the other side didn’t exist.

Sequel to Consistency in Distributed Systems: Scenarios, Trade-offs, and What Actually Works. If you remember that piece, you’ll recognize the move: take a problem space the AI community is debating with fresh vocabulary, and notice that the database community already mapped the failure modes thirty years ago.

tl;dr — Cross-session agent memory varies on two independent axes, not one: fidelity (lossless vs lossy) and retrieval (exact lookup vs approximate vector). Claude Code’s built-in memory plus a hand-written CLAUDE.md lives at lossless + exact. The currently-trending claude-mem (70k+ GitHub stars as of May 2026) lives at lossy + approximate — auto-capture passed through a Haiku compression step and recalled via SQLite-FTS5 + Chroma vectors. The second is, structurally, a cache: a derived lossy view of the source of truth, retrieved approximately. It inherits every cache problem the distributed-systems literature already named — staleness, wrong-row retrieval, no coherence with the source. I ran claude-mem under controlled conditions and compared it against the deterministic CLAUDE.md baseline; the numbers (and the kinds of failures) line up with the classical cache framing. The most interesting failure isn’t tokens or latency. It’s that compression flattens modality — a hedged hypothetical becomes a flat fact, indistinguishable from a firm decision. An agent confidently acting on a maybe-it-said-yes is worse than an agent with no memory at all.


The Two Axes (Don’t Collapse Them Into One)

Most takes on agent memory collapse the design space onto a single axis: “lossless and limited” vs “lossy and powerful.” That framing hides the failure modes.

The real space is two-dimensional:

  1. Fidelitylossless (verbatim, what-you-wrote-is-what-was-stored) vs lossy (LLM-compressed: a summary written by a smaller model over the raw events).
  2. Retrievalexact / curated (you wrote an index entry; the system reads it back) vs approximate / semantic (vector embeddings; cosine similarity; top-K nearest neighbors).
flowchart TD
    subgraph axes["Two axes, four quadrants"]
        direction LR
        subgraph col1["Exact retrieval"]
            q1["Lossless + Exact
CLAUDE.md / hand-curated
Precision: high
Coverage: low
Upkeep: manual"] q3["Lossy + Exact
Compressed but keyword-indexed
(unusual in practice)"] end subgraph col2["Approximate retrieval"] q2["Lossless + Approximate
Raw logs + vector search
(huge storage; weak signal)"] q4["Lossy + Approximate
claude-mem and most 'AI memory' tools
Auto-compress + vector recall
Two layers of approximation"] end end classDef good fill:#f0fff4,stroke:#2f855a classDef warn fill:#fef5e7,stroke:#b7791f classDef bad fill:#fed7d7,stroke:#c53030 class q1 good class q2 warn class q3 warn class q4 bad

The interesting failures live in the bottom-right quadrant — lossy + approximate — because the failures of one axis are invisible to a user evaluating along the other. The system loses information at write time and approximates at read time, and the user sees a single “answer” that fused both losses. Debugging means asking: was that wrong because the original event was corrupted in compression, or because retrieval surfaced the wrong row? You usually can’t tell.

Most takes conflate “lossy = unreliable” with “vector = powerful.” They’re orthogonal. You can have lossless + vector (raw logs, vector-indexed — fine but storage-heavy and signal-weak). You can have lossy + exact (compressed summaries, FTS-indexed — works for some applications). Lossy + approximate is what’s being marketed as “AI memory,” and it’s the quadrant most exposed to compounding failure modes.

The Baseline: Lossless + Exact (CLAUDE.md + built-in memory)

Claude Code’s built-in memory system, paired with a hand-written CLAUDE.md, sits firmly in the lossless + exact quadrant. The design choices, made explicit:

  • Verbatim storage. What the author wrote is what gets stored. Markdown in, Markdown out. There’s no compression step.
  • Always-loaded index + on-demand body. MEMORY.md (the index) gets injected into every session — capped, deliberately, around 200 lines to avoid context bloat. Individual memory files are read on demand, when the index entry suggests one is relevant.
  • Curated, not a firehose. A human (or a structured prompt) decides what is worth storing. Not every tool call. Not every file read. Only the durable, surprising, cross-session-useful facts.
  • Exact recall. The model reads a specific file. Either it’s there and is read verbatim, or it isn’t. No fuzzy near-matches; no confidence score.

Mental model: a hand-written WAL (write-ahead log) plus a curated index. Both are close to a source of truth — the author’s deliberate decision — and they recall exactly what was committed.

Tradeoffs are visible from this framing:

  • Precision: 100%. What you stored is what you get back.
  • Auditability: you can read the file yourself. No black box.
  • Token economics: index sits in context; bodies fetched only when needed. Cheap.
  • Coverage: limited to what the author bothered to write down.
  • Upkeep: manual. Memories rot; updating them is a chore.

The earlier post in this series, Claude Code Deep Dive Part 4: Why It Uses Markdown Files Instead of Vector DBs, walks through the specific design choices in the publicly circulated build snapshot. Here I’ll focus on what happens when you put the lossless+exact baseline next to the lossy+approximate contender.

The Contender: Lossy + Approximate (claude-mem)

claude-mem is among the highest-starred entries in the agent-memory category right now (70k+ GitHub stars as of May 2026). I tested v13.2.0. The architecture, summarized:

  • Auto-capture firehose. Lifecycle hooks (SessionStart, UserPromptSubmit, PostToolUse, Stop, SessionEnd) fire on essentially everything the model does. The hooks pipe events to a Bun worker on localhost:37701.
  • Compression to facts/narrative. At session boundaries the worker invokes Haiku 4.5 to compress raw observations into structured facts (a JSON array) and a narrative (a paragraph). This compression runs on your own Claude subscription — billed to your quota, ~5,150 compression tokens per session in my test.
  • Hybrid index. Compressed observations are written to two indexes simultaneously: SQLite-FTS5 (full-text keyword) and Chroma (vector embeddings). Recall is hybrid — keyword and ANN together.
  • Disables built-in memory. Installation sets CLAUDE_CODE_DISABLE_AUTO_MEMORY=1 in ~/.claude/settings.json. The built-in CLAUDE.md path is turned off; claude-mem is meant to replace, not augment.
  • Hardcoded data dir. Despite respecting CLAUDE_CONFIG_DIR for plugin config, the data store path (~/.claude-mem) is hardcoded. Sandboxing is partial.

Mental model: a derived, lossy materialized view of session events, plus a similarity cache for retrieval. Two layers of approximation: a lossy write transform (Haiku compression) and an approximate read transform (ANN). Each compounds the other.

This is exactly what the bottom-right quadrant looks like in deployment.

The Test (Real Numbers, 2026-05-20)

To make the comparison concrete I built a small URL-shortener as the test bed: simple enough that the “right answer” was unambiguous, structured enough that real architectural decisions had to be recorded. The setup:

  • Test arm: claude-mem v13.2.0, sandboxed via CLAUDE_CONFIG_DIR=/tmp/cmem-test/dot-claude. Built-in memory disabled (per install default). Session 1 established four decisions about the codebase; Session 2 asked for them back. This is the arm I measured.
  • Baseline: the same four decisions written into a hand-curated CLAUDE.md — 1,075 chars, ~269 tokens. Built-in memory intact. The baseline numbers in the table below are deterministic properties of how the built-in CLAUDE.md path works (verbatim recall, no extra round-trip, no compression bill), not a separately measured session.

The decisions in Session 1 (so the comparison is fair):

  1. Use Redis for URL storage.
  2. Generate short codes with base62.
  3. Add a 30-day TTL.
  4. “We could refresh the TTL on each access if we want sliding expiration.” (Note the hedge — this is the modality test that matters.)

Numbers, side by side:

Measureclaude-mem (v13.2.0)bare CLAUDE.md
Per recall-cycle tokens~6,700~280
↳ Passive context injection on session start~1,050~269 (full file)
mcp-search retrieval round-trip~5020
↳ Haiku 4.5 compression cost (charged to your quota)~5,150 / session0
Extra round-trip for detailsyes (~22s)no
Fidelity of recalllossy (see below)100% verbatim
Built-in memory statedisabled (CLAUDE_CODE_DISABLE_AUTO_MEMORY=1)intact
Compression coston user’s Claude subscriptionnone
Headless capture (claude -p)zero eventsn/a
Upkeepautomaticmanual edit
Measure Per recall-cycle tokens
claude-mem (v13.2.0) ~6,700
bare CLAUDE.md ~280
Measure ↳ Passive context injection on session start
claude-mem (v13.2.0) ~1,050
bare CLAUDE.md ~269 (full file)
Measure mcp-search retrieval round-trip
claude-mem (v13.2.0) ~502
bare CLAUDE.md 0
Measure ↳ Haiku 4.5 compression cost (charged to your quota)
claude-mem (v13.2.0) ~5,150 / session
bare CLAUDE.md 0
Measure Extra round-trip for details
claude-mem (v13.2.0) yes (~22s)
bare CLAUDE.md no
Measure Fidelity of recall
claude-mem (v13.2.0) lossy (see below)
bare CLAUDE.md 100% verbatim
Measure Built-in memory state
claude-mem (v13.2.0) disabled (CLAUDE_CODE_DISABLE_AUTO_MEMORY=1)
bare CLAUDE.md intact
Measure Compression cost
claude-mem (v13.2.0) on user’s Claude subscription
bare CLAUDE.md none
Measure Headless capture (claude -p)
claude-mem (v13.2.0) zero events
bare CLAUDE.md n/a
Measure Upkeep
claude-mem (v13.2.0) automatic
bare CLAUDE.md manual edit

The token gap — 6,700 vs 280 — is meaningful but not the headline. The headline is the fidelity row.

The Sharpest Failure: Compression Flattens Modality

The four decisions written in Session 1 included three firm choices and one hedge:

“We use Redis… base62 short codes… 30-day TTL… we could refresh the TTL on each access if we want sliding expiration.”

When I read the raw observation row that claude-mem wrote, the four items appeared as a flat JSON array — facts: [...] — with no modal marker distinguishing the hedge from the decisions. The hedge had been flattened into the same shape as the firm choices.

Session 2 confirmed it. I asked the recalling agent to describe the TTL design. It cheerfully reported “we refresh the TTL on each access for sliding expiration” — as though that had been decided. When I challenged it directly, its own reply was that it could not distinguish firm decisions from options that had merely been considered. The compressed facts[] row it was reading from preserved the content of each item but not its modal status — what I’ll call its epistemic status throughout the rest of this post.

That’s the failure. The lossy layer loses epistemic status, not just bytes. A maybe becomes a decision. The recalling agent has no way to know it shouldn’t trust the row.

This is worse than no memory. An agent with no memory has to ask, or reread, or check the code. An agent with confident-wrong memory acts. The cost of acting on a fabricated decision compounds: now there’s code (or a PR, or an architectural note) committed under the false premise, and that will be the next round’s input.

The generalization: any LLM compression step that maps “speech-act varieties” (decisions, hypotheses, questions, jokes, hedges) onto a single typed structure — like a facts[] array — loses the modal axis. To preserve it, you’d need to compress into a richer schema (with kind: 'decision' | 'option' | 'question' per item), and you’d need the compression model to reliably tag the modality. Haiku 4.5 didn’t tag it. Whether a more careful prompt or schema would is an open question, but it’s a design question the current tool doesn’t even pose.

Six Measured Findings, Versioned to v13.2.0

In one place, six things I measured. Versioned because tool behavior changes:

  1. Replaces, not augments. Install sets CLAUDE_CODE_DISABLE_AUTO_MEMORY=1. Built-in memory is turned off. Default deployment is single-system, not hybrid.
  2. Partial sandbox. CLAUDE_CONFIG_DIR redirects plugin config, but the data store path ~/.claude-mem is hardcoded. Multi-tenant isolation is incomplete.
  3. Compression runs on your subscription. Haiku 4.5 compresses observations to ~5,150 tokens per session, billed to your Anthropic quota. Free tools that consume your paid quota deserve a footnote.
  4. Invisible to headless mode. claude -p runs (non-interactive) emit zero capture events in my tests. The lifecycle hooks fire only in interactive sessions. CI users and automation pipelines get no memory at all.
  5. Compression flattens modality (the sharpest finding, detailed above). A hedge becomes a flat fact, indistinguishable from a firm decision in the compressed facts[] schema.
  6. Token economics lose on small projects. ~6,700 tokens per recall cycle (passive inject + mcp-search round-trip + Haiku compression) versus ~280 deterministic, 100%-faithful tokens for the CLAUDE.md baseline. On a 1,000-line project, the per-token cost gap is wider than any retrieval benefit it provides.

Tool-version drift is real — by the time you read this, some of these may have been fixed. The cache-coherence framing in the next section is version-independent and was the actual reason I wrote the post.

Why This Is a Cache Problem, Precisely

The distributed-systems vocabulary for this design is materialized view of a source, served from a similarity cache.

  • Write-time lossy transform = a materialized view that can drift from the source of truth (the actual codebase, the actual decisions). The source is the user’s intent and the live code; the view is the compressed facts/narrative. Each write step can lose information that the view will never recover.
  • Read-time ANN = approximate retrieval. Top-K nearest neighbors. False positives are structural, not a bug — a sufficiently-similar wrong row will be returned with confidence indistinguishable from the right one.
  • No coherence with the source. Classical caches have invalidation hooks — write-through, write-back, snoop protocols, MESI states. They tie cache lines back to the canonical source so that writes propagate and stale lines get evicted or rewritten. claude-mem has no tie to the codebase or to user-issued corrections. You reverse a decision in conversation, the memory still believes the original.
  • Staleness without expiry. Even without explicit invalidation, classical caches use TTLs to bound staleness. claude-mem has no TTL on facts. A fact written six months ago competes for retrieval with one written yesterday, and the older one might win the vector hop.

When the cache-coherence frame is the right frame, the literature is rich and useful. Pat Helland’s Immutability Changes Everything (ACM Queue, 2015) and the broader databases-and-OS literature on cache-coherence protocols (MESI / MOESI), materialized-view invalidation, and write-through vs write-back are the right starting reading. The trades they describe — staleness vs cost, eventual vs strong coherence, when to flush, when to invalidate — are the same trades the agent-memory community is rediscovering with fresh names.

flowchart LR
    subgraph src["Source of truth"]
        s1["User intent
Live code
Conversation"] end subgraph write["Write path (lossy)"] w1["Haiku 4.5
compression"] w2["facts[] · narrative
(no modality, no provenance)"] end subgraph store["Materialized view"] st1["SQLite FTS5"] st2["Chroma vectors"] end subgraph read["Read path (approximate)"] r1["query vector
+ keyword"] r2["top-K hits"] end s1 -->|hook captures| w1 w1 --> w2 w2 --> st1 w2 --> st2 st1 --> r2 st2 --> r2 r1 --> r2 r2 -->|"injected as fact"| s1 classDef src fill:#e6fffa,stroke:#319795 classDef lossy fill:#fed7d7,stroke:#c53030 classDef store fill:#fef5e7,stroke:#b7791f classDef read fill:#ebf4ff,stroke:#5a67d8 class s1 src class w1,w2 lossy class st1,st2 store class r1,r2 read

What’s missing from this diagram, and crucially: a backedge from “source of truth” to the materialized view that fires when the source changes. That’s the invalidation arrow. Its absence is the structural reason claude-mem gets wrong-row retrieval on decisions the user has reversed. Until something supplies that arrow, the system is best understood as a write-only cache.

When Each Wins

Cost-curve thinking (the same frame used in the companion piece) gives a clean answer:

Lossless + Exact wins when:

  • Project size is small or scope is clear. Curation is cheap; the manual upkeep budget is small.
  • Fidelity matters. You need to recall the exact decision, not a vibe of it. Coding agents, design decisions, security policy.
  • The author exists and is engaged. Someone is willing to write three lines into MEMORY.md when a decision is made.

Lossy + Approximate wins when:

  • History is too big to hand-curate. A year of conversations across multiple contributors, none of whom can be expected to maintain a MEMORY.md.
  • Coverage matters more than precision. You’d rather have a fuzzy memory that something was discussed than no memory at all. Customer-support agents over a year of tickets; team retrospectives over a quarter of standups.
  • The cost of acting on a fabricated fact is low. A confident-wrong recall in a support agent says “sorry let me check”; the user corrects it. The same recall in a coding agent ships broken code to production. The blast radius of a false positive determines the budget for accepting one.

The rule of thumb: fuzzy-but-present beats precise-but-absent, but only when the false-positive cost is low enough to absorb. For coding work on a 5,000-LOC project, it isn’t.

A Decision Framework

SignalLossless + ExactLossy + ApproximateHybrid
Single project, < 50k LOC
Multi-project / multi-year history
Decisions need exact recall
Vague-but-present recall is acceptable
Author is engaged (willing to curate)
No human curator available
Cost of confident-wrong is high (production code, money)
Cost of confident-wrong is low (suggestion, search)
You can pay the Haiku compression bill from your quotan/a
You operate headlessly or via CI
Signal Single project, < 50k LOC
Lossless + Exact
Lossy + Approximate
Hybrid
Signal Multi-project / multi-year history
Lossless + Exact
Lossy + Approximate
Hybrid
Signal Decisions need exact recall
Lossless + Exact
Lossy + Approximate
Hybrid
Signal Vague-but-present recall is acceptable
Lossless + Exact
Lossy + Approximate
Hybrid
Signal Author is engaged (willing to curate)
Lossless + Exact
Lossy + Approximate
Hybrid
Signal No human curator available
Lossless + Exact
Lossy + Approximate
Hybrid
Signal Cost of confident-wrong is high (production code, money)
Lossless + Exact
Lossy + Approximate
Hybrid
Signal Cost of confident-wrong is low (suggestion, search)
Lossless + Exact
Lossy + Approximate
Hybrid
Signal You can pay the Haiku compression bill from your quota
Lossless + Exact n/a
Lossy + Approximate
Hybrid
Signal You operate headlessly or via CI
Lossless + Exact
Lossy + Approximate
Hybrid

For most readers of this blog — engineers working on a single non-trivial codebase, where decisions matter and confident-wrong is expensive — the columns lean hard left.

What a Coherent Agent Memory System Would Need

The interesting question, once you accept the cache-coherence frame, is: what would the lossy + approximate corner look like if it were built like a real cache instead of a write-only one?

A short list of capabilities the current generation of “AI memory” tools is missing, and which any serious system in this space will eventually have to ship:

  1. Source pointer (provenance). Every fact carries a back-pointer to the originating event: timestamp, session ID, the raw transcript turn or tool result it was derived from. Without this, you can’t audit a wrong recall — you only see the fact, never its lineage.
  2. Modality tagging. Every fact tagged with epistemic status — decision | option_considered | hypothesis | question | observation — at write time, by the compression model. Without this, the system loses what the failure section above showed: the difference between we will and we could.
  3. Supersedes / invalidation chain. A later fact can declare an earlier fact superseded (“decision A was reversed on date T by B”). Recall surfaces the latest applicable fact, not the most semantically similar one. This is the in-band invalidation classical caches use; agent memory currently has none.
  4. Expiry / TTL by class. Decisions might be permanent; observations rot fast (“the build was passing this morning” should not influence behavior at 4 PM). Different fact classes get different TTLs.
  5. Invalidation hook tied to the source of truth. When the underlying codebase or document changes in a way that contradicts a stored fact, the fact gets flagged for re-validation. This is the write-through arrow in the cache diagram earlier — currently absent.
  6. Confidence surfaced to the caller. Instead of returning a flat string, return {value, confidence, provenance}. The recalling agent then knows when to trust, when to double-check, when to ignore.

None of these is novel. All of them are standard in production cache systems, query planners, and event-sourcing stores. They’re hard to retrofit onto a system that wasn’t designed with provenance and invalidation as first-class concerns. They’re not hard to design in from the start — but doing so means giving up the “just drop a hook on everything, ship next week” simplicity that makes the current crop of tools accumulate stars.

If you’re building an agent that has to act on its memory rather than just display it, this list is the spec.

Closing — “Persistent Memory” Is a Trade, Not a Feature

“Persistent AI memory” gets talked about like a feature you turn on. It isn’t. It’s a choice on the two-axis design space above, and every position on that space has known failure modes. The lossless-and-exact corner has the upkeep cost and the coverage limit. The lossy-and-approximate corner has the staleness, the wrong-row retrieval, and — the finding I came away most surprised by — the loss of modality.

Two takeaways worth carrying out:

  1. If someone is selling you “automatic AI memory,” ask which quadrant. If the answer is lossy + approximate, ask the six questions from the section above: provenance, modality, supersedes, TTL, invalidation, confidence. If the answer to most of them is “the embeddings handle it,” you’re being sold a cache wearing the word memory.
  2. The classical literature is the right starting point. Cache coherence, write-back vs write-through, eventual vs strong consistency, materialized-view invalidation — the database and OS communities have spent forty years working through these tradeoffs. Reading their writing is more useful than reading the latest agent-memory thread.

The companion to this post — Agent Retrieval Is a Cost Curve Problem — argues a parallel thing about within-session code retrieval: that Claude Code’s “use grep, not RAG” choice isn’t romance (“trust the model”) but math (cost curves), and that the source code shows Anthropic A/B-testing alternative retrieval architectures (Explore vs Fork) in production. Read together, the two pieces add up to a coherent stance about Anthropic’s bets across the fidelity × retrieval design space: in both within-session code search and cross-session memory, the default is lossless + exact, and the alternative branches are kept gated behind feature flags so the decisions can flip when the cost curves do. The memory side alone has at least four such gates visible in the snapshot I reviewed — tengu_coral_fern, tengu_herring_clock, tengu_passport_quail, tengu_slate_thimble — plus the build-time KAIROS, TEAMMEM, and EXTRACT_MEMORIES gates in src/memdir/.

That stance is not “we trust the model.” It’s: read the cost curves, build for the curves’ current shape, leave the toggles in for when they shift.

Appendix: How I Measured This

For the reader who wants to reproduce — or, more usefully, who wants to know exactly what was and wasn’t measured.

Versions and environment.

  • claude-mem v13.2.0 (npm install via the project’s standard install script)
  • Claude Code: current public release at time of test (2026-05-20)
  • macOS 25.4.0, zsh
  • Sandbox: CLAUDE_CONFIG_DIR=/tmp/cmem-test/dot-claude for the plugin config; data store at ~/.claude-mem (this directory location is hardcoded inside the tool — see Finding #2)

Test project. A small URL-shortener spec written from scratch, with four decisions in Session 1: (1) Redis for URL storage, (2) base62 short-code generation, (3) 30-day TTL, (4) the hedged sliding-expiration option that became the modality test.

Commands and protocol.

  1. Fresh claude-mem install into the sandboxed CLAUDE_CONFIG_DIR. Verified install behavior set CLAUDE_CODE_DISABLE_AUTO_MEMORY=1 in settings.json (Finding #1).
  2. Session 1: interactive Claude Code session, walked through the four decisions with the tool actively capturing via lifecycle hooks. Watched the Bun worker on localhost:37701 accept events.
  3. Session ended; session-boundary compression fired; Haiku-compressed facts[] and narrative written to SQLite-FTS5 + Chroma. Token count for the compression call read from the API trace.
  4. Session 2: fresh interactive session; queried for each of the four decisions; observed the recall path (mcp-search round-trip; ~22s extra latency).
  5. Compared retrieved content against the original Session 1 transcript byte-by-byte to identify the modality flattening (Finding #5).
  6. Repeated the install + Session 1 pattern in headless claude -p mode to confirm Finding #4 (no events captured).

What I’m explicitly not claiming.

  • This is a single test run on a deliberately small project. The N is 1.
  • The CLAUDE.md baseline column in the test table is not a separately measured comparison session. It reflects deterministic properties of the built-in CLAUDE.md path (verbatim recall, no compression bill, no extra round-trip) that follow from the design — not a measured outcome.
  • I didn’t benchmark Chroma vector recall quality across many queries. The modality finding came from a single targeted probe (the TTL question); the cache-coherence framing predicts the same class of failure across many queries, but predicting and measuring are different.
  • I tested v13.2.0. The tool is actively developed; specific findings may have been addressed in later releases by the time you read this. The cache-coherence framing is version-independent.

Artifacts. Session 1 / Session 2 transcripts, the raw claude-mem SQLite + Chroma snapshots, and the token-cost API traces from the test run are kept locally and can be made available on reasonable request — get in touch.


Companion piece: Agent Retrieval Is a Cost Curve Problem: Why Claude Code Doesn’t Use RAG Background: Consistency in Distributed Systems: Scenarios, Trade-offs, and What Actually Works For the design rationale behind Claude Code’s built-in memory in particular: Claude Code Deep Dive Part 4: Why It Uses Markdown Files Instead of Vector DBs

🎧 More Ways to Consume This Content

I occasionally advise small teams on backend reliability, Go performance, and production AI systems. Learn more: /services

Comments

This space is waiting for your voice.

Comments will be supported shortly. Stay connected for updates!

Preview of future curated comments

This section will display user comments from various platforms like X, Reddit, YouTube, and more. Comments will be curated for quality and relevance.