Is claude-mem a scam?

No. It's well-built (Bun worker, SQLite-FTS5 + Chroma hybrid, lifecycle hooks done correctly) and the team behind it ships quickly. It is solving a real problem — automatic capture of conversational state across sessions — that the built-in memory does not solve. The argument here is not 'don't use it.' It's 'understand what you're trading for.' On a small, clear project, the trade is bad. On a sprawling, year-long, multi-contributor codebase, the trade is good.

Can't I just use both — CLAUDE.md for important things, claude-mem for the rest?

In principle yes. In practice, claude-mem installs with `CLAUDE_CODE_DISABLE_AUTO_MEMORY=1`, which turns off the built-in memory path entirely. So 'both' isn't the default; it would require a custom configuration. The product wants to replace, not augment.

Doesn't a reranker fix the wrong-row retrieval problem?

A reranker reorders top-K vector hits. It can promote a more-relevant chunk to the top. It does not change what was *written* — and the write step is where modality ("could do X if we want") was already collapsed into a flat fact. The reranker is reading from a corrupted view; reordering doesn't uncorrupt it.

If 'lossy memory acts like a cache' is the right framing, what's the equivalent of cache invalidation?

It's the central unsolved problem. In classical caches you have a coherence protocol — write-through, write-back, MESI-style state machines — tying cache lines back to the source of truth so writes propagate. Lossy agent memory has *no* tie to the source of truth (the codebase or the user's actual decisions). When you reverse a decision, the memory still believes the original. The closest fix is periodic full re-compression, which is just a manual cache flush, expensive and lossy in its own way.

Agent Memory Is a Cache Coherence Problem

Lossless curated notes vs lossy auto-compression with vector recall: two AI-memory designs that fail differently. One fails like a cache — classical systems already mapped it.

May 28, 2026

Harrison Guo

20 min read

Runtime & Distributed Systems AI Agent Production Engineering

This post is one half of a pair. The other half — Agent Retrieval Is a Cost Curve Problem — argues that Claude Code’s within-session code retrieval avoids RAG because the cost curve says it should. This piece argues something parallel about cross-session memory: the lossy auto-capture systems being marketed as “AI memory” are, in classical distributed-systems vocabulary, caches. They inherit every problem caches have always had, and the hype around them is mostly arguing for one side of a write-back vs write-through trade as if the other side didn’t exist.

Sequel to Consistency in Distributed Systems: Scenarios, Trade-offs, and What Actually Works. If you remember that piece, you’ll recognize the move: take a problem space the AI community is debating with fresh vocabulary, and notice that the database community already mapped the failure modes thirty years ago.

tl;dr — Cross-session agent memory varies on two independent axes, not one: fidelity (lossless vs lossy) and retrieval (exact lookup vs approximate vector). Claude Code’s built-in memory plus a hand-written CLAUDE.md lives at lossless + exact. The currently-trending claude-mem (70k+ GitHub stars as of May 2026) lives at lossy + approximate — auto-capture passed through a Haiku compression step and recalled via SQLite-FTS5 + Chroma vectors. The second is, structurally, a cache: a derived lossy view of the source of truth, retrieved approximately. It inherits every cache problem the distributed-systems literature already named — staleness, wrong-row retrieval, no coherence with the source. I ran claude-mem under controlled conditions and compared it against the deterministic CLAUDE.md baseline; the numbers (and the kinds of failures) line up with the classical cache framing. The most interesting failure isn’t tokens or latency. It’s that compression flattens modality — a hedged hypothetical becomes a flat fact, indistinguishable from a firm decision. An agent confidently acting on a maybe-it-said-yes is worse than an agent with no memory at all.

The Two Axes (Don’t Collapse Them Into One)

Most takes on agent memory collapse the design space onto a single axis: “lossless and limited” vs “lossy and powerful.” That framing hides the failure modes.

The real space is two-dimensional:

Fidelity — lossless (verbatim, what-you-wrote-is-what-was-stored) vs lossy (LLM-compressed: a summary written by a smaller model over the raw events).
Retrieval — exact / curated (you wrote an index entry; the system reads it back) vs approximate / semantic (vector embeddings; cosine similarity; top-K nearest neighbors).

flowchart TD
    subgraph axes["Two axes, four quadrants"]
        direction LR
        subgraph col1["Exact retrieval"]
            q1["Lossless + Exact
CLAUDE.md / hand-curated
Precision: high
Coverage: low
Upkeep: manual"]
            q3["Lossy + Exact
Compressed but keyword-indexed
(unusual in practice)"]
        end
        subgraph col2["Approximate retrieval"]
            q2["Lossless + Approximate
Raw logs + vector search
(huge storage; weak signal)"]
            q4["Lossy + Approximate
claude-mem and most 'AI memory' tools
Auto-compress + vector recall
Two layers of approximation"]
        end
    end

    classDef good fill:#f0fff4,stroke:#2f855a
    classDef warn fill:#fef5e7,stroke:#b7791f
    classDef bad fill:#fed7d7,stroke:#c53030
    class q1 good
    class q2 warn
    class q3 warn
    class q4 bad

The interesting failures live in the bottom-right quadrant — lossy + approximate — because the failures of one axis are invisible to a user evaluating along the other. The system loses information at write time and approximates at read time, and the user sees a single “answer” that fused both losses. Debugging means asking: was that wrong because the original event was corrupted in compression, or because retrieval surfaced the wrong row? You usually can’t tell.

Most takes conflate “lossy = unreliable” with “vector = powerful.” They’re orthogonal. You can have lossless + vector (raw logs, vector-indexed — fine but storage-heavy and signal-weak). You can have lossy + exact (compressed summaries, FTS-indexed — works for some applications). Lossy + approximate is what’s being marketed as “AI memory,” and it’s the quadrant most exposed to compounding failure modes.

The Baseline: Lossless + Exact (`CLAUDE.md` + built-in memory)

Claude Code’s built-in memory system, paired with a hand-written CLAUDE.md, sits firmly in the lossless + exact quadrant. The design choices, made explicit:

Verbatim storage. What the author wrote is what gets stored. Markdown in, Markdown out. There’s no compression step.
Always-loaded index + on-demand body. MEMORY.md (the index) gets injected into every session — capped, deliberately, around 200 lines to avoid context bloat. Individual memory files are read on demand, when the index entry suggests one is relevant.
Curated, not a firehose. A human (or a structured prompt) decides what is worth storing. Not every tool call. Not every file read. Only the durable, surprising, cross-session-useful facts.
Exact recall. The model reads a specific file. Either it’s there and is read verbatim, or it isn’t. No fuzzy near-matches; no confidence score.

Mental model: a hand-written WAL (write-ahead log) plus a curated index. Both are close to a source of truth — the author’s deliberate decision — and they recall exactly what was committed.

Tradeoffs are visible from this framing:

✅ Precision: 100%. What you stored is what you get back.
✅ Auditability: you can read the file yourself. No black box.
✅ Token economics: index sits in context; bodies fetched only when needed. Cheap.
❌ Coverage: limited to what the author bothered to write down.
❌ Upkeep: manual. Memories rot; updating them is a chore.

The earlier post in this series, Claude Code Deep Dive Part 4: Why It Uses Markdown Files Instead of Vector DBs, walks through the specific design choices in the publicly circulated build snapshot. Here I’ll focus on what happens when you put the lossless+exact baseline next to the lossy+approximate contender.

The Contender: Lossy + Approximate (`claude-mem`)

claude-mem is among the highest-starred entries in the agent-memory category right now (70k+ GitHub stars as of May 2026). I tested v13.2.0. The architecture, summarized:

Auto-capture firehose. Lifecycle hooks (SessionStart, UserPromptSubmit, PostToolUse, Stop, SessionEnd) fire on essentially everything the model does. The hooks pipe events to a Bun worker on localhost:37701.
Compression to facts/narrative. At session boundaries the worker invokes Haiku 4.5 to compress raw observations into structured facts (a JSON array) and a narrative (a paragraph). This compression runs on your own Claude subscription — billed to your quota, ~5,150 compression tokens per session in my test.
Hybrid index. Compressed observations are written to two indexes simultaneously: SQLite-FTS5 (full-text keyword) and Chroma (vector embeddings). Recall is hybrid — keyword and ANN together.
Disables built-in memory. Installation sets CLAUDE_CODE_DISABLE_AUTO_MEMORY=1 in ~/.claude/settings.json. The built-in CLAUDE.md path is turned off; claude-mem is meant to replace, not augment.
Hardcoded data dir. Despite respecting CLAUDE_CONFIG_DIR for plugin config, the data store path (~/.claude-mem) is hardcoded. Sandboxing is partial.

Mental model: a derived, lossy materialized view of session events, plus a similarity cache for retrieval. Two layers of approximation: a lossy write transform (Haiku compression) and an approximate read transform (ANN). Each compounds the other.

This is exactly what the bottom-right quadrant looks like in deployment.

The Test (Real Numbers, 2026-05-20)

To make the comparison concrete I built a small URL-shortener as the test bed: simple enough that the “right answer” was unambiguous, structured enough that real architectural decisions had to be recorded. The setup:

Test arm: claude-mem v13.2.0, sandboxed via CLAUDE_CONFIG_DIR=/tmp/cmem-test/dot-claude. Built-in memory disabled (per install default). Session 1 established four decisions about the codebase; Session 2 asked for them back. This is the arm I measured.
Baseline: the same four decisions written into a hand-curated CLAUDE.md — 1,075 chars, ~269 tokens. Built-in memory intact. The baseline numbers in the table below are deterministic properties of how the built-in CLAUDE.md path works (verbatim recall, no extra round-trip, no compression bill), not a separately measured session.

The decisions in Session 1 (so the comparison is fair):

Use Redis for URL storage.
Generate short codes with base62.
Add a 30-day TTL.
“We could refresh the TTL on each access if we want sliding expiration.” (Note the hedge — this is the modality test that matters.)

Numbers, side by side:

Measure	claude-mem (v13.2.0)	bare CLAUDE.md
Per recall-cycle tokens	~6,700	~280
↳ Passive context injection on session start	~1,050	~269 (full file)
↳ `mcp-search` retrieval round-trip	~502	0
↳ Haiku 4.5 compression cost (charged to your quota)	~5,150 / session	0
Extra round-trip for details	yes (~22s)	no
Fidelity of recall	lossy (see below)	100% verbatim
Built-in memory state	disabled (`CLAUDE_CODE_DISABLE_AUTO_MEMORY=1`)	intact
Compression cost	on user’s Claude subscription	none
Headless capture (`claude -p`)	zero events	n/a
Upkeep	automatic	manual edit

Measure Per recall-cycle tokens

claude-mem (v13.2.0) ~6,700

bare CLAUDE.md ~280

Measure ↳ Passive context injection on session start

claude-mem (v13.2.0) ~1,050

bare CLAUDE.md ~269 (full file)

Measure ↳ mcp-search retrieval round-trip

claude-mem (v13.2.0) ~502

bare CLAUDE.md 0

Measure ↳ Haiku 4.5 compression cost (charged to your quota)

claude-mem (v13.2.0) ~5,150 / session

bare CLAUDE.md 0

Measure Extra round-trip for details

claude-mem (v13.2.0) yes (~22s)

bare CLAUDE.md no

Measure Fidelity of recall

claude-mem (v13.2.0) lossy (see below)

bare CLAUDE.md 100% verbatim

Measure Built-in memory state

claude-mem (v13.2.0) disabled (CLAUDE_CODE_DISABLE_AUTO_MEMORY=1)

bare CLAUDE.md intact

Measure Compression cost

claude-mem (v13.2.0) on user’s Claude subscription

bare CLAUDE.md none

Measure Headless capture (claude -p)

claude-mem (v13.2.0) zero events

bare CLAUDE.md n/a

Measure Upkeep

claude-mem (v13.2.0) automatic

bare CLAUDE.md manual edit

The token gap — 6,700 vs 280 — is meaningful but not the headline. The headline is the fidelity row.

The Sharpest Failure: Compression Flattens Modality

The four decisions written in Session 1 included three firm choices and one hedge:

“We use Redis… base62 short codes… 30-day TTL… we could refresh the TTL on each access if we want sliding expiration.”

When I read the raw observation row that claude-mem wrote, the four items appeared as a flat JSON array — facts: [...] — with no modal marker distinguishing the hedge from the decisions. The hedge had been flattened into the same shape as the firm choices.

Session 2 confirmed it. I asked the recalling agent to describe the TTL design. It cheerfully reported “we refresh the TTL on each access for sliding expiration” — as though that had been decided. When I challenged it directly, its own reply was that it could not distinguish firm decisions from options that had merely been considered. The compressed facts[] row it was reading from preserved the content of each item but not its modal status — what I’ll call its epistemic status throughout the rest of this post.

That’s the failure. The lossy layer loses epistemic status, not just bytes. A maybe becomes a decision. The recalling agent has no way to know it shouldn’t trust the row.

This is worse than no memory. An agent with no memory has to ask, or reread, or check the code. An agent with confident-wrong memory acts. The cost of acting on a fabricated decision compounds: now there’s code (or a PR, or an architectural note) committed under the false premise, and that will be the next round’s input.

The generalization: any LLM compression step that maps “speech-act varieties” (decisions, hypotheses, questions, jokes, hedges) onto a single typed structure — like a facts[] array — loses the modal axis. To preserve it, you’d need to compress into a richer schema (with kind: 'decision' | 'option' | 'question' per item), and you’d need the compression model to reliably tag the modality. Haiku 4.5 didn’t tag it. Whether a more careful prompt or schema would is an open question, but it’s a design question the current tool doesn’t even pose.

Six Measured Findings, Versioned to v13.2.0

In one place, six things I measured. Versioned because tool behavior changes:

Replaces, not augments. Install sets CLAUDE_CODE_DISABLE_AUTO_MEMORY=1. Built-in memory is turned off. Default deployment is single-system, not hybrid.
Partial sandbox. CLAUDE_CONFIG_DIR redirects plugin config, but the data store path ~/.claude-mem is hardcoded. Multi-tenant isolation is incomplete.
Compression runs on your subscription. Haiku 4.5 compresses observations to ~5,150 tokens per session, billed to your Anthropic quota. Free tools that consume your paid quota deserve a footnote.
Invisible to headless mode. claude -p runs (non-interactive) emit zero capture events in my tests. The lifecycle hooks fire only in interactive sessions. CI users and automation pipelines get no memory at all.
Compression flattens modality (the sharpest finding, detailed above). A hedge becomes a flat fact, indistinguishable from a firm decision in the compressed facts[] schema.
Token economics lose on small projects. ~6,700 tokens per recall cycle (passive inject + mcp-search round-trip + Haiku compression) versus ~280 deterministic, 100%-faithful tokens for the CLAUDE.md baseline. On a 1,000-line project, the per-token cost gap is wider than any retrieval benefit it provides.

Tool-version drift is real — by the time you read this, some of these may have been fixed. The cache-coherence framing in the next section is version-independent and was the actual reason I wrote the post.

Why This Is a Cache Problem, Precisely

The distributed-systems vocabulary for this design is materialized view of a source, served from a similarity cache.

Write-time lossy transform = a materialized view that can drift from the source of truth (the actual codebase, the actual decisions). The source is the user’s intent and the live code; the view is the compressed facts/narrative. Each write step can lose information that the view will never recover.
Read-time ANN = approximate retrieval. Top-K nearest neighbors. False positives are structural, not a bug — a sufficiently-similar wrong row will be returned with confidence indistinguishable from the right one.
No coherence with the source. Classical caches have invalidation hooks — write-through, write-back, snoop protocols, MESI states. They tie cache lines back to the canonical source so that writes propagate and stale lines get evicted or rewritten. claude-mem has no tie to the codebase or to user-issued corrections. You reverse a decision in conversation, the memory still believes the original.
Staleness without expiry. Even without explicit invalidation, classical caches use TTLs to bound staleness. claude-mem has no TTL on facts. A fact written six months ago competes for retrieval with one written yesterday, and the older one might win the vector hop.

When the cache-coherence frame is the right frame, the literature is rich and useful. Pat Helland’s Immutability Changes Everything (ACM Queue, 2015) and the broader databases-and-OS literature on cache-coherence protocols (MESI / MOESI), materialized-view invalidation, and write-through vs write-back are the right starting reading. The trades they describe — staleness vs cost, eventual vs strong coherence, when to flush, when to invalidate — are the same trades the agent-memory community is rediscovering with fresh names.

flowchart LR
    subgraph src["Source of truth"]
        s1["User intent
Live code
Conversation"]
    end
    subgraph write["Write path (lossy)"]
        w1["Haiku 4.5
compression"]
        w2["facts[] · narrative
(no modality, no provenance)"]
    end
    subgraph store["Materialized view"]
        st1["SQLite FTS5"]
        st2["Chroma vectors"]
    end
    subgraph read["Read path (approximate)"]
        r1["query vector
+ keyword"]
        r2["top-K hits"]
    end

    s1 -->|hook captures| w1
    w1 --> w2
    w2 --> st1
    w2 --> st2
    st1 --> r2
    st2 --> r2
    r1 --> r2
    r2 -->|"injected as fact"| s1

    classDef src fill:#e6fffa,stroke:#319795
    classDef lossy fill:#fed7d7,stroke:#c53030
    classDef store fill:#fef5e7,stroke:#b7791f
    classDef read fill:#ebf4ff,stroke:#5a67d8
    class s1 src
    class w1,w2 lossy
    class st1,st2 store
    class r1,r2 read

What’s missing from this diagram, and crucially: a backedge from “source of truth” to the materialized view that fires when the source changes. That’s the invalidation arrow. Its absence is the structural reason claude-mem gets wrong-row retrieval on decisions the user has reversed. Until something supplies that arrow, the system is best understood as a write-only cache.

When Each Wins

Cost-curve thinking (the same frame used in the companion piece) gives a clean answer:

Lossless + Exact wins when:

Project size is small or scope is clear. Curation is cheap; the manual upkeep budget is small.
Fidelity matters. You need to recall the exact decision, not a vibe of it. Coding agents, design decisions, security policy.
The author exists and is engaged. Someone is willing to write three lines into MEMORY.md when a decision is made.

Lossy + Approximate wins when:

History is too big to hand-curate. A year of conversations across multiple contributors, none of whom can be expected to maintain a MEMORY.md.
Coverage matters more than precision. You’d rather have a fuzzy memory that something was discussed than no memory at all. Customer-support agents over a year of tickets; team retrospectives over a quarter of standups.
The cost of acting on a fabricated fact is low. A confident-wrong recall in a support agent says “sorry let me check”; the user corrects it. The same recall in a coding agent ships broken code to production. The blast radius of a false positive determines the budget for accepting one.

The rule of thumb: fuzzy-but-present beats precise-but-absent, but only when the false-positive cost is low enough to absorb. For coding work on a 5,000-LOC project, it isn’t.

A Decision Framework

Signal	Lossless + Exact	Lossy + Approximate	Hybrid
Single project, < 50k LOC	✓
Multi-project / multi-year history		✓	✓
Decisions need exact recall	✓
Vague-but-present recall is acceptable		✓
Author is engaged (willing to curate)	✓		✓
No human curator available		✓
Cost of confident-wrong is high (production code, money)	✓
Cost of confident-wrong is low (suggestion, search)		✓
You can pay the Haiku compression bill from your quota	n/a	✓	✓
You operate headlessly or via CI	✓

Signal Single project, < 50k LOC

Lossless + Exact ✓

Lossy + Approximate

Hybrid

Signal Multi-project / multi-year history

Lossless + Exact

Lossy + Approximate ✓

Hybrid ✓

Signal Decisions need exact recall

Lossless + Exact ✓

Lossy + Approximate

Hybrid

Signal Vague-but-present recall is acceptable

Lossless + Exact

Lossy + Approximate ✓

Hybrid

Signal Author is engaged (willing to curate)

Lossless + Exact ✓

Lossy + Approximate

Hybrid ✓

Signal No human curator available

Lossless + Exact

Lossy + Approximate ✓

Hybrid

Signal Cost of confident-wrong is high (production code, money)

Lossless + Exact ✓

Lossy + Approximate

Hybrid

Signal Cost of confident-wrong is low (suggestion, search)

Lossless + Exact

Lossy + Approximate ✓

Hybrid

Signal You can pay the Haiku compression bill from your quota

Lossless + Exact n/a

Lossy + Approximate ✓

Hybrid ✓

Signal You operate headlessly or via CI

Lossless + Exact ✓

Lossy + Approximate

Hybrid

For most readers of this blog — engineers working on a single non-trivial codebase, where decisions matter and confident-wrong is expensive — the columns lean hard left.

What a Coherent Agent Memory System Would Need

The interesting question, once you accept the cache-coherence frame, is: what would the lossy + approximate corner look like if it were built like a real cache instead of a write-only one?

A short list of capabilities the current generation of “AI memory” tools is missing, and which any serious system in this space will eventually have to ship:

Source pointer (provenance). Every fact carries a back-pointer to the originating event: timestamp, session ID, the raw transcript turn or tool result it was derived from. Without this, you can’t audit a wrong recall — you only see the fact, never its lineage.
Modality tagging. Every fact tagged with epistemic status — decision | option_considered | hypothesis | question | observation — at write time, by the compression model. Without this, the system loses what the failure section above showed: the difference between we will and we could.
Supersedes / invalidation chain. A later fact can declare an earlier fact superseded (“decision A was reversed on date T by B”). Recall surfaces the latest applicable fact, not the most semantically similar one. This is the in-band invalidation classical caches use; agent memory currently has none.
Expiry / TTL by class. Decisions might be permanent; observations rot fast (“the build was passing this morning” should not influence behavior at 4 PM). Different fact classes get different TTLs.
Invalidation hook tied to the source of truth. When the underlying codebase or document changes in a way that contradicts a stored fact, the fact gets flagged for re-validation. This is the write-through arrow in the cache diagram earlier — currently absent.
Confidence surfaced to the caller. Instead of returning a flat string, return {value, confidence, provenance}. The recalling agent then knows when to trust, when to double-check, when to ignore.

None of these is novel. All of them are standard in production cache systems, query planners, and event-sourcing stores. They’re hard to retrofit onto a system that wasn’t designed with provenance and invalidation as first-class concerns. They’re not hard to design in from the start — but doing so means giving up the “just drop a hook on everything, ship next week” simplicity that makes the current crop of tools accumulate stars.

If you’re building an agent that has to act on its memory rather than just display it, this list is the spec.

Closing — “Persistent Memory” Is a Trade, Not a Feature

“Persistent AI memory” gets talked about like a feature you turn on. It isn’t. It’s a choice on the two-axis design space above, and every position on that space has known failure modes. The lossless-and-exact corner has the upkeep cost and the coverage limit. The lossy-and-approximate corner has the staleness, the wrong-row retrieval, and — the finding I came away most surprised by — the loss of modality.

Two takeaways worth carrying out:

If someone is selling you “automatic AI memory,” ask which quadrant. If the answer is lossy + approximate, ask the six questions from the section above: provenance, modality, supersedes, TTL, invalidation, confidence. If the answer to most of them is “the embeddings handle it,” you’re being sold a cache wearing the word memory.
The classical literature is the right starting point. Cache coherence, write-back vs write-through, eventual vs strong consistency, materialized-view invalidation — the database and OS communities have spent forty years working through these tradeoffs. Reading their writing is more useful than reading the latest agent-memory thread.

The companion to this post — Agent Retrieval Is a Cost Curve Problem — argues a parallel thing about within-session code retrieval: that Claude Code’s “use grep, not RAG” choice isn’t romance (“trust the model”) but math (cost curves), and that the source code shows Anthropic A/B-testing alternative retrieval architectures (Explore vs Fork) in production. Read together, the two pieces add up to a coherent stance about Anthropic’s bets across the fidelity × retrieval design space: in both within-session code search and cross-session memory, the default is lossless + exact, and the alternative branches are kept gated behind feature flags so the decisions can flip when the cost curves do. The memory side alone has at least four such gates visible in the snapshot I reviewed — tengu_coral_fern, tengu_herring_clock, tengu_passport_quail, tengu_slate_thimble — plus the build-time KAIROS, TEAMMEM, and EXTRACT_MEMORIES gates in src/memdir/.

That stance is not “we trust the model.” It’s: read the cost curves, build for the curves’ current shape, leave the toggles in for when they shift.

Appendix: How I Measured This

For the reader who wants to reproduce — or, more usefully, who wants to know exactly what was and wasn’t measured.

Versions and environment.

claude-mem v13.2.0 (npm install via the project’s standard install script)
Claude Code: current public release at time of test (2026-05-20)
macOS 25.4.0, zsh
Sandbox: CLAUDE_CONFIG_DIR=/tmp/cmem-test/dot-claude for the plugin config; data store at ~/.claude-mem (this directory location is hardcoded inside the tool — see Finding #2)

Test project. A small URL-shortener spec written from scratch, with four decisions in Session 1: (1) Redis for URL storage, (2) base62 short-code generation, (3) 30-day TTL, (4) the hedged sliding-expiration option that became the modality test.

Commands and protocol.

Fresh claude-mem install into the sandboxed CLAUDE_CONFIG_DIR. Verified install behavior set CLAUDE_CODE_DISABLE_AUTO_MEMORY=1 in settings.json (Finding #1).
Session 1: interactive Claude Code session, walked through the four decisions with the tool actively capturing via lifecycle hooks. Watched the Bun worker on localhost:37701 accept events.
Session ended; session-boundary compression fired; Haiku-compressed facts[] and narrative written to SQLite-FTS5 + Chroma. Token count for the compression call read from the API trace.
Session 2: fresh interactive session; queried for each of the four decisions; observed the recall path (mcp-search round-trip; ~22s extra latency).
Compared retrieved content against the original Session 1 transcript byte-by-byte to identify the modality flattening (Finding #5).
Repeated the install + Session 1 pattern in headless claude -p mode to confirm Finding #4 (no events captured).

What I’m explicitly not claiming.

This is a single test run on a deliberately small project. The N is 1.
The CLAUDE.md baseline column in the test table is not a separately measured comparison session. It reflects deterministic properties of the built-in CLAUDE.md path (verbatim recall, no compression bill, no extra round-trip) that follow from the design — not a measured outcome.
I didn’t benchmark Chroma vector recall quality across many queries. The modality finding came from a single targeted probe (the TTL question); the cache-coherence framing predicts the same class of failure across many queries, but predicting and measuring are different.
I tested v13.2.0. The tool is actively developed; specific findings may have been addressed in later releases by the time you read this. The cache-coherence framing is version-independent.

Artifacts. Session 1 / Session 2 transcripts, the raw claude-mem SQLite + Chroma snapshots, and the token-cost API traces from the test run are kept locally and can be made available on reasonable request — get in touch.

Companion piece: Agent Retrieval Is a Cost Curve Problem: Why Claude Code Doesn’t Use RAG Background: Consistency in Distributed Systems: Scenarios, Trade-offs, and What Actually Works For the design rationale behind Claude Code’s built-in memory in particular: Claude Code Deep Dive Part 4: Why It Uses Markdown Files Instead of Vector DBs

🎧 More Ways to Consume This Content

HarrisonSecurityLab Podcast

I occasionally advise small teams on backend reliability, Go performance, and production AI systems. Learn more: /services

Comments

This space is waiting for your voice.

Comments will be supported shortly. Stay connected for updates!

Preview of future curated comments

This section will display user comments from various platforms like X, Reddit, YouTube, and more. Comments will be curated for quality and relevance.

Agent Memory Is a Cache Coherence Problem

Lossless curated notes vs lossy auto-compression with vector recall: two AI-memory designs that fail differently. One fails like a cache — classical systems already mapped it.

Table of Contents

The Two Axes (Don’t Collapse Them Into One)

The Baseline: Lossless + Exact (`CLAUDE.md` + built-in memory)

The Contender: Lossy + Approximate (`claude-mem`)

The Test (Real Numbers, 2026-05-20)

The Sharpest Failure: Compression Flattens Modality

Six Measured Findings, Versioned to v13.2.0

Why This Is a Cache Problem, Precisely

When Each Wins

A Decision Framework

What a Coherent Agent Memory System Would Need

Closing — “Persistent Memory” Is a Trade, Not a Feature

Appendix: How I Measured This

🎧 More Ways to Consume This Content

Comments

Leave a Comment

Agent Memory Is a Cache Coherence Problem

Lossless curated notes vs lossy auto-compression with vector recall: two AI-memory designs that fail differently. One fails like a cache — classical systems already mapped it.

Table of Contents

The Two Axes (Don’t Collapse Them Into One)

The Baseline: Lossless + Exact (CLAUDE.md + built-in memory)

The Contender: Lossy + Approximate (claude-mem)

The Test (Real Numbers, 2026-05-20)

The Sharpest Failure: Compression Flattens Modality

Six Measured Findings, Versioned to v13.2.0

Why This Is a Cache Problem, Precisely

When Each Wins

A Decision Framework

What a Coherent Agent Memory System Would Need

Closing — “Persistent Memory” Is a Trade, Not a Feature

Appendix: How I Measured This

🎧 More Ways to Consume This Content

[ Agent_Architecture_Notes ]

Related Articles

Agent Retrieval Is a Cost Curve Problem: Why Claude Code Doesn't Use RAG

How Claude Code Compresses Context — The 5-Level Pipeline

Claude Code Deep Dive Part 2: The 1,421-Line While Loop That Runs Everything

Comments

Leave a Comment

[ Connect_With_Me ]

The Baseline: Lossless + Exact (`CLAUDE.md` + built-in memory)

The Contender: Lossy + Approximate (`claude-mem`)