Consistency in Distributed Systems: Scenarios, Trade-offs, and What Actually Works

Strong, eventual, causal, read-your-writes, linearizable — consistency models are taught as a taxonomy. Production uses them as a menu. Ten scenarios, the right consistency choice for each, and the engineering that makes the choice work.

March 28, 2026
Harrison Guo
13 min read
System Design Backend Engineering

There’s an impulse, when someone first learns about consistency models in distributed systems, to file everything into neat drawers. Strong here. Eventual there. Linearizable above it. Read-your-writes below. Study the diagram, pass the interview.

That taxonomy is real, but it’s not useful the way people think. Production systems don’t pick a consistency model and run with it. They pick a different model per feature, often per type of operation within a feature, and spend most of their engineering effort on the gaps between what the model provides and what users actually expect. The taxonomy is the menu. The interesting question is which dish each scenario needs.

This is a working engineer’s walk through ten real consistency scenarios — from the obvious ones (money transfers need strong) to the less obvious (collaborative editing, notification feeds, analytic dashboards) — with the specific engineering that makes each one work.

tl;dr — Consistency is not a global system property; it’s a per-operation property. A well-designed distributed system picks different consistency levels for different operations based on what users actually notice, what the business actually requires, and what latency budget each operation has. The CAP-theorem framing (“pick 2 of 3”) is a caricature; real systems use PACELC (which adds the latency trade-off during normal operation) and pick per-feature.


The Frames That Matter

Before scenarios, three frames you actually use in practice.

CAP (Consistency, Availability, Partition tolerance, pick 2). Useful as a first-week mental model. Misleading if taken literally, because (a) you can’t give up partition tolerance in a real network, and (b) the choice isn’t binary — you can tune per operation.

PACELC: if there’s a Partition, pick A (availability) or C (consistency). Else, pick L (latency) or C (consistency). Adds the latency trade-off you pay during normal operation, which is where 99% of design decisions actually live. A system that’s “consistent when no partition” but pays 50ms of cross-region round-trip for every write has made a latency-vs-consistency call, not a CAP call.

Consistency models, from strongest to weakest:

  • Linearizable: operations appear to happen instantaneously, in a total order consistent with real time. The strongest practical model. Expensive.
  • Sequential: operations appear in a total order, but not necessarily aligned with real time. Slightly weaker, slightly cheaper.
  • Causal: if event A causally precedes event B, every observer sees A before B. Preserves the “this reply should appear after the comment it replied to” property.
  • Read-your-writes: you see the effects of your own operations, even if other users don’t yet.
  • Monotonic read: once you see a value, you won’t see an older value later.
  • Eventual: if writes stop, replicas eventually converge. No ordering guarantees during the transient.

You don’t need to memorize these. You need to recognize which one each feature actually needs.

flowchart LR
    subgraph Strong["Stronger · more expensive"]
        L["Linearizable<br/>Money transfer · distributed locks"]
        SQ["Sequential<br/>Multi-leader with clock sync"]
    end
    subgraph Mid["Middle ground · usually the right answer"]
        CA["Causal<br/>Social feed · replies after comments"]
        RY["Read-your-writes<br/>User profile · settings"]
        MR["Monotonic read<br/>Pagination · dashboards"]
    end
    subgraph Weak["Weaker · cheaper and faster"]
        EC["Eventual<br/>Counters · analytics · CDN"]
    end
    Strong --> Mid --> Weak
    classDef strong fill:#fed7d7,stroke:#c53030
    classDef mid fill:#fef5e7,stroke:#b7791f
    classDef weak fill:#f0fff4,stroke:#2f855a
    class Strong strong
    class Mid mid
    class Weak weak

Moving left to right: cheaper, faster, less coordinated — and more work you do in application code to close the gap between what the model gives you and what users expect.

Ten Scenarios

1. Money Transfer Between Accounts

Needs: strict linearizability. No double-spend. No lost updates.

Approach: transactional database with serializable isolation, or a strongly-consistent coordination layer (Paxos/Raft quorum). Typical implementation: single-region primary Postgres with synchronous replication, or a distributed SQL database (Spanner, CockroachDB, YugabyteDB) with linearizable reads.

What you give up: latency (especially cross-region), availability during partitions. This is the right trade — a bank doesn’t tolerate double-spend to save 30ms.

Key engineering: idempotency keys on every request, deduplication at the persistence layer, well-audited transaction boundaries. Strong consistency at the DB isn’t enough if your retry logic double-writes.
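The idempotency-key half of this is worth making concrete. In the sketch below an in-memory ledger stands in for the database, and all names are illustrative; in production the dedup check, both balance updates, and the recorded result would commit together in one serializable transaction, so a retry can never move money twice.

```python
class InsufficientFunds(Exception):
    pass

class Ledger:
    """In-memory stand-in for an accounts database."""
    def __init__(self, balances):
        self.balances = dict(balances)
        self.processed = {}  # idempotency_key -> recorded result

    def transfer(self, idempotency_key, src, dst, amount):
        # A retry with the same key returns the original result
        # instead of applying the transfer a second time.
        if idempotency_key in self.processed:
            return self.processed[idempotency_key]
        if self.balances[src] < amount:
            raise InsufficientFunds(src)
        # In a real system, these updates and the dedup-row insert
        # commit atomically in a single serializable transaction.
        self.balances[src] -= amount
        self.balances[dst] += amount
        result = {"src": src, "dst": dst, "amount": amount}
        self.processed[idempotency_key] = result
        return result
```

Calling `transfer` twice with the same key — as a client retry after a timeout would — debits the source account exactly once.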

2. Inventory Decrement with High Contention

Needs: “no overselling” without blocking every request.

Approach: the classic “hot key” problem. Options in ascending sophistication:

  • Pessimistic locking — SELECT ... FOR UPDATE on the inventory row. Works; serializes hot items. Under peak traffic on Black Friday, this queues up and tail latencies explode.
  • Optimistic concurrency — read version, decrement, compare-and-swap. Retries on conflict. Better tail latency at moderate contention, worse at very high contention (retry storms).
  • Reserved-inventory buckets — maintain N “shards” of available inventory and route requests to a random shard, so each shard sees only a fraction of the contention. Trades a small risk of false stock-outs (if shard B is empty but shard A has 5 left, a user routed to B may be told “out of stock” while 5 remain in total) for huge throughput wins.
  • Best-effort with async reconciliation — accept orders optimistically, reconcile at a background worker, cancel overbooks with apology emails. Used by event-ticketing sites for popular drops.

The right choice depends on business rules. If overselling by 1% is unacceptable, pessimistic. If overselling by 0.1% is tolerable and user-experience matters, shard or async reconcile.
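The optimistic option is the one people most often get subtly wrong, so here is a minimal sketch. The in-memory `VersionedInventory` stands in for a row with a version column; in a real database the compare-and-swap is a single `UPDATE ... SET available = ..., version = version + 1 WHERE id = ... AND version = ...` whose row count tells you whether you won.

```python
class VersionedInventory:
    """In-memory stand-in for an inventory row with a version column."""
    def __init__(self, available):
        self.available = available
        self.version = 0

    def read(self):
        return self.available, self.version

    def compare_and_swap(self, expected_version, new_available):
        # Atomic in the database; single-threaded here, so plain
        # Python suffices for the sketch.
        if self.version != expected_version:
            return False
        self.available = new_available
        self.version += 1
        return True

def decrement(item, qty, max_retries=5):
    """Optimistic decrement: read, check, CAS, retry on conflict."""
    for _ in range(max_retries):
        available, version = item.read()
        if available < qty:
            return False  # genuinely out of stock
        if item.compare_and_swap(version, available - qty):
            return True
        # Someone else won the race; loop re-reads and tries again.
    raise RuntimeError("contention too high; queue or back off")
```

The `max_retries` cap is the important part: under very high contention you want to fail over to a queue or shed load, not spin in a retry storm.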

3. User Profile Update

Needs: read-your-writes. After I save my display name, I see it on next page load.

Approach: sticky reads. Either the session pins to the write replica for a short window, or the application tracks a “last write timestamp” per user and refuses to serve reads from a replica that hasn’t caught up.

The naive alternative — “eventually consistent, just retry” — breaks user expectations immediately. “I updated my name and it didn’t save” is one of the most expensive support tickets on a per-incident basis, because the user has no way to distinguish “didn’t save” from “saved but replication is lagging.”

The engineering is not glamorous. A session cookie that carries last_write_ts, a read path that asserts replica.latest_ts >= last_write_ts, and a fallback to the primary if the assertion fails. Most frameworks don’t give you this for free; you build it.
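A minimal sketch of that read path, with a logical write counter standing in for replication timestamps and all names illustrative:

```python
class Replica:
    def __init__(self):
        self.latest_ts = 0  # high-water mark of applied writes
        self.data = {}

class ReadYourWrites:
    """Serve a user's reads from any replica that has caught up to
    their last write; fall back to the primary otherwise."""
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.clock = 0  # logical write counter; a wall-clock ts also works

    def write(self, key, value):
        self.clock += 1
        self.primary.data[key] = value
        self.primary.latest_ts = self.clock
        return self.clock  # stored in the session cookie as last_write_ts

    def read(self, key, last_write_ts):
        for replica in self.replicas:
            if replica.latest_ts >= last_write_ts:
                return replica.data.get(key)
        # No replica is fresh enough: pay for a primary read.
        return self.primary.data.get(key)
```

A user who just wrote carries a nonzero `last_write_ts` and falls through to the primary until some replica catches up; everyone else reads cheaply from replicas.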

4. Social Media Feed

Needs: causal consistency for comments and replies. Eventual consistency everywhere else.

Approach: two-tier. Posts and likes are written to a local region with async replication. Replies are linked to their parent post with an explicit cause-precedes relationship — the reply’s store won’t surface the reply until the parent has propagated.

The CRDT-adjacent pattern (version vectors, Lamport timestamps) sits underneath, but you don’t usually expose it to the application. What you expose is “here’s the list of replies, in causally-consistent order.” What the user sees: “I replied to a comment, and my reply appears under it” — which is exactly the mental model they expect.
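The “don’t surface a reply before its parent” rule is small enough to sketch. A hypothetical per-region store buffers any reply whose parent hasn’t replicated yet, and releases it the moment the parent arrives:

```python
from collections import defaultdict

class CausalReplyStore:
    """Buffer replies until their parent has arrived, so no observer
    ever sees an effect before its cause."""
    def __init__(self):
        self.visible = {}                 # event_id -> body
        self.pending = defaultdict(list)  # parent_id -> [(event_id, body)]

    def apply(self, event_id, parent_id, body):
        if parent_id is not None and parent_id not in self.visible:
            # Parent hasn't replicated here yet: hold the reply back.
            self.pending[parent_id].append((event_id, body))
            return
        self._surface(event_id, body)

    def _surface(self, event_id, body):
        self.visible[event_id] = body
        # Surfacing an event may unblock replies that arrived early.
        for child_id, child_body in self.pending.pop(event_id, []):
            self._surface(child_id, child_body)
```

Replication is free to deliver events in any order; the store just delays visibility until the causal chain is intact.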

What you save by not using strong consistency everywhere: low write latency (local region only), high availability during partitions, and the ability to handle massive fan-out (a celebrity’s post propagating to 40M followers doesn’t need to wait on a single coordinator).

5. Collaborative Document Editing

Needs: offline-first, multi-user concurrent edits, always-eventually-converge, no lost updates.

Approach: CRDT (conflict-free replicated data type) or OT (operational transformation). This is one of the few spots where CRDTs genuinely shine. The underlying math guarantees that any two replicas will converge to the same state, regardless of the order operations arrive in, as long as all operations eventually reach all replicas.

Google Docs uses a version of OT. Figma uses multivalue registers and CRDT-adjacent primitives. Notion uses a mix. The common property: any user can edit while offline, sync when reconnected, and the final document reflects all edits.

What you give up: simplicity. CRDT implementations are subtle, and naive “last-write-wins” semantics are almost never what the user wants (their previous sentence vanished, not merged).

What you gain: offline support without an ugly “you’ve been offline, your changes may conflict” modal.
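Text CRDTs are too involved to sketch here, but the convergence property itself fits in a few lines. A grow-only counter is the “hello world” of CRDTs: each replica increments only its own slot, and merge takes the element-wise max. Because merge is commutative, associative, and idempotent, replicas converge no matter what order syncs happen in.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merge by max."""
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.slots = {}

    def increment(self, n=1):
        # A replica only ever bumps its own slot.
        self.slots[self.replica_id] = self.slots.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.slots.values())

    def merge(self, other):
        # Element-wise max: commutative, associative, idempotent.
        for rid, count in other.slots.items():
            self.slots[rid] = max(self.slots.get(rid, 0), count)
```

Two replicas can increment offline, merge in either direction (or repeatedly), and arrive at the same total — the same guarantee text CRDTs make about document state, with much hairier bookkeeping.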

6. Ad Click Counter

Needs: eventual consistency, very high write throughput, lossy-okay for a tiny fraction.

Approach: local counter per shard, periodic aggregation to central store. Writes are fire-and-forget to a stream (Kafka, Kinesis). Reads come from a precomputed aggregate that’s a few seconds stale.
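A toy version of that shape, with in-process counters standing in for the Kafka topic and the aggregation job (all names illustrative):

```python
from collections import Counter

class ClickPipeline:
    """Sketch: per-shard counters absorb writes; a periodic flush
    folds them into the central aggregate that reads come from."""
    def __init__(self, num_shards):
        self.shards = [Counter() for _ in range(num_shards)]
        self.aggregate = Counter()

    def record_click(self, ad_id):
        # Fire-and-forget: hash to a shard, bump a local counter.
        self.shards[hash(ad_id) % len(self.shards)][ad_id] += 1

    def flush(self):
        # The periodic aggregation job; until it runs, reads are stale.
        for shard in self.shards:
            self.aggregate.update(shard)
            shard.clear()

    def read_count(self, ad_id):
        return self.aggregate[ad_id]  # a few seconds stale by design
```

The write path never touches shared state under contention, and the staleness window is exactly the flush interval — which is the number you publish in the SLA.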

Why this works: no advertiser is going to detect the difference between “47,312 clicks” and “47,318 clicks” in their dashboard. Counting with precision across a global distributed system is an order of magnitude harder than counting approximately. Do the latter.

What’s non-obvious: the system should be designed for approximate counts, with explicit tolerance in the SLA (“counts are accurate to within 0.01% and updated every 30 seconds”). If you don’t say that upfront, someone will eventually ask “why don’t our counts match the backend logs exactly” and you’ll be in a two-week project to eliminate errors that never mattered.

7. Multi-Region Primary / Secondary

Needs: fast reads in every region, writes can live in one region.

Approach: primary-in-region-A, async replication to regions B/C/D. Reads in B/C/D may lag the primary by milliseconds to seconds. Writes always route to A.

Consistency model you’re serving: eventual, with read-your-writes available on demand (see scenario 3). Reads from the primary are strongly consistent; reads from secondaries are lagged but fast.

Key engineering: the client SDK should know which operations need primary reads (after a recent write, for “show me the thing I just wrote” operations) and which can hit secondaries (dashboards, history views, anything time-insensitive).

This is where most backend systems actually live. The bulk of reads go to secondaries — cheap, fast. A small percentage route to primary for freshness. Latency and availability both win.

8. Distributed Lock / Leader Election

Needs: exactly one leader, no split-brain, sometimes-unavailable-is-okay.

Approach: a consensus system (Zookeeper, etcd, Consul — all Raft or Paxos variants). Acquire the lock or lease, renew it, do the work, release. If a partition cuts you off, your lease expires because you can’t renew it — and the rest of the cluster knows your leadership has lapsed.

The classic failure: leader election on top of Redis. Redis is not a consensus system. Redlock has well-documented failure modes — it is not safe for correctness-critical locking. Use etcd. Use Zookeeper. Use a real consensus system. The tempting shortcut will, eventually, bite.

What consensus buys you: guaranteed linearizability for operations on the lock/lease. What it costs: every operation is a quorum round-trip. That’s fine for leader election (infrequent). It’s not fine for a hot write path (use a different mechanism).
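One safety detail worth making concrete: even with a real consensus system, a leader that pauses (GC, VM migration) past its lease can wake up and keep writing. The standard defense is a fencing token — a monotonically increasing number the lock service issues with each lease grant — which the protected resource checks on every write. A sketch, with an in-memory resource standing in for storage:

```python
class FencedResource:
    """Storage that rejects writes carrying a stale fencing token.
    The lock service (etcd, Zookeeper) issues a monotonically
    increasing token with each lease it grants."""
    def __init__(self):
        self.highest_token_seen = -1
        self.value = None

    def write(self, fencing_token, value):
        if fencing_token < self.highest_token_seen:
            # An ex-leader, paused past its lease, woke up and tried
            # to write. Refuse: its token predates the current lease.
            raise PermissionError("stale fencing token")
        self.highest_token_seen = fencing_token
        self.value = value
```

The check must live in the resource, not the client — a paused client can’t be trusted to notice its own lease expired.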

9. Analytics Dashboard

Needs: all the data, eventually, in a queryable form. No urgency on freshness.

Approach: stream writes to a durable log (Kafka), have an ETL job populate a columnar warehouse (BigQuery, ClickHouse, Snowflake) on a schedule. Dashboards query the warehouse. Data is minutes to hours stale.

Consistency model: none, in the traditional sense. You have an append-only log and a materialized view. The view is eventually consistent with the log, and that’s the whole contract.

This is simple but worth calling out because people sometimes try to do analytics against the operational database directly (“we’ll run these queries on the primary, it’ll be fine”). It will not be fine. Analytic queries are different workload shapes — they want columnar storage, aggressive parallelism, no transactional overhead. Put them in a warehouse.

10. Cross-Service Orchestration (Saga)

Needs: multi-step business flow across services — create order, reserve inventory, charge payment, schedule shipment. Each step might fail. The system should end up in a consistent state either way.

Approach: Saga. Each step is a local transaction in its own service. For each step, you also define a compensating step that undoes it. If step 4 fails, you run compensations for steps 1-3.

Step 1: Create order          Compensation: Cancel order
Step 2: Reserve inventory     Compensation: Release inventory
Step 3: Charge payment        Compensation: Refund payment
Step 4: Schedule shipment     Compensation: Cancel shipment

Not all compensations are symmetric — you can’t un-send an email, you can’t un-refund a payment. But for most business flows, you can design compensations that leave the system in a consistent-enough state.

The alternative — 2PC (two-phase commit) across all services — is real but rarely used. 2PC requires every participant to support the protocol, holds locks while waiting, and blocks the whole transaction if any participant is slow or down. For services owned by different teams on different storage engines, 2PC doesn’t scale.

Saga engineering concerns: saga orchestrators (a coordinator service that runs the state machine) vs saga choreography (each service emits events that trigger the next). Orchestrators are simpler to reason about. Choreography scales further but can produce spaghetti.
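A minimal orchestrator is a short loop. Steps here are plain callables for illustration; in production each would be an RPC with retries and an idempotency key, and the orchestrator would persist its state machine so compensations survive a restart.

```python
def run_saga(steps):
    """Run each (action, compensation) pair in order; on failure,
    run the compensations for completed steps in reverse order."""
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            # Unwind: compensate in reverse. Best-effort here; a real
            # orchestrator retries and persists progress durably.
            for comp in reversed(completed):
                comp()
            return False
    return True
```

Running the four-step order flow with a failing payment step would cancel the inventory reservation and then the order — in that order — leaving no half-committed state behind.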

The Meta-Rule

Walking through those ten: the choice isn’t really “which consistency model is best for my system.” It’s “which consistency model does this specific operation need, given what users expect to see.”

Most production systems use all of the following, in different places:

  • Strong/linearizable consistency for anything money-related.
  • Read-your-writes for user-visible writes that users need to see immediately.
  • Causal consistency for feed-like data where ordering matters.
  • Eventual consistency for counters, analytics, and anything where approximate-and-fast beats exact-and-slow.
  • CRDTs (narrowly) for collaborative editing and specific offline-first features.
  • Saga for cross-service business flows.
  • Consensus (Zookeeper/etcd) for the very few things that actually need leader election or distributed locks.

The engineering decision is not “pick a consistency level for the whole system.” It’s “for this specific feature, what consistency level does the user need to experience, what trade-offs does the stronger version cost, and can we engineer the weaker version to feel as good?”

That last clause matters. A read-your-writes layer on top of eventual consistency often feels strongly consistent to users while actually being cheap to operate. Users don’t experience consistency models; they experience whether their updates show up, whether their comments appear in order, whether their refund matches what they expected. Engineering consistency is about closing the gap between the model you can afford and the experience the user requires.

Common Anti-Patterns

A few shapes that show up repeatedly in code reviews:

“Retry until consistent.” Seen in code that does a write, then reads from a secondary and loops until it sees the write. Works on the happy path, deadlocks on partition, creates unbounded retry storms under load. Use read-your-writes through a session token instead.

“We’ll use eventual consistency for speed.” Used as a justification for skipping engineering. Yes, eventual is faster. The engineering to make it feel correct (causal ordering, conflict resolution, read-your-writes fallback) is what you’re skipping — and users will notice.

“Just use Redis for leader election.” Already mentioned. Redlock is not safe. If you’re doing anything correctness-critical with leader election, use a real consensus system.

“Saga with no compensations.” “What happens if step 3 fails?” “Oh, we’ll fix it manually.” That’s a saga you haven’t designed. It’s a half-finished state machine waiting to corrupt data. Design the compensations before you ship.

“Strong consistency everywhere, for safety.” Default-safe sounds responsible. It also means your read latency is 50ms minimum, you can’t serve a region during a partition, and the cost per query is high. Users rarely need strong consistency everywhere. They need it in a few specific places.

The Senior Move

Consistency is a user-experience feature, not a system property. The right question at design time isn’t “what consistency model does our database provide” — it’s “what does the user need to see, in what order, with what freshness, with what tolerance for partial failure.”

Most of the work in a well-designed distributed system is engineering around the consistency model the storage layer provides: sticky reads, session tokens, version vectors, compensating actions, explicit ordering, user-visible “your change is saved” confirmations. The model is the floor; the engineering lifts the experience to what users actually expect.

The difference between senior and junior distributed-systems work often shows up here. Junior picks a model and fights everything else to conform. Senior picks the model per-feature, builds the engineering scaffolding that closes the gap, and ships something that feels right to users — even though underneath, ten different operations run on five different consistency levels.

