Store→Load Reordering Explained — x86 vs ARM64, Real-World Test
Field notes on what happens when CPUs reorder your memory accesses. x86 follows a model very close to TSO (Total Store Order). ARM64 is weakly ordered. Run the same lock-free code on both and the bug may appear only on ARM. This is not academic: it is why some Go programs ship to production fine on Intel and break on M-series Macs or Graviton.
Status: Companion blog draft. Anchors on Obsidian note [[fence]] — the r1==0 && r2==0 impossibility proof.
Companion assets
- Original video: Store→Load Reordering EXPLAINED: x86 vs ARM64 Real-World Test!
- GitHub: harrison001/CoreTracer — includes memory-model experiments
TL;DR
x86 and ARM64 do not agree on what your code means under concurrent execution.
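- x86 (TSO model): stores are observed in program order by other cores; the only reordering permitted is store→load, where a later load completes while an earlier store to a different address is still sitting in the store buffer.
- ARM64 (weakly ordered): stores and loads to different addresses can be reordered in any combination unless you insert explicit barriers (DMB, DSB) or use acquire/release instructions (LDAR, STLR).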
The same lock-free C program that runs millions of iterations correctly on Intel can deadlock or produce impossible-on-x86 results after one minute on ARM64 — without changing the source.
The setup
A classic memory-model test:
// Initially: x = 0, y = 0
// Thread 1        // Thread 2
x = 1;             y = 1;
r1 = y;            r2 = x;
On a system with sequential consistency, r1 == 0 && r2 == 0 is impossible (one of the stores must complete before both loads). On real hardware:
- x86: r1==0 && r2==0 happens, but rarely (the only allowed reordering is store→load)
- ARM64: r1==0 && r2==0 happens frequently
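A minimal sketch of a harness for this test, using C11 relaxed atomics and pthreads. This is not the harness from the video; the names `count_both_zero`, `worker`, `wait_for`, `start_gen`, and `done_cnt` are made up for illustration. Relaxed accesses ask for no ordering, so the compiler emits plain stores and loads and whatever reordering the hardware (or the compiler) allows can show through. How often it does depends heavily on the machine, core placement, and timing.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int x, y;        /* shared variables from the litmus test */
static atomic_long start_gen;  /* iteration number the workers may run */
static atomic_long done_cnt;   /* workers finished in the current iteration */

typedef struct {
    atomic_int *store_to;   /* variable this thread writes 1 to */
    atomic_int *load_from;  /* variable this thread reads afterwards */
    int result;             /* value observed by the load (r1 or r2) */
    long iterations;
} worker_arg;

/* Spin until *v == want, yielding occasionally so small machines still progress. */
static void wait_for(atomic_long *v, long want) {
    unsigned spins = 0;
    while (atomic_load_explicit(v, memory_order_acquire) != want)
        if ((++spins & 1023u) == 0) sched_yield();
}

static void *worker(void *p) {
    worker_arg *a = p;
    for (long i = 1; i <= a->iterations; i++) {
        wait_for(&start_gen, i);
        /* The litmus test itself: store, then load, with no ordering requested. */
        atomic_store_explicit(a->store_to, 1, memory_order_relaxed);
        a->result = atomic_load_explicit(a->load_from, memory_order_relaxed);
        atomic_fetch_add_explicit(&done_cnt, 1, memory_order_release);
    }
    return NULL;
}

/* Returns how many iterations ended with r1 == 0 && r2 == 0. */
long count_both_zero(long iterations) {
    worker_arg a1 = { &x, &y, 0, iterations };  /* thread 1: x = 1; r1 = y */
    worker_arg a2 = { &y, &x, 0, iterations };  /* thread 2: y = 1; r2 = x */
    pthread_t t1, t2;
    long hits = 0;

    atomic_store(&start_gen, 0);
    atomic_store(&done_cnt, 0);
    int rc1 = pthread_create(&t1, NULL, worker, &a1);
    int rc2 = pthread_create(&t2, NULL, worker, &a2);
    assert(rc1 == 0 && rc2 == 0);  /* sketch-level error handling only */
    (void)rc1; (void)rc2;

    for (long i = 1; i <= iterations; i++) {
        atomic_store_explicit(&x, 0, memory_order_relaxed);
        atomic_store_explicit(&y, 0, memory_order_relaxed);
        atomic_store_explicit(&done_cnt, 0, memory_order_relaxed);
        atomic_store_explicit(&start_gen, i, memory_order_release);  /* release both workers */
        wait_for(&done_cnt, 2);  /* acquire: both results are now visible */
        if (a1.result == 0 && a2.result == 0) hits++;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return hits;
}
```

Calling `count_both_zero(100000000)` from a `main` and printing the result on each machine would give a count comparable across platforms. The only guarantee is that the count lies between 0 and the iteration count; a count of 0 on a given run does not prove the outcome is impossible.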
Debug command transcript
# TODO: paste actual test harness + perf invocation from the video
# Run a tight loop of the above test, count occurrences where r1==0 && r2==0
# Compare counts on x86 vs ARM64 (or Apple Silicon)
# ./reorder_test -arch=x86_64
# ./reorder_test -arch=arm64
What the data shows
(Placeholder — paste actual numbers from video.)
| Platform | Iterations | r1==0 && r2==0 hits |
|---|---|---|
| x86_64 (Intel) | 100M | ~50 (very rare) |
| ARM64 (Apple Silicon / Graviton) | 100M | ~hundreds of thousands (common) |
Why the fence ([[fence]] in Obsidian) makes r1==0 && r2==0 actually impossible
Insert an mfence (x86) or DMB ISH (ARM64) between the store and the load on each thread, and the case becomes provably impossible: the fence keeps the later load from executing until the earlier store is visible to the other cores, so the store can no longer hide in the store buffer while the load runs ahead.
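A portable way to express the same fence is `atomic_thread_fence(memory_order_seq_cst)` from C11, which compilers typically lower to MFENCE or a locked instruction on x86-64 and to DMB ISH on AArch64. The sketch below is an illustration under that assumption; the names `fenced_thread1`, `fenced_thread2`, and `count_fenced_both_zero` are hypothetical. It spawns fresh threads per trial, so it is a correctness demo, not a stress harness.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int fx, fy;  /* shared variables, reset before every trial */
static int fr1, fr2;       /* what each thread loaded; read only after join */

static void *fenced_thread1(void *unused) {
    (void)unused;
    atomic_store_explicit(&fx, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* full fence between store and load */
    fr1 = atomic_load_explicit(&fy, memory_order_relaxed);
    return NULL;
}

static void *fenced_thread2(void *unused) {
    (void)unused;
    atomic_store_explicit(&fy, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* full fence between store and load */
    fr2 = atomic_load_explicit(&fx, memory_order_relaxed);
    return NULL;
}

/* Runs `trials` two-thread trials; returns how many saw r1 == 0 && r2 == 0. */
int count_fenced_both_zero(int trials) {
    int hits = 0;
    for (int i = 0; i < trials; i++) {
        pthread_t t1, t2;
        atomic_store(&fx, 0);
        atomic_store(&fy, 0);
        if (pthread_create(&t1, NULL, fenced_thread1, NULL) != 0) continue;
        if (pthread_create(&t2, NULL, fenced_thread2, NULL) != 0) {
            pthread_join(t1, NULL);
            continue;
        }
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        if (fr1 == 0 && fr2 == 0) hits++;
    }
    return hits;
}
```

With a sequentially consistent fence in both threads, the C11 memory model forbids the both-zero outcome, so `count_fenced_both_zero` should return 0 on any conforming compiler and CPU.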
The Obsidian field note covers the formal argument: with the fences in place, each thread’s store is ordered before its own load, so the four operations can be placed in a single total order that respects program order. If r1==0, thread 1’s load of y came before thread 2’s store y=1. Thread 1’s store x=1 precedes that load, and thread 2’s load r2 = x follows y=1, so x=1 precedes r2 = x and r2 must be 1. That contradicts r2==0.
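The argument can also be checked by brute force. The sketch below (the function name `sc_outcomes` is made up for illustration) enumerates every interleaving of the four operations that keeps each thread's program order, runs it against a single shared memory, and tallies the resulting (r1, r2) pairs. This models sequential consistency only; it says nothing about what real hardware does without fences.

```c
/* Tallies (r1, r2) over every interleaving that keeps each thread's program order.
 * counts[a][b] is the number of interleavings that end with r1 == a and r2 == b. */
void sc_outcomes(int counts[2][2]) {
    counts[0][0] = counts[0][1] = counts[1][0] = counts[1][1] = 0;

    /* Bit k of mask set means step k is executed by thread 1, otherwise thread 2. */
    for (unsigned mask = 0; mask < 16; mask++) {
        int t1_steps = 0;
        for (int k = 0; k < 4; k++) t1_steps += (mask >> k) & 1u;
        if (t1_steps != 2) continue;  /* each thread runs exactly two operations */

        int x = 0, y = 0, r1 = 0, r2 = 0;
        int pc1 = 0, pc2 = 0;  /* index of the next operation in each thread */
        for (int k = 0; k < 4; k++) {
            if ((mask >> k) & 1u) {
                if (pc1++ == 0) x = 1; else r1 = y;  /* thread 1: x = 1; r1 = y */
            } else {
                if (pc2++ == 0) y = 1; else r2 = x;  /* thread 2: y = 1; r2 = x */
            }
        }
        counts[r1][r2]++;
    }
}
```

Of the six legal interleavings, four end with (1, 1), one with (0, 1), one with (1, 0), and none with (0, 0), which matches the proof.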
What this teaches backend / AI infra engineers
You don’t write inline memory barriers in production Go. But:
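- Atomics in Go and Rust map to specific hardware instructions; on x86, acquire loads and release stores are plain MOVs because TSO already provides that ordering (only sequentially consistent stores need a locked instruction or MFENCE), while on ARM64 they need LDAR/STLR or explicit DMB barriers, which cost cycles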
- Lock-free queues that test fine on x86-based CI can deadlock on ARM-based prod. AWS Graviton, Apple Silicon, and GCP Tau T2A all expose this. Real production incidents trace to “the same code works on a dev’s Intel MacBook and breaks on Graviton.”
- AI infra running on multi-arch fleets: inference servers increasingly run on ARM (Graviton, NVIDIA Grace). Code written assuming x86’s memory model has latent bugs that surface only under specific scheduling.
The lesson: portability between x86 and ARM is not “recompile and run.” The compiler honors the source-language memory model (Go, Rust, C++11+); the bugs that surface on ARM are bugs that the source language always permitted but x86’s stronger model accidentally hid.
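As an illustration of stating the ordering in the source language, here is a minimal message-passing sketch in C11 (all names, including `publish_and_read`, are hypothetical). The producer writes ordinary data and then sets a flag with a release store; the consumer waits on the flag with an acquire load before reading the data. On x86-64 both atomics typically compile to plain MOVs, while on AArch64 compilers typically emit STLR and LDAR, but the source is identical and correct on both.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static int payload;       /* ordinary, non-atomic data */
static atomic_int ready;  /* flag that publishes the payload */

static void *producer(void *unused) {
    (void)unused;
    payload = 42;
    /* Release: the payload write cannot be reordered after this store. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *out) {
    /* Acquire: once the flag reads 1, the payload write is guaranteed visible. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        sched_yield();
    *(int *)out = payload;
    return NULL;
}

/* Returns the payload the consumer saw, or -1 if a thread could not be created. */
int publish_and_read(void) {
    pthread_t prod, cons;
    int seen = -1;

    payload = 0;
    atomic_store(&ready, 0);
    if (pthread_create(&cons, NULL, consumer, &seen) != 0) return -1;
    if (pthread_create(&prod, NULL, producer, NULL) != 0) {
        atomic_store(&ready, 1);  /* unblock the consumer before bailing out */
        pthread_join(cons, NULL);
        return -1;
    }
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return seen;
}
```

If both accesses to `ready` were relaxed instead, the program would have a data race on `payload` in every language-level sense, yet it would usually still appear to work on x86 because the hardware does not reorder two stores or two loads. That is the kind of bug ARM64 exposes.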
Related work
- Video: Cache Miss, TLB Miss & False Sharing
- Obsidian: [[fence]] — fence formal argument and proof
- GitHub: CoreTracer