IronSys: A Production Blueprint for Modern Concurrency
After Four Pillars of Concurrency, the natural question: what does a system actually look like when it uses all of them deliberately? IronSys is a composite blueprint — the concurrency architecture I'd build today if I were starting over, with the trade-offs each choice buys.
In the last post I walked through the four concurrency pillars — shared memory + locks, CSP, actors, STM — and argued that real systems mix them on purpose. Someone reasonably asked: okay, but what does that actually look like? Fair question. Abstract taxonomy is less useful than a worked example.
IronSys is that worked example. It’s a composite blueprint — not a real service, but representative of a class of services I’ve designed, helped design, or debugged in production. Let’s say it’s a mid-sized backend system: public API, stateful user sessions, streaming data in, aggregation and reporting out. The kind of thing that appears in the middle of any serious platform.
The interesting part isn’t the features. It’s which concurrency primitive shows up where, and why.
tl;dr — IronSys is a composite production blueprint: a multi-service Go backend with stateful user sessions, streaming ingest, and usage aggregation. It uses CSP channels for pipelines and coordination, a goroutine-per-entity actor pattern for stateful sessions, mutexes and atomics for hot shared counters, and durable queues for cross-service handoff. Each primitive is picked for a specific failure mode. The pattern is not “mix for variety”; it’s “match the primitive to the work.”
The System Shape
Before deciding on concurrency primitives, sketch the work shapes. IronSys has four:
- Public API — request/response, modest concurrency, latency-sensitive. The classic HTTP backend.
- Live sessions — stateful, long-lived per-user entities. Think multiplayer game server, collaborative editor, real-time dashboard.
- Streaming ingest — high-throughput events arriving over Kafka/NATS, fanned out to workers for processing.
- Batch aggregation — periodic rollup jobs that read from storage, compute, write back.
Four shapes, four concurrency patterns. The wrong design would apply the same primitive to all four. The right design picks each separately.
```mermaid
flowchart LR
    subgraph Shapes["Work shapes"]
        S1["1. Public API<br/>stateless · request/response"]
        S2["2. Live sessions<br/>stateful · long-lived"]
        S3["3. Streaming ingest<br/>high throughput · stateless"]
        S4["4. Batch aggregation<br/>pipeline · scheduled"]
    end
    subgraph Primitives["Concurrency primitives"]
        P1["Goroutine + mutex<br/>per-request handler"]
        P2["Goroutine-per-entity<br/>actor-like · private state"]
        P3["Bounded channel + worker pool<br/>CSP · backpressure"]
        P4["CSP pipeline + errgroup<br/>staged · cancellable"]
    end
    S1 --> P1
    S2 --> P2
    S3 --> P3
    S4 --> P4
    classDef shape fill:#e8f4f8,stroke:#2c5282
    classDef prim fill:#f0fff4,stroke:#2f855a
    class Shapes shape
    class Primitives prim
```
The API Handlers
Nothing fancy. Stock Go HTTP server. Each request is its own goroutine (Go’s runtime does this automatically). Shared state — rate limiters, cache, config — is protected by mutexes or atomics:
```go
type RateLimiter struct {
    mu      sync.Mutex
    buckets map[string]*bucket
}

func (r *RateLimiter) Allow(key string) bool {
    r.mu.Lock()
    defer r.mu.Unlock()
    b, ok := r.buckets[key]
    if !ok {
        b = newBucket()
        r.buckets[key] = b
    }
    return b.allow()
}
```
Obvious choice. The contention is bounded by request rate, the state is small, a mutex is the simplest possible tool. Over-engineering here — sharded maps, lock-free data structures — buys nothing.
What IronSys does here that many teams miss: every handler is context-aware from request entry:
```go
func (s *Server) HandleFoo(w http.ResponseWriter, r *http.Request) {
    ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
    defer cancel()
    result, err := s.service.Foo(ctx, parseReq(r))
    writeResponse(w, result, err)
}
```
Context flows everywhere downstream. The handler layer is boring; that’s the point.
The Live Sessions — Actor Pattern in Go
Each active user session is a long-lived goroutine with an inbox channel. I call this the goroutine-per-entity pattern — it’s Erlang actors without the runtime, built from Go primitives.
```go
type Session struct {
    id       SessionID
    mailbox  chan SessionCmd // the "actor" inbox
    shutdown chan struct{}
    state    sessionState // private to this goroutine
}

type SessionCmd struct {
    op    string
    args  interface{}
    reply chan<- SessionReply // optional reply channel
}

func runSession(ctx context.Context, s *Session) {
    // Note: the receiver never closes s.mailbox — senders may still be
    // writing to it, and closing a channel with live senders panics.
    // The channel is garbage-collected once the session is unreachable.
    for {
        select {
        case cmd := <-s.mailbox:
            s.handle(cmd)
        case <-s.shutdown:
            s.flush() // persist final state
            return
        case <-ctx.Done():
            s.flush() // same guarantee on context cancellation
            return
        }
    }
}
```
Why this pattern, not “session is a struct with a mutex”?
- State is private to one goroutine. No sharing, no locks, no lock-ordering bugs. The session state is accessed by exactly one execution context.
- Serial message processing. Commands process one at a time, in FIFO order. Business invariants hold naturally.
- Natural location for cross-session coordination. Each session is a message destination. Broadcasting to all sessions, or routing a command to a specific session, is just “send on its inbox.”
- Clean lifecycle. The goroutine runs until `shutdown` fires or `ctx` is cancelled. State is flushed once, on exit. No race between “is this session still alive” and “did we finish writing its state.”
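The request/reply round trip against a session's inbox is worth seeing concretely. Here's a minimal, self-contained sketch of the pattern — the `Cmd` and `Ask` names are illustrative stand-ins, not IronSys's real types:

```go
package main

import "fmt"

// Cmd is a message to the session; reply is the optional response channel.
type Cmd struct {
	op    string
	reply chan string
}

type Session struct {
	mailbox  chan Cmd
	shutdown chan struct{}
	state    int // private: touched only by the session goroutine
}

// runSession owns the state; commands are processed one at a time, in order.
func runSession(s *Session) {
	for {
		select {
		case cmd := <-s.mailbox:
			switch cmd.op {
			case "incr":
				s.state++
			case "get":
				cmd.reply <- fmt.Sprintf("state=%d", s.state)
			}
		case <-s.shutdown:
			return
		}
	}
}

// Ask sends a command and blocks for the reply — the actor-style
// request/response round trip.
func (s *Session) Ask(op string) string {
	reply := make(chan string, 1)
	s.mailbox <- Cmd{op: op, reply: reply}
	return <-reply
}

func main() {
	s := &Session{mailbox: make(chan Cmd), shutdown: make(chan struct{})}
	go runSession(s)
	s.mailbox <- Cmd{op: "incr"}
	s.mailbox <- Cmd{op: "incr"}
	fmt.Println(s.Ask("get")) // prints "state=2": FIFO ordering guarantees both incrs applied first
	close(s.shutdown)
}
```

The buffered reply channel matters: the session goroutine can answer without blocking even if the caller is slow to read.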
The manager that creates and routes to sessions looks like:
```go
type SessionManager struct {
    mu       sync.RWMutex
    sessions map[SessionID]*Session
}

func (m *SessionManager) Get(id SessionID) (*Session, bool) {
    m.mu.RLock()
    defer m.mu.RUnlock()
    s, ok := m.sessions[id]
    return s, ok
}

func (m *SessionManager) Start(ctx context.Context, id SessionID) *Session {
    m.mu.Lock()
    defer m.mu.Unlock()
    if s, ok := m.sessions[id]; ok {
        return s
    }
    s := newSession(id)
    m.sessions[id] = s
    go runSession(ctx, s) // supervisor goroutine
    return s
}
```
Note the mixing: the manager uses a mutex-protected map (shared state with a clear owner), individual sessions use the actor pattern (isolated state, message-passing). Two primitives, picked per-job.
This pattern scales to millions of sessions because goroutines are cheap. I’ve seen this exact pattern serve 400K concurrent sessions on a single pod.
The Streaming Ingest — Bounded Worker Pool (CSP)
Kafka consumer feeding a worker pool. Canonical CSP territory:
```go
func runConsumer(ctx context.Context, cons *kafka.Consumer) error {
    jobs := make(chan Event, 256)
    var wg sync.WaitGroup

    // Fixed worker pool
    for i := 0; i < workerCount; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for {
                select {
                case job, ok := <-jobs:
                    if !ok {
                        return
                    }
                    if err := process(ctx, job); err != nil {
                        log.Printf("process: %v", err)
                    }
                case <-ctx.Done():
                    return
                }
            }
        }()
    }

    // Producer loop runs in this goroutine: any read error — not just
    // context cancellation — closes the channel and drains the pool,
    // so the function can never block forever waiting on ctx.
    for {
        msg, err := cons.ReadMessage(ctx)
        if err != nil {
            break
        }
        select {
        case jobs <- msg:
        case <-ctx.Done():
        }
    }
    close(jobs)
    wg.Wait()
    return ctx.Err()
}
```
The bounded channel is the concurrency clamp. Kafka can push as fast as it wants; the worker pool consumes at its own pace; backpressure propagates back to Kafka’s consumer offset naturally.
Why not actors here? Because the work items are stateless — you’re processing events, not maintaining per-entity state. The overhead of an actor (mailbox, dispatch, ownership) is unjustified. CSP is the right fit.
Why not mutex + a worker loop? You could, but the channel primitive is exactly the right shape — bounded capacity + safe cross-goroutine handoff + graceful shutdown — without needing to build those three features yourself.
The Batch Aggregation — Pipelines + errgroup
Nightly rollup: read from storage, compute per-account aggregates, write back.
```go
func runRollup(ctx context.Context, input <-chan Event) error {
    g, gctx := errgroup.WithContext(ctx)

    // Stage 1: parse
    parsed := make(chan ParsedEvent, 64)
    g.Go(func() error {
        defer close(parsed)
        return parseStage(gctx, input, parsed)
    })

    // Stage 2: aggregate (keyed by account)
    agged := make(chan Aggregate, 64)
    g.Go(func() error {
        defer close(agged)
        return aggregateStage(gctx, parsed, agged)
    })

    // Stage 3: persist
    g.Go(func() error {
        return persistStage(gctx, agged)
    })

    return g.Wait()
}
```
Three stages in a pipeline. Each stage is a goroutine, connected by bounded channels. errgroup ties them together: first error cancels the whole pipeline.
The aggregation stage internally uses a map protected by a mutex. Only the stage’s single goroutine touches it today, so there is no contention — an uncontended lock is nearly free — but it stays safe if a future change introduces more readers.
This is textbook CSP: the topology of channels is the architecture. Read the code and the shape of the computation is obvious.
The Cross-Service Handoff — Durable Queues
IronSys talks to two other services: a billing service (async, eventually consistent) and an auth service (sync, immediate).
For billing: a dedicated NATS JetStream subject with at-least-once delivery. Usage events go in one end; the billing service reads them. The emission codepath has a local write-ahead log so that if NATS is briefly down, events buffer on disk and replay when the connection recovers.
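The emit-with-spill shape is simple enough to sketch. The `Publisher` interface and in-memory `wal` slice below are illustrative stand-ins (a real implementation would publish to JetStream and buffer on disk), but the control flow is the point — spill on failure, replay on recovery:

```go
package main

import (
	"errors"
	"fmt"
)

// Publisher stands in for the real NATS JetStream client.
type Publisher interface {
	Publish(event string) error
}

// flaky simulates a connection that can be down.
type flaky struct{ up bool }

func (f *flaky) Publish(e string) error {
	if !f.up {
		return errors.New("nats down")
	}
	return nil
}

// Emitter tries the queue first and spills to a local write-ahead
// log on failure, instead of dropping the event.
type Emitter struct {
	pub Publisher
	wal []string // stands in for the on-disk buffer
}

func (e *Emitter) Emit(event string) {
	if err := e.pub.Publish(event); err != nil {
		e.wal = append(e.wal, event)
	}
}

// Replay drains the WAL once the connection is healthy again,
// keeping anything that still fails.
func (e *Emitter) Replay() {
	remaining := e.wal[:0]
	for _, ev := range e.wal {
		if err := e.pub.Publish(ev); err != nil {
			remaining = append(remaining, ev)
		}
	}
	e.wal = remaining
}

func main() {
	f := &flaky{up: false}
	em := &Emitter{pub: f}
	em.Emit("usage-1") // NATS down: event is buffered, not lost
	f.up = true
	em.Replay()
	fmt.Println(len(em.wal)) // prints 0: buffered event replayed
}
```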
For auth: gRPC with tight timeouts. Caller owns completion. If auth is slow, the API handler’s deadline fires and the request fails fast.
Two different ownership models for two different shapes of work. See: RPC vs NATS: Who Owns Completion.
How the Primitives Map
Summarizing which primitive serves which job in IronSys:
| Work shape | Primitive | Why |
|---|---|---|
| HTTP request handling | Stock net/http + goroutine per request | Language default, right for stateless |
| Hot shared state (rate limiter, cache) | Mutex / atomic | Simplest primitive that works |
| Stateful user sessions | Goroutine-per-entity (actor-like) | Isolated state, message-passing, serial processing |
| Session directory | RWMutex-protected map | Shared lookup, read-heavy |
| Streaming event processing | Bounded channel + worker pool (CSP) | Backpressure, parallelism, graceful shutdown |
| Multi-stage data pipeline | CSP pipeline + errgroup | Stage topology = architecture; first-error cancels all |
| Async cross-service handoff | Durable queue (NATS JetStream / Kafka) | Receiver owns completion, at-least-once delivery |
| Sync cross-service call | gRPC with ctx timeout | Caller owns completion, fast failure |
Notice: all four concurrency pillars show up. Mutexes in the rate limiter. CSP in the event pipeline. Actors (in pattern) in the session runtime. (STM is missing; it would show up if I were doing this in Clojure or Haskell.)
What This Architecture Gets Wrong
Every architecture has weaknesses. IronSys’s are real:
- The actor pattern isn’t real actors. Without Erlang-style supervision, if a session goroutine panics, Go’s default behavior is to kill the process. Adding panic recovery per-session is easy but not free. In practice, most teams hit this 6 months in, add a recovery wrapper, and move on.
- Bounded channels can mask slow downstream. If a channel fills up and the producer blocks, that’s backpressure — great. But if the channel is buffered too large, you can buffer a lot of work into memory before realizing downstream is slow. Tune buffer sizes with measurements, not guesses.
- Goroutine-per-entity has a per-session baseline cost. Cheap but not free. A million sessions is ~2.5GB of goroutine stacks. For services where most entities are inactive, a lazy pattern (spin up on activity, suspend to disk on idle) is better.
- Mixing paradigms cognitively. New engineers have to learn four patterns instead of one. The productivity hit is real for the first two weeks; the payoff is in the next two years.
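The recovery wrapper mentioned in the first weakness above is small enough to show. This is a hedged sketch — `runRecovered` is a hypothetical name, and in production you would also log with context, flush what state you can, and notify a supervisor:

```go
package main

import "fmt"

// runRecovered runs a session body under recover, so one panicking
// session cannot take down the whole process.
func runRecovered(id string, body func()) {
	defer func() {
		if r := recover(); r != nil {
			// Production version: structured log, partial state flush,
			// supervisor notification / restart policy.
			fmt.Printf("session %s crashed: %v\n", id, r)
		}
	}()
	body()
}

func main() {
	done := make(chan struct{})
	go func() {
		defer close(done)
		runRecovered("s-42", func() {
			panic("bad command") // simulated session bug
		})
	}()
	<-done
	fmt.Println("process still alive")
}
```

Wrap `runSession` in this at spawn time (`go runRecovered(id, func() { runSession(ctx, s) })`) and a single bad session becomes a logged incident instead of an outage.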
What This Blueprint Is Really Selling
A system with four work shapes should have four concurrency patterns, not one stretched to cover everything. The four pillars aren’t theoretical; they map to real design decisions, and production Go services that use them deliberately are easier to reason about than those that don’t.
What IronSys is really selling is intentional heterogeneity. Every primitive is there for a reason. Every reason is traceable to a specific failure mode you want to prevent. The architecture should be legible — a new engineer reading the code should understand why a channel is there instead of a mutex, why a session has its own goroutine instead of being a struct in a shared map, why billing goes through a durable queue instead of a gRPC call.
If you can’t answer “why this primitive here,” the code isn’t finished. It’s just working, for now.
Blueprints are useful precisely because they’re generic. The specifics of your system will be different. But the decision framework — what’s the work shape, what’s the failure mode, what’s the right primitive — is the same every time.
Related
- From Locks to Actors: The Four Pillars of Modern Concurrency — the taxonomy behind the choices in IronSys.
- Go’s Concurrency Is About Structure, Not Speed — chan and context as the glue across all of these.
- RPC vs NATS: It’s Not About Sync vs Async — It’s About Who Owns Completion — the cross-service handoff choices.
- Testing Real-World Go Backends Isn’t What Many People Think — how you verify a system like this actually holds up.