IronSys: A Production Blueprint for Modern Concurrency

After Four Pillars of Concurrency, the natural question: what does a system actually look like when it uses all of them deliberately? IronSys is a composite blueprint — the concurrency architecture I'd build today if I were starting over, with the trade-offs each choice buys.

October 22, 2025
Harrison Guo
10 min read
System Design · Backend Engineering

In the last post I walked through the four concurrency pillars — shared memory + locks, CSP, actors, STM — and argued that real systems mix them on purpose. Someone reasonably asked: okay, but what does that actually look like? Fair question. Abstract taxonomy is less useful than a worked example.

IronSys is that worked example. It’s a composite blueprint — not a real service, but representative of a class of services I’ve designed, helped design, or debugged in production. Let’s say it’s a mid-sized backend system: public API, stateful user sessions, streaming data in, aggregation and reporting out. The kind of thing that appears in the middle of any serious platform.

The interesting part isn’t the features. It’s which concurrency primitive shows up where, and why.

tl;dr — IronSys is a composite production blueprint: a multi-service Go backend with stateful user sessions, streaming ingest, and usage aggregation. It uses CSP channels for pipelines and coordination, a goroutine-per-entity actor pattern for stateful sessions, mutexes and atomics for hot shared counters, and durable queues for cross-service handoff. Each primitive is picked for a specific failure mode. The pattern is not “mix for variety”; it’s “match the primitive to the work.”


The System Shape

Before deciding on concurrency primitives, sketch the work shapes. IronSys has four:

  1. Public API — request/response, modest concurrency, latency-sensitive. The classic HTTP backend.
  2. Live sessions — stateful, long-lived per-user entities. Think multiplayer game server, collaborative editor, real-time dashboard.
  3. Streaming ingest — high-throughput events arriving over Kafka/NATS, fanned out to workers for processing.
  4. Batch aggregation — periodic rollup jobs that read from storage, compute, write back.

Four shapes, four concurrency patterns. The wrong design would apply the same primitive to all four. The right design picks each separately.

flowchart LR
    subgraph Shapes["Work shapes"]
        S1["1. Public API
stateless · request/response"]
        S2["2. Live sessions
stateful · long-lived"]
        S3["3. Streaming ingest
high throughput · stateless"]
        S4["4. Batch aggregation
pipeline · scheduled"]
    end
    subgraph Primitives["Concurrency primitives"]
        P1["Goroutine + mutex
per-request handler"]
        P2["Goroutine-per-entity
actor-like · private state"]
        P3["Bounded channel + worker pool
CSP · backpressure"]
        P4["CSP pipeline + errgroup
staged · cancellable"]
    end
    S1 --> P1
    S2 --> P2
    S3 --> P3
    S4 --> P4
    classDef shape fill:#e8f4f8,stroke:#2c5282
    classDef prim fill:#f0fff4,stroke:#2f855a
    class Shapes shape
    class Primitives prim

The API Handlers

Nothing fancy. Stock Go HTTP server. Each request is its own goroutine (Go’s runtime does this automatically). Shared state — rate limiters, cache, config — is protected by mutexes or atomics:

type RateLimiter struct {
    mu      sync.Mutex
    buckets map[string]*bucket
}

func (r *RateLimiter) Allow(key string) bool {
    r.mu.Lock()
    defer r.mu.Unlock()
    b, ok := r.buckets[key]
    if !ok {
        b = newBucket()
        r.buckets[key] = b
    }
    return b.allow()
}

Obvious choice. The contention is bounded by request rate, the state is small, a mutex is the simplest possible tool. Over-engineering here — sharded maps, lock-free data structures — buys nothing.
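The atomic half of "mutexes or atomics" is even simpler. For hot counters that only ever increment — request totals, in-flight gauges — there's no critical section to protect, just a number to bump. A minimal sketch (the `requestCount` name is illustrative, not from any IronSys code):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// requestCount is a hot shared counter: many goroutines bump it
// concurrently, and nothing ever needs to lock around it.
var requestCount atomic.Int64

func handleOne() {
	requestCount.Add(1)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				handleOne()
			}
		}()
	}
	wg.Wait()
	fmt.Println(requestCount.Load()) // 100000: no lost updates, no lock
}
```

The rule of thumb: a mutex when there's an invariant spanning multiple fields, an atomic when there's a single word of state.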

What IronSys does here that many teams miss: every handler is context-aware from request entry:

func (s *Server) HandleFoo(w http.ResponseWriter, r *http.Request) {
    ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
    defer cancel()

    result, err := s.service.Foo(ctx, parseReq(r))
    writeResponse(w, result, err)
}

Context flows everywhere downstream. The handler layer is boring; that’s the point.

The Live Sessions — Actor Pattern in Go

Each active user session is a long-lived goroutine with an inbox channel. I call this the goroutine-per-entity pattern — it’s Erlang actors without the runtime, built from Go primitives.

type Session struct {
    id       SessionID
    mailbox  chan SessionCmd  // the "actor" inbox
    shutdown chan struct{}
    state    sessionState      // private to this goroutine
}

type SessionCmd struct {
    op     string
    args   interface{}
    reply  chan<- SessionReply // optional reply channel
}

func runSession(ctx context.Context, s *Session) {
    // Deliberately no close(s.mailbox) here: a channel is closed by its
    // senders, and closing from the receiver would panic in-flight sends.
    for {
        select {
        case cmd := <-s.mailbox:
            s.handle(cmd)
        case <-s.shutdown:
            s.flush() // persist final state
            return
        case <-ctx.Done():
            return
        }
    }
}

Why this pattern, not “session is a struct with a mutex”?

  • State is private to one goroutine. No sharing, no locks, no lock-ordering bugs. The session state is accessed by exactly one execution context.
  • Serial message processing. Commands process one at a time, in FIFO order. Business invariants hold naturally.
  • Natural location for cross-session coordination. Each session is a message destination. Broadcasting to all sessions, or routing a command to a specific session, is just “send on its inbox.”
  • Clean lifecycle. The goroutine runs until shutdown or ctx.Done. State is flushed once, on exit. No race between “is this session still alive” and “did we finish writing its state.”

The manager that creates and routes to sessions looks like:

type SessionManager struct {
    mu       sync.RWMutex
    sessions map[SessionID]*Session
}

func (m *SessionManager) Get(id SessionID) (*Session, bool) {
    m.mu.RLock()
    defer m.mu.RUnlock()
    s, ok := m.sessions[id]
    return s, ok
}

func (m *SessionManager) Start(ctx context.Context, id SessionID) *Session {
    m.mu.Lock()
    defer m.mu.Unlock()

    s, ok := m.sessions[id]
    if ok { return s }

    s = newSession(id)
    m.sessions[id] = s
    go runSession(ctx, s) // supervisor goroutine
    return s
}

Note the mixing: the manager uses a mutex-protected map (shared state with a clear owner), individual sessions use the actor pattern (isolated state, message-passing). Two primitives, picked per-job.

This pattern scales to millions of sessions because goroutines are cheap. I’ve seen this exact pattern serve 400K concurrent sessions on a single pod.

The Streaming Ingest — Bounded Worker Pool (CSP)

Kafka consumer feeding a worker pool. Canonical CSP territory:

func runConsumer(ctx context.Context, cons *kafka.Consumer) error {
    jobs := make(chan Event, 256)
    var wg sync.WaitGroup

    // Fixed worker pool
    for i := 0; i < workerCount; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for {
                select {
                case job, ok := <-jobs:
                    if !ok { return }
                    if err := process(ctx, job); err != nil {
                        log.Printf("process: %v", err)
                    }
                case <-ctx.Done():
                    return
                }
            }
        }()
    }

    // Producer
    go func() {
        defer close(jobs)
        for {
            msg, err := cons.ReadMessage(ctx)
            if err != nil { return }
            select {
            case jobs <- msg:
            case <-ctx.Done():
                return
            }
        }
    }()

    <-ctx.Done()
    wg.Wait()
    return ctx.Err()
}

The bounded channel is the concurrency clamp. Kafka can push as fast as it wants; the worker pool consumes at its own pace; backpressure propagates back to Kafka’s consumer offset naturally.

Why not actors here? Because the work items are stateless — you’re processing events, not maintaining per-entity state. The overhead of an actor (mailbox, dispatch, ownership) is unjustified. CSP is the right fit.

Why not mutex + a worker loop? You could, but the channel primitive is exactly the right shape — bounded capacity + safe cross-goroutine handoff + graceful shutdown — without needing to build those three features yourself.
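The "bounded capacity" property isn't abstract: a channel of capacity N simply refuses the N+1th in-flight item, and that refusal is what propagates backpressure. A toy demonstration, using a non-blocking send to make visible where a real producer would block:

```go
package main

import "fmt"

func main() {
	jobs := make(chan int, 2) // capacity 2: the clamp

	// No worker is draining jobs, so the third send finds the buffer full.
	// A real producer would block here; the select/default makes the
	// full-buffer condition observable instead.
	for i := 1; i <= 4; i++ {
		select {
		case jobs <- i:
			fmt.Printf("accepted %d\n", i)
		default:
			fmt.Printf("refused %d: buffer full, producer must wait\n", i)
		}
	}
}
```

Swap the `default` branch for a plain blocking send and you have the exact behavior the ingest path relies on.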

The Batch Aggregation — Pipelines + errgroup

Nightly rollup: read from storage, compute per-account aggregates, write back.

func runRollup(ctx context.Context, input <-chan Event) error {
    g, gctx := errgroup.WithContext(ctx)

    // Stage 1: parse
    parsed := make(chan ParsedEvent, 64)
    g.Go(func() error {
        defer close(parsed)
        return parseStage(gctx, input, parsed)
    })

    // Stage 2: aggregate (keyed by account)
    agged := make(chan Aggregate, 64)
    g.Go(func() error {
        defer close(agged)
        return aggregateStage(gctx, parsed, agged)
    })

    // Stage 3: persist
    g.Go(func() error {
        return persistStage(gctx, agged)
    })

    return g.Wait()
}

Three stages in a pipeline. Each stage is a goroutine, connected by bounded channels. errgroup ties them together: first error cancels the whole pipeline.

The aggregation stage internally keeps a map guarded by a mutex. With a single goroutine touching the map there is no contention to pay for, and the lock keeps the code safe if a future change introduces more readers.

This is textbook CSP: the topology of channels is the architecture. Read the code and the shape of the computation is obvious.

The Cross-Service Handoff — Durable Queues

IronSys talks to two other services: a billing service (async, eventually consistent) and an auth service (sync, immediate).

For billing: a dedicated NATS JetStream subject with at-least-once delivery. Usage events go in one end; the billing service reads them. The emission codepath has a local write-ahead log so that if NATS is briefly down, events buffer on disk and replay when the connection recovers.

For auth: gRPC with tight timeouts. Caller owns completion. If auth is slow, the API handler’s deadline fires and the request fails fast.

Two different ownership models for two different shapes of work. See: RPC vs NATS: Who Owns Completion.

How the Primitives Map

Summarizing which primitive serves which job in IronSys:

| Work shape | Primitive | Why |
| --- | --- | --- |
| HTTP request handling | Stock net/http + goroutine per request | Language default, right for stateless |
| Hot shared state (rate limiter, cache) | Mutex / atomic | Simplest primitive that works |
| Stateful user sessions | Goroutine-per-entity (actor-like) | Isolated state, message-passing, serial processing |
| Session directory | RWMutex-protected map | Shared lookup, read-heavy |
| Streaming event processing | Bounded channel + worker pool (CSP) | Backpressure, parallelism, graceful shutdown |
| Multi-stage data pipeline | CSP pipeline + errgroup | Stage topology = architecture; first-error cancels all |
| Async cross-service handoff | Durable queue (NATS JetStream / Kafka) | Receiver owns completion, at-least-once delivery |
| Sync cross-service call | gRPC with ctx timeout | Caller owns completion, fast failure |

Notice: all four concurrency pillars show up. Mutexes in the rate limiter. CSP in the event pipeline. Actors (in pattern) in the session runtime. (STM is missing; it would show up if I were doing this in Clojure or Haskell.)

What This Architecture Gets Wrong

Every architecture has weaknesses. IronSys’s are real:

  • The actor pattern isn’t real actors. Without Erlang-style supervision, if a session goroutine panics, Go’s default behavior is to kill the process. Adding panic recovery per-session is easy but not free. In practice, most teams hit this 6 months in, add a recovery wrapper, and move on.
  • Bounded channels can mask slow downstream. If a channel fills up and the producer blocks, that’s backpressure — great. But if the channel is buffered too large, you can buffer a lot of work into memory before realizing downstream is slow. Tune buffer sizes with measurements, not guesses.
  • Goroutine-per-entity has a per-session baseline cost. Cheap but not free. A million sessions is ~2.5GB of goroutine stacks. For services where most entities are inactive, a lazy pattern (spin up on activity, suspend to disk on idle) is better.
  • Mixing paradigms cognitively. New engineers have to learn four patterns instead of one. The productivity hit is real for the first two weeks; the payoff is in the next two years.
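The recovery wrapper the first bullet mentions is small enough to show. A sketch, assuming sessions are launched through a supervising function — `runSessionSupervised` is an illustrative name, not from the post's codebase:

```go
package main

import "fmt"

// runSessionSupervised contains a panic to the one session that raised
// it, instead of letting it take down the whole process.
func runSessionSupervised(id string, body func()) {
	defer func() {
		if r := recover(); r != nil {
			fmt.Printf("session %s crashed: %v\n", id, r)
			// Real code would also mark the session dead and tell the
			// manager to evict it from the directory.
		}
	}()
	body()
}

func main() {
	runSessionSupervised("s-1", func() { panic("invariant violated") })
	fmt.Println("process still alive") // the panic was contained
}
```

It is not Erlang supervision — there's no restart strategy, no linked failure propagation — but it removes the "one bad session kills the pod" failure mode.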

What This Blueprint Is Really Selling

A system with four work shapes should have four concurrency patterns, not one stretched to cover everything. The four pillars aren’t theoretical; they map to real design decisions, and production Go services that use them deliberately are easier to reason about than those that don’t.

What IronSys is really selling is intentional heterogeneity. Every primitive is there for a reason. Every reason is traceable to a specific failure mode you want to prevent. The architecture should be legible — a new engineer reading the code should understand why a channel is there instead of a mutex, why a session has its own goroutine instead of being a struct in a shared map, why billing goes through a durable queue instead of a gRPC call.

If you can’t answer “why this primitive here,” the code isn’t finished. It’s just working, for now.

Blueprints are useful precisely because they’re generic. The specifics of your system will be different. But the decision framework — what’s the work shape, what’s the failure mode, what’s the right primitive — is the same every time.

