IronSys: A Production Blueprint for Modern Concurrency

After Four Pillars of Concurrency, the natural question: what does a system actually look like when it uses all of them deliberately? IronSys is a composite blueprint — the concurrency architecture I'd build today if I were starting over, with the trade-offs each choice buys.

October 22, 2025
Harrison Guo
10 min read
System Design · Backend Engineering

In the last post I walked through the four concurrency pillars — shared memory + locks, CSP, actors, STM — and argued that real systems mix them on purpose. Someone reasonably asked: okay, but what does that actually look like? Fair question. Abstract taxonomy is less useful than a worked example.

IronSys is that worked example. It’s a composite blueprint — not a real service, but representative of a class of services I’ve designed, helped design, or debugged in production. Let’s say it’s a mid-sized backend system: public API, stateful user sessions, streaming data in, aggregation and reporting out. The kind of thing that appears in the middle of any serious platform.

The interesting part isn’t the features. It’s which concurrency primitive shows up where, and why.

tl;dr — IronSys is a composite production blueprint: a multi-service Go backend with stateful user sessions, streaming ingest, and usage aggregation. It uses CSP channels for pipelines and coordination, a goroutine-per-entity actor pattern for stateful sessions, mutexes and atomics for hot shared counters, and durable queues for cross-service handoff. Each primitive is picked for a specific failure mode. The pattern is not “mix for variety”; it’s “match the primitive to the work.”


The System Shape

Before deciding on concurrency primitives, sketch the work shapes. IronSys has four:

  1. Public API — request/response, modest concurrency, latency-sensitive. The classic HTTP backend.
  2. Live sessions — stateful, long-lived per-user entities. Think multiplayer game server, collaborative editor, real-time dashboard.
  3. Streaming ingest — high-throughput events arriving over Kafka/NATS, fanned out to workers for processing.
  4. Batch aggregation — periodic rollup jobs that read from storage, compute, write back.

Four shapes, four concurrency patterns. The wrong design would apply the same primitive to all four. The right design picks each separately.

flowchart LR
    subgraph Shapes["Work shapes"]
        S1["1. Public API
stateless · request/response"]
        S2["2. Live sessions
stateful · long-lived"]
        S3["3. Streaming ingest
high throughput · stateless"]
        S4["4. Batch aggregation
pipeline · scheduled"]
    end
    subgraph Primitives["Concurrency primitives"]
        P1["Goroutine + mutex
per-request handler"]
        P2["Goroutine-per-entity
actor-like · private state"]
        P3["Bounded channel + worker pool
CSP · backpressure"]
        P4["CSP pipeline + errgroup
staged · cancellable"]
    end
    S1 --> P1
    S2 --> P2
    S3 --> P3
    S4 --> P4
    classDef shape fill:#e8f4f8,stroke:#2c5282
    classDef prim fill:#f0fff4,stroke:#2f855a
    class Shapes shape
    class Primitives prim

The API Handlers

Nothing fancy. Stock Go HTTP server. Each request is its own goroutine (Go’s runtime does this automatically). Shared state — rate limiters, cache, config — is protected by mutexes or atomics:

type RateLimiter struct {
    mu      sync.Mutex
    buckets map[string]*bucket
}

func (r *RateLimiter) Allow(key string) bool {
    r.mu.Lock()
    defer r.mu.Unlock()
    b, ok := r.buckets[key]
    if !ok {
        b = newBucket()
        r.buckets[key] = b
    }
    return b.allow()
}

Obvious choice. The contention is bounded by request rate, the state is small, a mutex is the simplest possible tool. Over-engineering here — sharded maps, lock-free data structures — buys nothing.
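The atomic half of "mutexes or atomics" is even simpler. For hot counters that only ever increment — request totals, in-flight gauges — there's no critical section to protect, just a number to bump. A minimal sketch (the `requestCount` name is illustrative, not from any IronSys code):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// requestCount is a hot shared counter: many goroutines bump it
// concurrently, and nothing ever needs to lock around it.
var requestCount atomic.Int64

func handleOne() {
	requestCount.Add(1)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				handleOne()
			}
		}()
	}
	wg.Wait()
	fmt.Println(requestCount.Load()) // 100000: no lost updates, no lock
}
```

The rule of thumb: a mutex when there's an invariant spanning multiple fields, an atomic when there's a single word of state.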

What IronSys does here that many teams miss: every handler is context-aware from request entry:

func (s *Server) HandleFoo(w http.ResponseWriter, r *http.Request) {
    ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
    defer cancel()

    result, err := s.service.Foo(ctx, parseReq(r))
    writeResponse(w, result, err)
}

Context flows everywhere downstream. The handler layer is boring; that’s the point.

The Live Sessions — Actor Pattern in Go

Each active user session is a long-lived goroutine with an inbox channel. I call this the goroutine-per-entity pattern — it’s Erlang actors without the runtime, built from Go primitives.

type Session struct {
    id       SessionID
    mailbox  chan SessionCmd  // the "actor" inbox
    shutdown chan struct{}
    state    sessionState      // private to this goroutine
}

type SessionCmd struct {
    op     string
    args   interface{}
    reply  chan<- SessionReply // optional reply channel
}

func runSession(ctx context.Context, s *Session) {
    // Deliberately no close(s.mailbox) here: a channel is closed by its
    // senders, and closing from the receiver would panic in-flight sends.
    for {
        select {
        case cmd := <-s.mailbox:
            s.handle(cmd)
        case <-s.shutdown:
            s.flush() // persist final state
            return
        case <-ctx.Done():
            return
        }
    }
}

Why this pattern, not “session is a struct with a mutex”?

  • State is private to one goroutine. No sharing, no locks, no lock-ordering bugs. The session state is accessed by exactly one execution context.
  • Serial message processing. Commands process one at a time, in FIFO order. Business invariants hold naturally.
  • Natural location for cross-session coordination. Each session is a message destination. Broadcasting to all sessions, or routing a command to a specific session, is just “send on its inbox.”
  • Clean lifecycle. The goroutine runs until shutdown or ctx.Done. State is flushed once, on exit. No race between “is this session still alive” and “did we finish writing its state.”

The manager that creates and routes to sessions looks like:

type SessionManager struct {
    mu       sync.RWMutex
    sessions map[SessionID]*Session
}

func (m *SessionManager) Get(id SessionID) (*Session, bool) {
    m.mu.RLock()
    defer m.mu.RUnlock()
    s, ok := m.sessions[id]
    return s, ok
}

func (m *SessionManager) Start(ctx context.Context, id SessionID) *Session {
    m.mu.Lock()
    defer m.mu.Unlock()

    s, ok := m.sessions[id]
    if ok { return s }

    s = newSession(id)
    m.sessions[id] = s
    go runSession(ctx, s) // supervisor goroutine
    return s
}

Note the mixing: the manager uses a mutex-protected map (shared state with a clear owner), individual sessions use the actor pattern (isolated state, message-passing). Two primitives, picked per-job.

This pattern scales to millions of sessions because goroutines are cheap. I’ve seen this exact pattern serve 400K concurrent sessions on a single pod.

The Streaming Ingest — Bounded Worker Pool (CSP)

Kafka consumer feeding a worker pool. Canonical CSP territory:

func runConsumer(ctx context.Context, cons *kafka.Consumer) error {
    jobs := make(chan Event, 256)
    var wg sync.WaitGroup

    // Fixed worker pool
    for i := 0; i < workerCount; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for {
                select {
                case job, ok := <-jobs:
                    if !ok { return }
                    if err := process(ctx, job); err != nil {
                        log.Printf("process: %v", err)
                    }
                case <-ctx.Done():
                    return
                }
            }
        }()
    }

    // Producer
    go func() {
        defer close(jobs)
        for {
            msg, err := cons.ReadMessage(ctx)
            if err != nil { return }
            select {
            case jobs <- msg:
            case <-ctx.Done():
                return
            }
        }
    }()

    <-ctx.Done()
    wg.Wait()
    return ctx.Err()
}

The bounded channel is the concurrency clamp. Kafka can push as fast as it wants; the worker pool consumes at its own pace; backpressure propagates back to Kafka’s consumer offset naturally.

Why not actors here? Because the work items are stateless — you’re processing events, not maintaining per-entity state. The overhead of an actor (mailbox, dispatch, ownership) is unjustified. CSP is the right fit.

Why not mutex + a worker loop? You could, but the channel primitive is exactly the right shape — bounded capacity + safe cross-goroutine handoff + graceful shutdown — without needing to build those three features yourself.
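The "bounded capacity" property isn't abstract: a channel of capacity N simply refuses the N+1th in-flight item, and that refusal is what propagates backpressure. A toy demonstration, using a non-blocking send to make visible where a real producer would block:

```go
package main

import "fmt"

func main() {
	jobs := make(chan int, 2) // capacity 2: the clamp

	// No worker is draining jobs, so the third send finds the buffer full.
	// A real producer would block here; the select/default makes the
	// full-buffer condition observable instead.
	for i := 1; i <= 4; i++ {
		select {
		case jobs <- i:
			fmt.Printf("accepted %d\n", i)
		default:
			fmt.Printf("refused %d: buffer full, producer must wait\n", i)
		}
	}
}
```

Swap the `default` branch for a plain blocking send and you have the exact behavior the ingest path relies on.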

The Batch Aggregation — Pipelines + errgroup

Nightly rollup: read from storage, compute per-account aggregates, write back.

func runRollup(ctx context.Context, input <-chan Event) error {
    g, gctx := errgroup.WithContext(ctx)

    // Stage 1: parse
    parsed := make(chan ParsedEvent, 64)
    g.Go(func() error {
        defer close(parsed)
        return parseStage(gctx, input, parsed)
    })

    // Stage 2: aggregate (keyed by account)
    agged := make(chan Aggregate, 64)
    g.Go(func() error {
        defer close(agged)
        return aggregateStage(gctx, parsed, agged)
    })

    // Stage 3: persist
    g.Go(func() error {
        return persistStage(gctx, agged)
    })

    return g.Wait()
}

Three stages in a pipeline. Each stage is a goroutine, connected by bounded channels. errgroup ties them together: first error cancels the whole pipeline.

The aggregation stage internally keeps a map guarded by a mutex. With a single goroutine touching the map there is no contention to pay for, and the lock keeps the code safe if a future change introduces more readers.

This is textbook CSP: the topology of channels is the architecture. Read the code and the shape of the computation is obvious.

The Cross-Service Handoff — Durable Queues

IronSys talks to two other services: a billing service (async, eventually consistent) and an auth service (sync, immediate).

For billing: a dedicated NATS JetStream subject with at-least-once delivery. Usage events go in one end; the billing service reads them. The emission codepath has a local write-ahead log so that if NATS is briefly down, events buffer on disk and replay when the connection recovers.

For auth: gRPC with tight timeouts. Caller owns completion. If auth is slow, the API handler’s deadline fires and the request fails fast.

Two different ownership models for two different shapes of work. See: RPC vs NATS: Who Owns Completion.

How the Primitives Map

Summarizing which primitive serves which job in IronSys:

| Work shape | Primitive | Why |
| --- | --- | --- |
| HTTP request handling | Stock net/http + goroutine per request | Language default, right for stateless |
| Hot shared state (rate limiter, cache) | Mutex / atomic | Simplest primitive that works |
| Stateful user sessions | Goroutine-per-entity (actor-like) | Isolated state, message-passing, serial processing |
| Session directory | RWMutex-protected map | Shared lookup, read-heavy |
| Streaming event processing | Bounded channel + worker pool (CSP) | Backpressure, parallelism, graceful shutdown |
| Multi-stage data pipeline | CSP pipeline + errgroup | Stage topology = architecture; first-error cancels all |
| Async cross-service handoff | Durable queue (NATS JetStream / Kafka) | Receiver owns completion, at-least-once delivery |
| Sync cross-service call | gRPC with ctx timeout | Caller owns completion, fast failure |

Notice: all four concurrency pillars show up. Mutexes in the rate limiter. CSP in the event pipeline. Actors (in pattern) in the session runtime. (STM is missing; it would show up if I were doing this in Clojure or Haskell.)

What This Architecture Gets Wrong

Every architecture has weaknesses. IronSys’s are real:

  • The actor pattern isn’t real actors. Without Erlang-style supervision, if a session goroutine panics, Go’s default behavior is to kill the process. Adding panic recovery per-session is easy but not free. In practice, most teams hit this 6 months in, add a recovery wrapper, and move on.
  • Bounded channels can mask slow downstream. If a channel fills up and the producer blocks, that’s backpressure — great. But if the channel is buffered too large, you can buffer a lot of work into memory before realizing downstream is slow. Tune buffer sizes with measurements, not guesses.
  • Goroutine-per-entity has a per-session baseline cost. Cheap but not free. A million sessions is ~2.5GB of goroutine stacks. For services where most entities are inactive, a lazy pattern (spin up on activity, suspend to disk on idle) is better.
  • Mixing paradigms cognitively. New engineers have to learn four patterns instead of one. The productivity hit is real for the first two weeks; the payoff is in the next two years.
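The recovery wrapper the first bullet mentions is small enough to show. A sketch, assuming sessions are launched through a supervising function — `runSessionSupervised` is an illustrative name, not from the post's codebase:

```go
package main

import "fmt"

// runSessionSupervised contains a panic to the one session that raised
// it, instead of letting it take down the whole process.
func runSessionSupervised(id string, body func()) {
	defer func() {
		if r := recover(); r != nil {
			fmt.Printf("session %s crashed: %v\n", id, r)
			// Real code would also mark the session dead and tell the
			// manager to evict it from the directory.
		}
	}()
	body()
}

func main() {
	runSessionSupervised("s-1", func() { panic("invariant violated") })
	fmt.Println("process still alive") // the panic was contained
}
```

It is not Erlang supervision — there's no restart strategy, no linked failure propagation — but it removes the "one bad session kills the pod" failure mode.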

What This Blueprint Is Really Selling

A system with four work shapes should have four concurrency patterns, not one stretched to cover everything. The four pillars aren’t theoretical; they map to real design decisions, and production Go services that use them deliberately are easier to reason about than those that don’t.

What IronSys is really selling is intentional heterogeneity. Every primitive is there for a reason. Every reason is traceable to a specific failure mode you want to prevent. The architecture should be legible — a new engineer reading the code should understand why a channel is there instead of a mutex, why a session has its own goroutine instead of being a struct in a shared map, why billing goes through a durable queue instead of a gRPC call.

If you can’t answer “why this primitive here,” the code isn’t finished. It’s just working, for now.

Blueprints are useful precisely because they’re generic. The specifics of your system will be different. But the decision framework — what’s the work shape, what’s the failure mode, what’s the right primitive — is the same every time.

