Go Context in Distributed Systems: What Actually Works in Production

context.Context is not documentation, not a metadata bag, and not optional. A production-hardened guide to cancellation propagation, the background-goroutine trap, and the patterns that keep services alive when downstreams slow down.

February 13, 2026
Harrison Guo
9 min read
System Design Backend Engineering

The bug was alive for three weeks. On a normal day it cost nothing. On the day it activated, it nearly took the service down.

The pattern was simple. An HTTP handler had to fetch data from three downstream gRPC services and merge the results. The team had done the disciplined thing: set a 5-second deadline on the request context, propagate it all the way through to the handler, use errgroup for parallelism. Except — and you’ve probably seen this one — the fan-out looked like this:

func handleRequest(w http.ResponseWriter, r *http.Request) {
    ctx := r.Context() // has a 5-second deadline — and is never used below
    req := decodeRequest(r) // hypothetical helper: build the downstream request

    var a, b, c Result
    go func() { a, _ = callA(context.Background(), req) }() // ← here
    go func() { b, _ = callB(context.Background(), req) }() // ← here
    go func() { c, _ = callC(context.Background(), req) }() // ← here

    // ... some sync wait ...
    respond(w, merge(a, b, c))
}

Every day for three weeks, the downstreams responded in 20 ms and everything worked. Then one of them — the slow path — got a planned capacity change that degraded it from 20 ms to 20 seconds. Not a crash. Just slow. And the HTTP handler’s 5-second deadline did exactly what it promised: returned a timeout to the client.

But the three goroutines kept running. They didn’t get the memo.

Within ninety seconds, goroutines climbed from 2,000 to 80,000, connection pools drained, the GC started to choke on the churn, and the entire service had to be restarted twice before someone figured out that context.Background() inside a handler-scoped goroutine isn’t a stylistic choice — it’s a goroutine leak with extra steps.

tl;dr: context.Context is not documentation. It is the runtime boundary between “this work still matters” and “this work should stop.” Every time you launch a goroutine from inside a request-scoped context and fail to propagate the parent ctx, you are creating work that outlives its reason to exist. Under load, that’s what brings a service down — not CPU, not memory, not the downstream. Goroutines that won’t die.


What Context Actually Is

The single biggest mistake I see engineers make is treating context.Context like an argument convention — “the standard library says I should pass one, so I pass one.” That’s the wrong mental model.

context.Context is four things, in order of importance:

  1. A cancellation signal. When the context is done (cancelled, deadline exceeded), every goroutine holding it is being asked to stop.
  2. A deadline. How much wall-clock budget this work has before it’s considered failed.
  3. An error cause. Why the context ended (context.Canceled, context.DeadlineExceeded, or a custom reason via context.Cause).
  4. A narrow channel for request-scoped metadata. Trace ID, deadline, auth principal. That’s about it.

Notice what’s not on the list: data transport, DI container, settings object, session store, cache. If you’re using context to pass any of those, you’ve already lost.

Context is control flow, not data.

flowchart TD
    H["HTTP handler<br/>ctx · 5s deadline"] --> G1
    H --> G2
    H --> G3
    G1["goroutine A<br/>callA(gctx · req)"] --> D1[(Downstream A)]
    G2["goroutine B<br/>callB(gctx · req)"] --> D2[(Downstream B)]
    G3["goroutine C<br/>callC(gctx · req)"] --> D3[(Downstream C)]
    Cancel{{"ctx.Done() fires<br/>timeout, client gone,<br/>or sibling errored"}}
    Cancel -.->|broadcast| G1
    Cancel -.->|broadcast| G2
    Cancel -.->|broadcast| G3
    H -.-> Cancel
    classDef handler fill:#e8f4f8,stroke:#2c5282
    classDef worker fill:#f0fff4,stroke:#2f855a
    classDef cancel fill:#fed7d7,stroke:#c53030,stroke-dasharray:5 5
    class H handler
    class G1,G2,G3 worker
    class Cancel cancel

When the parent ctx is cancelled, the signal propagates to every goroutine that inherited it. Every spawned call drops the work it was doing and returns. That’s the whole value of context — and the reason context.Background() inside a spawned goroutine breaks everything: it severs this tree.

Every correct use of context follows from this. The moment you treat it as something else — a way to pass a config value, a way to smuggle a feature flag, a way to avoid changing a function signature — you start breaking the cancellation semantics that make it useful at all.

The Five Patterns That Work

After enough production debugging, a small set of patterns covers 95% of cases.

1. Always propagate, never replace

The outer context defines the lifetime of the work. Any goroutine spawned to do part of that work must inherit it.

// ✗ Wrong: spawned work is unkillable
go func() { doWork(context.Background()) }()

// ✓ Right: spawned work dies with the parent
go func() { doWork(ctx) }()

If your linter isn’t flagging context.Background() or context.TODO() inside functions that already have a ctx in scope, fix your linter. contextcheck in golangci-lint catches most of these.

2. Fan out with errgroup.WithContext

Raw goroutines + sync.WaitGroup is the wrong primitive for fan-out calls to downstreams. Use golang.org/x/sync/errgroup:

func fanOut(ctx context.Context, req Request) (A, B, C, error) {
    g, gctx := errgroup.WithContext(ctx)

    var (
        a A
        b B
        c C
    )

    g.Go(func() error {
        var err error
        a, err = callA(gctx, req)
        return err
    })
    g.Go(func() error {
        var err error
        b, err = callB(gctx, req)
        return err
    })
    g.Go(func() error {
        var err error
        c, err = callC(gctx, req)
        return err
    })

    if err := g.Wait(); err != nil {
        return A{}, B{}, C{}, err
    }
    return a, b, c, nil
}

Two properties that matter:

  • gctx inherits the parent’s deadline and cancellation. The spawned calls die when the caller gives up.
  • The first error cancels the sibling calls. If callA fails fast, the in-flight callB and callC stop wasting work.

Both are invisible in the code. That’s the point. You get the right behavior without having to think about it per-callsite.

3. Cap the subtree with WithTimeout

The parent gives you the outer boundary. Sometimes you want a tighter one for a specific piece of work:

func callSlowly(ctx context.Context, req Request) (Result, error) {
    ctx, cancel := context.WithTimeout(ctx, 800*time.Millisecond)
    defer cancel() // ← don't leak the timer

    return client.Call(ctx, req)
}

Three things people get wrong here:

  • Forgetting defer cancel() leaks the context and its timer until the deadline fires. Each one is small, but under sustained load the resources pile up.
  • Using WithTimeout where WithDeadline makes more sense — if your budget is “finish by a fixed wall-clock time,” use WithDeadline. A relative timeout and an absolute deadline aren’t the same thing.
  • Stacking timeouts that exceed the parent. A WithTimeout(ctx, 30*time.Second) on a context that already has a 5-second deadline has a 5-second effective timeout. A child context can only tighten the parent’s deadline, never extend it — if you’re setting 30 seconds there, check your assumptions about what the parent already allows.

4. Make cancellation observable

In a handler loop or polling loop, cancellation must be checked at every iteration:

for {
    select {
    case <-ctx.Done():
        return ctx.Err()
    case work := <-queue:
        if err := process(ctx, work); err != nil {
            return err
        }
    }
}

I’ve debugged a service that looked like it was “stuck” but was actually processing a queue in a tight loop that never checked ctx.Done(). The cancellation had fired long ago; the code just didn’t care.

5. Return ctx.Err() at the right boundary

When a context ends, the standard library returns context.Canceled or context.DeadlineExceeded. Your code needs to either:

  • Pass it up, because the caller asked for cancellation and you’re honoring it, or
  • Translate it, because your API surface speaks a different error vocabulary (gRPC codes, HTTP status codes, domain errors).

result, err := downstream.Call(ctx, req)
if err != nil {
    // Was this our fault, or theirs?
    if errors.Is(err, context.DeadlineExceeded) {
        return Result{}, status.Error(codes.DeadlineExceeded, "upstream deadline")
    }
    if errors.Is(err, context.Canceled) {
        return Result{}, status.Error(codes.Canceled, "caller cancelled")
    }
    return Result{}, err
}

If you don’t do this, the errors that reach your caller will be a mix of “the downstream is broken” and “you asked me to stop, remember?”, and your on-call will waste hours separating the two.

The Anti-Patterns

There are a handful of things that look fine and aren’t. These are the ones I see most.

context.Background() inside a spawned goroutine

The bug that opens this post. You already have a context in scope. Use it. Spawning with context.Background() breaks the cancellation chain and creates work that outlives the caller. It’s the single most common goroutine leak I’ve seen in production Go.

Passing the context by field instead of by argument

// ✗ Wrong
type Worker struct {
    ctx context.Context
}
func (w *Worker) Do() error { return callA(w.ctx) } // stale ctx

// ✓ Right
type Worker struct{}
func (w *Worker) Do(ctx context.Context) error { return callA(ctx) }

Context is per-call, not per-object. The moment you stash it in a struct, you’ve made it stale — the context from construction time is not the context from the current call. golangci-lint with the contextcheck linter enabled catches most of these. If your CI doesn’t run it, add it today.

Storing business data in context

// ✗ Wrong
ctx = context.WithValue(ctx, "currentUser", user)

// ✓ Right
ProcessOrder(ctx, user, order)

The rule is: if the function needs it to work, it goes in the signature. If it’s optional metadata that cross-cuts every call (trace ID, request ID, auth principal for logging), context is fine — but keep the key typed (not a raw string) and keep the set small.

Blanket rethrow without translating

Returning ctx.Err() from a library function when the caller doesn’t know about context produces baffling errors two layers up. If you’re writing something reusable, translate context errors to your own error type at the boundary.

A Small Debugging Tool

When you suspect a context-propagation problem, the fastest way to find it is usually a goroutine dump under load. Something like this keeps one around:

// /debug/goroutines — read-only, auth-gated in prod
// (pprof here is the standard runtime/pprof package)
mux.HandleFunc("/debug/goroutines", func(w http.ResponseWriter, r *http.Request) {
    p := pprof.Lookup("goroutine")
    p.WriteTo(w, 1) // debug=1: text format, identical stacks aggregated with counts
})

Ship it behind auth, point a cron or load test at the thing you’re trying to exercise, and diff two snapshots 10 seconds apart. Goroutines that persist across snapshots and aren’t in netpoll or runtime.park_m are your suspects. Nine times out of ten, when I follow the stack traces, the leaked goroutines were spawned from a handler that’s already returned — because someone wrote context.Background() inside a go func().

Where This Leaves You

The moment you treat context.Context as decoration — as a parameter you pass because the lint rule told you to — you’ve already lost the benefit. The entire reason context exists is to be the one shared signal that ties the lifetime of spawned work to the lifetime of its cause. Ignore that and you get goroutine leaks. Honor it and you get a service that drains cleanly under partial failure.

In a monolith, you can get away with sloppy cancellation because the damage stays local. In a distributed system, where one slow downstream can cascade through three layers of fan-out into a goroutine explosion, you cannot. The cost of sloppy context handling scales with the number of network hops, and modern architectures have many.

The fix is boring. Use errgroup.WithContext for fan-out. Never context.Background() inside a handler-scoped goroutine. Translate context errors at API boundaries. Check <-ctx.Done() in loops. Add a /debug/goroutines endpoint and actually look at it.

There are no clever moves here. There’s only the habit of passing context correctly, every time, for years — and the services that outlast the ones that didn’t.

