gRPC Interceptors in Production: Design Patterns That Survive Real Load

gRPC interceptors are where cross-cutting concerns live — auth, tracing, retry, metrics, rate limiting. Most examples online show toy single-interceptor demos. Production systems need to stack, order, and compose them correctly. A practical guide.

March 24, 2026
Harrison Guo
9 min read
System Design Backend Engineering

gRPC interceptors are the middleware pattern, specialized for gRPC. If you’ve written HTTP middleware before, the shape is familiar: a function that wraps a call, observes or modifies the request, passes control to the next handler, then observes or modifies the response. The difference: gRPC’s type system makes the flavors (unary, server-stream, client-stream, bidi) explicit, and chain ordering matters more than most people realize.

Most online examples show a single toy interceptor. Production systems stack five to ten of them per service. Getting the composition right — ordering, concern separation, testability — is half of running a gRPC-based microservice well.

tl;dr — gRPC interceptors are middleware with more explicit types. Chain them outside-in: observability wraps everything, then throttling, then auth, then retry, then the actual service. Keep each interceptor focused on one concern; the moment an interceptor does two things you’re writing coupled middleware. Stream interceptors are trickier than unary — don’t copy-paste unary logic into stream without thinking. Test the chain composition with bufconn, not just each interceptor in isolation.


The Four Interceptor Types

gRPC has four interceptor signatures, two for client, two for server:

  • Unary server interceptor: wraps a single request → single response call.
  • Stream server interceptor: wraps streaming RPCs (server-stream, client-stream, bidi).
  • Unary client interceptor: wraps the client side of a unary call.
  • Stream client interceptor: wraps the client side of a streaming call.

Unary interceptors are easy. Stream interceptors are harder because you’re wrapping a bidirectional wire, not a single call.

Example unary server interceptor:

func loggingInterceptor(ctx context.Context, req interface{},
    info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
    start := time.Now()
    resp, err := handler(ctx, req)
    log.Printf("method=%s duration=%s err=%v", info.FullMethod, time.Since(start), err)
    return resp, err
}

Register it:

s := grpc.NewServer(grpc.UnaryInterceptor(loggingInterceptor))

Straightforward. Now stack five of them.

Chaining and Order

Real services need multiple interceptors. grpc-go gives you grpc.ChainUnaryInterceptor(...) and grpc.ChainStreamInterceptor(...) (since v1.28); on older versions, the chaining helpers in github.com/grpc-ecosystem/go-grpc-middleware do the same job:

s := grpc.NewServer(
    grpc.ChainUnaryInterceptor(
        observabilityInterceptor,  // outermost
        rateLimitInterceptor,
        authInterceptor,
        validationInterceptor,
        businessLogicContextInterceptor, // innermost
    ),
)

Chain order matters enormously. Interceptors execute outside-in on the way to the handler, inside-out on the way back. Nest two interceptors in the wrong order and you get bugs that are hard to debug.
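
To make the outside-in/inside-out order concrete, here is a minimal sketch using stand-in types with the same shape as grpc.UnaryHandler and grpc.UnaryServerInterceptor (the gRPC types are stripped so it runs without a server; `chain` follows grpc.ChainUnaryInterceptor’s convention that the first argument is outermost):

```go
package main

import "fmt"

// handler and interceptor mirror the shape of grpc.UnaryHandler and
// grpc.UnaryServerInterceptor, minus the gRPC types.
type handler func(req string) (string, error)
type interceptor func(req string, next handler) (string, error)

// chain composes interceptors so the first one listed is outermost,
// matching grpc.ChainUnaryInterceptor's convention.
func chain(h handler, ics ...interceptor) handler {
	for i := len(ics) - 1; i >= 0; i-- {
		ic, next := ics[i], h
		h = func(req string) (string, error) { return ic(req, next) }
	}
	return h
}

func named(name string) interceptor {
	return func(req string, next handler) (string, error) {
		fmt.Println(name, "-> before handler")
		resp, err := next(req)
		fmt.Println(name, "<- after handler")
		return resp, err
	}
}

func main() {
	h := chain(
		func(req string) (string, error) { return "ok", nil },
		named("observability"), // outermost: first in, last out
		named("auth"),          // innermost: last in, first out
	)
	h("ping")
	// prints:
	// observability -> before handler
	// auth -> before handler
	// auth <- after handler
	// observability <- after handler
}
```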

Canonical order I use:

flowchart LR
    Client([gRPC client]) --> I1
    I1["Observability
tracing · metrics · logging"] --> I2
    I2["Rate limiting / quota"] --> I3
    I3["Auth
authn · authz"] --> I4
    I4["Validation"] --> I5
    I5["Retry / idempotency"] --> I6
    I6["Context enrichment"] --> Handler{{"Business handler"}}
    classDef outer fill:#fef5e7,stroke:#b7791f
    classDef mid fill:#e8f4f8,stroke:#2c5282
    classDef inner fill:#f0fff4,stroke:#2f855a
    class I1 outer
    class I2,I3,I4 mid
    class I5,I6 inner

Observability must wrap everything so it sees every rejection, every rate-limit hit, every failed auth; put it anywhere else and you have operational blind spots. Details:

  1. Observability (tracing + metrics + logging) — outermost. You want to see every request, including the ones that get rejected by later interceptors. If observability is inside auth, unauth’d attempts are invisible — a security-relevant blind spot.

  2. Rate limiting / quota — before auth. Why? Because auth involves token verification (DB lookup, JWT parsing, external identity service), and you don’t want unauthenticated requests to cost you CPU. Rate-limit first, authenticate second.

  3. Auth (authentication + authorization) — before business logic. Reject unauthenticated/unauthorized requests early.

  4. Validation (request shape, basic sanity) — before business logic. Catches malformed requests before they hit service code.

  5. Retry / idempotency handling — closer to business. Only retry what actually made it through auth.

  6. Request context enrichment (trace IDs, user metadata) — innermost. Populate context with validated data for the service to use.

Inverted order produces real bugs. I’ve seen auth outside observability (auth failures weren’t logged). Retry outside rate limiter (a retry storm blew through the rate limit). Validation outside observability (validation failures invisible in metrics). Each one a real incident.

Keeping Interceptors Focused

The rule: one concern per interceptor. The moment you have an “auth-and-logging” interceptor, you’re coupling concerns that should evolve separately.

Concretely:

  • Don’t: single “observability” interceptor that does tracing, metrics, and logging in one function.
  • Do: three interceptors (tracingInterceptor, metricsInterceptor, loggingInterceptor), chained.

Cost: three function-call overheads instead of one. Marginal.

Benefit: you can swap tracing backends without touching logging. You can disable metrics in tests without disabling tracing. Each interceptor is testable in isolation.

This is the same argument for Unix pipes over monolithic commands. Composition beats monoliths.

Common Interceptor Recipes

Real interceptors I’ve written variants of many times:

Tracing (OpenTelemetry)

Use the otelgrpc integration from go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc. Don’t write your own — the ecosystem is mature. Current idiomatic setup uses a StatsHandler, which hooks deeper than the interceptor chain and captures stream events correctly:

import "go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"

s := grpc.NewServer(
    grpc.StatsHandler(otelgrpc.NewServerHandler()),
    grpc.ChainUnaryInterceptor( /* your app interceptors */ ),
)

Older codebases still use otelgrpc.UnaryServerInterceptor() and otelgrpc.StreamServerInterceptor() — those are deprecated but still work. Migrate when convenient; don’t rewrite in a panic.

Metrics

Prometheus histogram of request duration per method:

var (
    reqDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Name: "grpc_server_request_duration_seconds",
            Help: "gRPC request duration by method and status code.",
            Buckets: prometheus.DefBuckets,
        },
        []string{"method", "code"},
    )
)

func metricsInterceptor(ctx context.Context, req interface{},
    info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
    start := time.Now()
    resp, err := handler(ctx, req)
    code := status.Code(err).String()
    reqDuration.WithLabelValues(info.FullMethod, code).Observe(time.Since(start).Seconds())
    return resp, err
}

Note: cardinality of method is bounded (you know your service’s methods). Cardinality of code is bounded (gRPC codes are a fixed enum of 17). With 20 methods that’s at most 340 series. Don’t add user-id or request-id as labels — a user-id label at 10,000 users turns 340 series into 3.4 million. That’s cardinality-explosion territory.

Auth

Extract bearer token from metadata, verify, inject user context:

func authInterceptor(ctx context.Context, req interface{},
    info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
    // Skip public methods (health checks, reflection) before demanding a token.
    if isPublic(info.FullMethod) {
        return handler(ctx, req)
    }

    md, ok := metadata.FromIncomingContext(ctx)
    if !ok {
        return nil, status.Error(codes.Unauthenticated, "no metadata")
    }
    tokens := md.Get("authorization")
    if len(tokens) == 0 {
        return nil, status.Error(codes.Unauthenticated, "no auth token")
    }

    claims, err := verifyToken(tokens[0])
    if err != nil {
        return nil, status.Error(codes.Unauthenticated, "invalid token")
    }

    ctx = context.WithValue(ctx, userCtxKey{}, claims)
    return handler(ctx, req)
}

Key detail: add the user to the context here, at the boundary. Service code reads it from context instead of taking claims as an argument through every method.

Rate limiting

Token bucket per caller or per method:

func rateLimitInterceptor(limiter *rate.Limiter) grpc.UnaryServerInterceptor {
    return func(ctx context.Context, req interface{},
        info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
        if !limiter.Allow() {
            return nil, status.Error(codes.ResourceExhausted, "rate limited")
        }
        return handler(ctx, req)
    }
}

Production rate limiting is fancier — per-tenant, distributed state in Redis, burst capacity — but the shape is the same. Reject with ResourceExhausted before doing work.
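
The per-tenant variant can be sketched as a keyed token bucket. Production versions usually reach for golang.org/x/time/rate or shared Redis state; TenantLimiter below is an illustrative stdlib-only version of the mechanics:

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

type tokenBucket struct {
	tokens float64
	last   time.Time
}

// TenantLimiter keeps an independent token bucket per tenant key.
type TenantLimiter struct {
	mu      sync.Mutex
	rate    float64 // refill rate, tokens per second
	burst   float64 // bucket capacity
	buckets map[string]*tokenBucket
}

func NewTenantLimiter(ratePerSec, burst float64) *TenantLimiter {
	return &TenantLimiter{rate: ratePerSec, burst: burst, buckets: map[string]*tokenBucket{}}
}

// Allow spends one token for the tenant if one is available.
func (l *TenantLimiter) Allow(tenant string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	b, ok := l.buckets[tenant]
	if !ok {
		b = &tokenBucket{tokens: l.burst, last: now}
		l.buckets[tenant] = b
	}
	// Refill proportionally to elapsed time, capped at burst.
	b.tokens = math.Min(l.burst, b.tokens+now.Sub(b.last).Seconds()*l.rate)
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	l := NewTenantLimiter(1, 2) // 1 token/sec refill, burst of 2
	fmt.Println(l.Allow("acme"), l.Allow("acme"), l.Allow("acme")) // first two pass, third rejected
	fmt.Println(l.Allow("globex"))                                // separate bucket, passes
}
```

In the interceptor, the tenant key typically comes from the verified claims or from request metadata; look it up, then reject with ResourceExhausted exactly as in the single-limiter version.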

Retry (client-side)

Client interceptor that retries on transient errors:

func retryClientInterceptor(attempts int) grpc.UnaryClientInterceptor {
    return func(ctx context.Context, method string, req, reply interface{},
        cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
        var err error
        for i := 0; i < attempts; i++ {
            err = invoker(ctx, method, req, reply, cc, opts...)
            if err == nil {
                return nil
            }
            if !isRetryable(err) {
                return err
            }
            backoff := time.Duration(1<<uint(i)) * 100 * time.Millisecond
            select {
            case <-time.After(backoff):
            case <-ctx.Done():
                return ctx.Err()
            }
        }
        return err
    }
}

Retry is one of the most dangerous interceptors. Get it wrong (no idempotency keys, retry non-idempotent operations, retry storm during outage) and it causes more production incidents than it prevents. Pair with grpc-middleware/retry if you can; it’s battle-tested.

The Stream Interceptor Trap

Stream interceptors are harder. The interceptor signature gives you a grpc.ServerStream, which is a bidirectional channel. Logging becomes:

func loggingStreamInterceptor(srv interface{}, ss grpc.ServerStream,
    info *grpc.StreamServerInfo, handler grpc.StreamHandler) error {
    start := time.Now()
    err := handler(srv, ss)
    log.Printf("stream=%s duration=%s err=%v", info.FullMethod, time.Since(start), err)
    return err
}

This only logs at stream-end, not per message. If you want per-message observability, you need to wrap the ServerStream itself:

type observedStream struct {
    grpc.ServerStream
    sent, recv int64
}

func (s *observedStream) SendMsg(m interface{}) error {
    atomic.AddInt64(&s.sent, 1)
    return s.ServerStream.SendMsg(m)
}

func (s *observedStream) RecvMsg(m interface{}) error {
    err := s.ServerStream.RecvMsg(m)
    if err == nil {
        atomic.AddInt64(&s.recv, 1)
    }
    return err
}

Then pass the wrapper to the handler. This is the pattern for any stream interceptor that needs per-message visibility.

Common mistakes:

  • Forgetting to propagate context to the wrapper. The wrapped stream’s Context() should be the enriched context.
  • Per-message overhead blows up long streams. A message-level log line is fine at 100 msgs/sec. At 100K msgs/sec, it’s your dominant cost.
  • State in the wrapper not thread-safe. Streams can be concurrent on the Send and Recv sides. Protect counters.

Testing Interceptor Chains

Unit test each interceptor in isolation:

func TestAuthInterceptor_NoToken(t *testing.T) {
    ctx := context.Background() // no metadata
    info := &grpc.UnaryServerInfo{FullMethod: "/my.Service/Method"}
    handler := func(ctx context.Context, req interface{}) (interface{}, error) {
        t.Fatal("handler should not be called")
        return nil, nil
    }

    _, err := authInterceptor(ctx, nil, info, handler)
    require.Equal(t, codes.Unauthenticated, status.Code(err))
}

Integration-test the chain end-to-end using bufconn (in-memory connection):

func TestChain_Ordering(t *testing.T) {
    lis := bufconn.Listen(1024 * 1024)
    defer lis.Close()

    s := grpc.NewServer(grpc.ChainUnaryInterceptor(observability, auth, business))
    pb.RegisterMyServer(s, &realImpl{})
    go s.Serve(lis)
    defer s.Stop()

    conn, err := grpc.Dial("bufnet",
        grpc.WithContextDialer(func(ctx context.Context, _ string) (net.Conn, error) {
            return lis.DialContext(ctx)
        }),
        grpc.WithTransportCredentials(insecure.NewCredentials()),
    )
    require.NoError(t, err)
    defer conn.Close()

    client := pb.NewMyClient(conn)
    resp, err := client.Method(context.Background(), &pb.MethodRequest{}) // request type from your proto
    require.NoError(t, err)
    require.NotNil(t, resp) // assert on behavior end-to-end
}

Integration tests catch bugs that unit tests don’t: metadata propagation, interceptor ordering, context enrichment visible to the handler. Don’t skip them.

Patterns That Save Time

  • Use grpc-middleware/v2 (github.com/grpc-ecosystem/go-grpc-middleware/v2) for chain helpers, recovery, and batteries-included interceptors. Don’t reinvent every wheel.
  • Keep error semantics consistent. Every interceptor should return status.Error(code, msg) for failures. Don’t return raw Go errors — clients can’t parse them properly.
  • Skip-list for public methods. Auth and rate limit often need to skip health check and reflection endpoints. Keep the skip list in one place.
  • Per-service vs global interceptors. Most interceptors are global (tracing, metrics, auth). A few might be per-service (e.g., a bespoke rate limiter for a specific hot endpoint). Compose accordingly.
  • Panic recovery at the outermost layer. A panic in a handler shouldn’t kill the server. Use the recovery middleware from grpc-middleware or write your own, and put it first in the chain.
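
On the last point, the shape of hand-rolled recovery is worth seeing. This is a sketch with stand-in handler types; the real go-grpc-middleware recovery interceptor does the same thing and returns a status.Error with codes.Internal:

```go
package main

import "fmt"

type handler func(req string) (string, error)

// recoverWrap converts a handler panic into an error instead of letting
// it unwind the goroutine. Note the named return value: the deferred
// closure must be able to overwrite err after the panic.
func recoverWrap(next handler) handler {
	return func(req string) (resp string, err error) {
		defer func() {
			if r := recover(); r != nil {
				err = fmt.Errorf("internal: recovered from panic: %v", r)
			}
		}()
		return next(req)
	}
}

func main() {
	h := recoverWrap(func(req string) (string, error) { panic("boom") })
	_, err := h("ping")
	fmt.Println(err) // prints: internal: recovered from panic: boom
}
```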

The Discipline That Makes This Work

Interceptors are the right tool for cross-cutting concerns — the things every RPC needs but the service code shouldn’t have to think about. The discipline is: one concern per interceptor, careful ordering, consistent error semantics, tested end-to-end.

The services I’ve seen do this well have clean business logic (because the cross-cutting stuff is outside it) and reliable operational behavior (because the interceptor chain is tested as a unit, not just piece-by-piece). The services that do it poorly have auth logic sprinkled through their handlers, tracing that randomly misses requests, and rate limiters that let certain code paths bypass.

Interceptor order is one of those details that looks tactical and turns out to be architectural. Get it right once; the service’s behavior improves every release.

