gRPC Interceptors in Production: Design Patterns That Survive Real Load
gRPC interceptors are where cross-cutting concerns live — auth, tracing, retry, metrics, rate limiting. Most examples online show toy single-interceptor demos. Production systems need to stack, order, and compose them correctly. A practical guide.
gRPC interceptors are the middleware pattern, specialized for gRPC. If you’ve written HTTP middleware before, the shape is familiar — a function that wraps a call, can observe or modify the request, pass to the next handler, then observe or modify the response. The difference: gRPC’s type system makes the flavors (unary, server-stream, client-stream, bidi) explicit, and chain ordering matters more than most people realize.
Most online examples show a single toy interceptor. Production systems stack five to ten of them per service. Getting the composition right — ordering, concern separation, testability — is half of running a gRPC-based microservice well.
tl;dr — gRPC interceptors are middleware with more explicit types. Chain them outside-in: observability wraps everything, then throttling, then auth, then retry, then the actual service. Keep each interceptor focused on one concern; the moment an interceptor does two things you’re writing coupled middleware. Stream interceptors are trickier than unary — don’t copy-paste unary logic into stream without thinking. Test the chain composition with bufconn, not just each interceptor in isolation.
The Four Interceptor Types
gRPC has four interceptor signatures, two for client, two for server:
- Unary server interceptor: wraps a single request → single response call.
- Stream server interceptor: wraps streaming RPCs (server-stream, client-stream, bidi).
- Unary client interceptor: wraps the client side of a unary call.
- Stream client interceptor: wraps the client side of a streaming call.
Unary interceptors are easy. Stream interceptors are harder because you’re wrapping a bidirectional wire, not a single call.
Example unary server interceptor:
```go
func loggingInterceptor(ctx context.Context, req interface{},
	info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
	start := time.Now()
	resp, err := handler(ctx, req)
	log.Printf("method=%s duration=%s err=%v", info.FullMethod, time.Since(start), err)
	return resp, err
}
```
Register it:
```go
s := grpc.NewServer(grpc.UnaryInterceptor(loggingInterceptor))
```
Straightforward. Now stack five of them.
Chaining and Order
Real services need multiple interceptors. grpc-go ships grpc.ChainUnaryInterceptor(...) for composing them; on older grpc-go versions, the chain helpers in github.com/grpc-ecosystem/go-grpc-middleware do the same job:
```go
s := grpc.NewServer(
	grpc.ChainUnaryInterceptor(
		observabilityInterceptor, // outermost
		rateLimitInterceptor,
		authInterceptor,
		validationInterceptor,
		businessLogicContextInterceptor, // innermost
	),
)
```
Chain order matters enormously. Interceptors execute outside-in on the way to the handler, inside-out on the way back. Put the wrong interceptor outside the wrong one and you get bugs that are hard to debug.
Canonical order I use:
```mermaid
flowchart LR
    Client([gRPC client]) --> I1
    I1["Observability<br/>tracing · metrics · logging"] --> I2
    I2["Rate limiting / quota"] --> I3
    I3["Auth<br/>authn · authz"] --> I4
    I4["Validation"] --> I5
    I5["Retry / idempotency"] --> I6
    I6["Context enrichment"] --> Handler{{"Business handler"}}
    classDef outer fill:#fef5e7,stroke:#b7791f
    classDef mid fill:#e8f4f8,stroke:#2c5282
    classDef inner fill:#f0fff4,stroke:#2f855a
    class I1 outer
    class I2,I3,I4 mid
    class I5,I6 inner
```
Outside-in on the way to the handler, inside-out on the way back. Observability must wrap everything — so it sees every rejection, every rate-limit hit, every failed auth — otherwise you have operational blind spots. Details:
- Observability (tracing + metrics + logging) — outermost. You want to see every request, including the ones that get rejected by later interceptors. If observability is inside auth, unauth'd attempts are invisible — a security-relevant blind spot.
- Rate limiting / quota — before auth. Why? Because auth involves token verification (DB lookup, JWT parsing, external identity service), and you don't want unauthenticated requests to cost you CPU. Rate-limit first, authenticate second.
- Auth (authentication + authorization) — before business logic. Reject unauthenticated/unauthorized requests early.
- Validation (request shape, basic sanity) — before business logic. Catches malformed requests before they hit service code.
- Retry / idempotency handling — closer to business. Only retry what actually made it through auth.
- Request context enrichment (trace IDs, user metadata) — innermost. Populate context with validated data for the service to use.
Inverted order produces real bugs. I’ve seen auth outside observability (auth failures weren’t logged). Retry outside rate limiter (a retry storm blew through the rate limit). Validation outside observability (validation failures invisible in metrics). Each one a real incident.
Keeping Interceptors Focused
The rule: one concern per interceptor. The moment you have an “auth-and-logging” interceptor, you’re coupling concerns that should evolve separately.
Concretely:
- Don’t: single “observability” interceptor that does tracing, metrics, and logging in one function.
- Do: three interceptors (`tracingInterceptor`, `metricsInterceptor`, `loggingInterceptor`), chained.
Cost: three function-call overheads instead of one. Marginal.
Benefit: you can swap tracing backends without touching logging. You can disable metrics in tests without disabling tracing. Each interceptor is testable in isolation.
This is the same argument for Unix pipes over monolithic commands. Composition beats monoliths.
Common Interceptor Recipes
Real interceptors I’ve written variants of many times:
Tracing (OpenTelemetry)
Use the otelgrpc integration from go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc. Don’t write your own — the ecosystem is mature. Current idiomatic setup uses a StatsHandler, which hooks deeper than the interceptor chain and captures stream events correctly:
```go
import "go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"

s := grpc.NewServer(
	grpc.StatsHandler(otelgrpc.NewServerHandler()),
	grpc.ChainUnaryInterceptor( /* your app interceptors */ ),
)
```
Older codebases still use otelgrpc.UnaryServerInterceptor() and otelgrpc.StreamServerInterceptor() — those are deprecated but still work. Migrate when convenient; don’t rewrite in a panic.
Metrics
Prometheus histogram of request duration per method:
```go
var (
	reqDuration = promauto.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "grpc_server_request_duration_seconds",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"method", "code"},
	)
)

func metricsInterceptor(ctx context.Context, req interface{},
	info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
	start := time.Now()
	resp, err := handler(ctx, req)
	code := status.Code(err).String()
	reqDuration.WithLabelValues(info.FullMethod, code).Observe(time.Since(start).Seconds())
	return resp, err
}
```
Note: cardinality of method is bounded (you know your service’s methods). Cardinality of code is bounded (gRPC codes are a fixed enum). Don’t add user-id or request-id as labels — that’s cardinality-explosion territory.
Auth
Extract bearer token from metadata, verify, inject user context:
```go
func authInterceptor(ctx context.Context, req interface{},
	info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
	// Skip public methods first — before touching tokens at all. Checking
	// after verification would still reject tokenless public calls.
	if isPublic(info.FullMethod) {
		return handler(ctx, req)
	}
	md, ok := metadata.FromIncomingContext(ctx)
	if !ok {
		return nil, status.Error(codes.Unauthenticated, "no metadata")
	}
	tokens := md.Get("authorization")
	if len(tokens) == 0 {
		return nil, status.Error(codes.Unauthenticated, "no auth token")
	}
	claims, err := verifyToken(tokens[0])
	if err != nil {
		return nil, status.Error(codes.Unauthenticated, "invalid token")
	}
	ctx = context.WithValue(ctx, userCtxKey{}, claims)
	return handler(ctx, req)
}
```
Key detail: add the user context here, near the boundary. Service code reads it from context; you don't thread claims as an argument through every service method.
Rate limiting
Token bucket per caller or per method:
```go
func rateLimitInterceptor(limiter *rate.Limiter) grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req interface{},
		info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
		if !limiter.Allow() {
			return nil, status.Error(codes.ResourceExhausted, "rate limited")
		}
		return handler(ctx, req)
	}
}
```
Production rate limiting is fancier — per-tenant, distributed state in Redis, burst capacity — but the shape is the same. Reject with ResourceExhausted before doing work.
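The per-method variant can be sketched with a tiny token bucket. This is illustrative (bucket, methodLimiter, and newMethodLimiter are my names); real code would more likely keep a map of golang.org/x/time/rate.Limiter values, or shared state in Redis:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bucket is a minimal token bucket: capacity `max`, refilled at `rate`
// tokens per second based on elapsed time since the last call.
type bucket struct {
	mu     sync.Mutex
	tokens float64
	max    float64
	rate   float64
	last   time.Time
}

func (b *bucket) allow(now time.Time) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.max {
		b.tokens = b.max
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// methodLimiter keeps one bucket per full method name, created lazily.
type methodLimiter struct {
	mu        sync.Mutex
	buckets   map[string]*bucket
	max, rate float64
}

func newMethodLimiter(max, rate float64) *methodLimiter {
	return &methodLimiter{buckets: map[string]*bucket{}, max: max, rate: rate}
}

func (m *methodLimiter) Allow(method string) bool {
	m.mu.Lock()
	b, ok := m.buckets[method]
	if !ok {
		b = &bucket{tokens: m.max, max: m.max, rate: m.rate, last: time.Now()}
		m.buckets[method] = b
	}
	m.mu.Unlock()
	return b.allow(time.Now())
}

func main() {
	lim := newMethodLimiter(2, 0) // 2-token burst, no refill: third call is rejected
	fmt.Println(lim.Allow("/my.Service/Hot"), lim.Allow("/my.Service/Hot"), lim.Allow("/my.Service/Hot"))
	// true true false
}
```

Inside the interceptor, you'd call lim.Allow(info.FullMethod) where the single-limiter version calls limiter.Allow().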
Retry (client-side)
Client interceptor that retries on transient errors:
```go
func retryClientInterceptor(attempts int) grpc.UnaryClientInterceptor {
	return func(ctx context.Context, method string, req, reply interface{},
		cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
		var err error
		for i := 0; i < attempts; i++ {
			err = invoker(ctx, method, req, reply, cc, opts...)
			if err == nil {
				return nil
			}
			if !isRetryable(err) {
				return err
			}
			backoff := time.Duration(1<<uint(i)) * 100 * time.Millisecond
			select {
			case <-time.After(backoff):
			case <-ctx.Done():
				return ctx.Err()
			}
		}
		return err
	}
}
```
Retry is one of the most dangerous interceptors. Get it wrong (no idempotency keys, retry non-idempotent operations, retry storm during outage) and it causes more production incidents than it prevents. Pair with grpc-middleware/retry if you can; it’s battle-tested.
The Stream Interceptor Trap
Stream interceptors are harder. The interceptor signature gives you a grpc.ServerStream, which is a bidirectional channel. Logging becomes:
```go
func loggingStreamInterceptor(srv interface{}, ss grpc.ServerStream,
	info *grpc.StreamServerInfo, handler grpc.StreamHandler) error {
	start := time.Now()
	err := handler(srv, ss)
	log.Printf("stream=%s duration=%s err=%v", info.FullMethod, time.Since(start), err)
	return err
}
```
This only logs at stream-end, not per message. If you want per-message observability, you need to wrap the ServerStream itself:
```go
type observedStream struct {
	grpc.ServerStream
	ctx        context.Context // enriched context; see Context() below
	sent, recv int64
}

// Context overrides the embedded stream's Context so the handler sees
// whatever the interceptor added (user claims, trace data, etc.).
func (s *observedStream) Context() context.Context {
	return s.ctx
}

func (s *observedStream) SendMsg(m interface{}) error {
	atomic.AddInt64(&s.sent, 1)
	return s.ServerStream.SendMsg(m)
}

func (s *observedStream) RecvMsg(m interface{}) error {
	err := s.ServerStream.RecvMsg(m)
	if err == nil {
		atomic.AddInt64(&s.recv, 1)
	}
	return err
}
```
Then pass the wrapper to the handler. This is the pattern for any stream interceptor that needs per-message visibility.
Common mistakes:
- Forgetting to propagate context to the wrapper. The wrapped stream's `Context()` should be the enriched context.
- Per-message overhead blows up long streams. A message-level log line is fine at 100 msgs/sec. At 100K msgs/sec, it's your dominant cost.
- State in the wrapper not thread-safe. Streams can be concurrent on the `Send` and `Recv` sides. Protect counters.
Testing Interceptor Chains
Unit test each interceptor in isolation:
```go
func TestAuthInterceptor_NoToken(t *testing.T) {
	ctx := context.Background() // no metadata
	info := &grpc.UnaryServerInfo{FullMethod: "/my.Service/Method"}
	handler := func(ctx context.Context, req interface{}) (interface{}, error) {
		t.Fatal("handler should not be called")
		return nil, nil
	}
	_, err := authInterceptor(ctx, nil, info, handler)
	require.Equal(t, codes.Unauthenticated, status.Code(err))
}
```
Integration-test the chain end-to-end using bufconn (in-memory connection):
```go
func TestChain_Ordering(t *testing.T) {
	lis := bufconn.Listen(1024 * 1024)
	defer lis.Close()

	s := grpc.NewServer(grpc.ChainUnaryInterceptor(observability, auth, business))
	pb.RegisterMyServer(s, &realImpl{})
	go s.Serve(lis)
	defer s.Stop()

	conn, err := grpc.Dial("bufnet",
		grpc.WithContextDialer(func(ctx context.Context, _ string) (net.Conn, error) {
			return lis.DialContext(ctx)
		}),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	require.NoError(t, err)
	defer conn.Close()

	client := pb.NewMyClient(conn)
	resp, err := client.Method(context.Background(), req)
	// assert on behavior end-to-end
}
```
Integration tests catch bugs that unit tests don’t: metadata propagation, interceptor ordering, context enrichment visible to the handler. Don’t skip them.
Patterns That Save Time
- Use `grpc-middleware/v2` (github.com/grpc-ecosystem/go-grpc-middleware/v2) for chain helpers, recovery, and batteries-included interceptors. Don't reinvent every wheel.
- Keep error semantics consistent. Every interceptor should return `status.Error(code, msg)` for failures. Don't return raw Go errors — clients can't parse them properly.
- Skip-list for public methods. Auth and rate limit often need to skip health check and reflection endpoints. Keep the skip list in one place.
- Per-service vs global interceptors. Most interceptors are global (tracing, metrics, auth). A few might be per-service (e.g., a bespoke rate limiter for a specific hot endpoint). Compose accordingly.
- Panic recovery at the outermost layer. A panic in a handler shouldn't kill the server. Use the `recovery` middleware from `grpc-middleware` or write your own, and put it first in the chain.
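The skip-list pattern is small enough to show in full. A sketch — the set below lists the standard health and v1alpha reflection method names; adjust for your deployment (e.g. the newer reflection v1 service):

```go
package main

import "fmt"

// publicMethods is the single source of truth for endpoints that skip
// auth and rate limiting, keyed by full method name as gRPC reports it.
var publicMethods = map[string]bool{
	"/grpc.health.v1.Health/Check": true,
	"/grpc.health.v1.Health/Watch": true,
	"/grpc.reflection.v1alpha.ServerReflection/ServerReflectionInfo": true,
}

// isPublic is what the auth and rate-limit interceptors consult.
func isPublic(fullMethod string) bool {
	return publicMethods[fullMethod]
}

func main() {
	fmt.Println(isPublic("/grpc.health.v1.Health/Check")) // true
	fmt.Println(isPublic("/my.Service/Method"))           // false
}
```

Because both auth and rate limiting call the same isPublic, the two interceptors can never drift out of agreement about what is exempt.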
The Discipline That Makes This Work
Interceptors are the right tool for cross-cutting concerns — the things every RPC needs but the service code shouldn’t have to think about. The discipline is: one concern per interceptor, careful ordering, consistent error semantics, tested end-to-end.
The services I’ve seen do this well have clean business logic (because the cross-cutting stuff is outside it) and reliable operational behavior (because the interceptor chain is tested as a unit, not just piece-by-piece). The services that do it poorly have auth logic sprinkled through their handlers, tracing that randomly misses requests, and rate limiters that let certain code paths bypass.
Interceptor order is one of those details that looks tactical and turns out to be architectural. Get it right once; the service’s behavior improves every release.
Related
- Go Context in Distributed Systems: What Actually Works in Production — the context that flows through every interceptor.
- RPC vs NATS: It’s Not About Sync vs Async — It’s About Who Owns Completion — the shape of gRPC calls as one side of the bigger messaging picture.
- Observability and Cost Attribution: Why One Pipeline Isn’t Enough — why tracing interceptors alone aren’t enough for business attribution.