NATS vs Kafka vs MQTT: Same Category, Very Different Jobs
All three are 'messaging systems.' None of them is interchangeable with the others. A practical breakdown of NATS, Kafka, and MQTT — by the actual design axes that determine which one breaks when you misuse it.
The number of times I’ve watched a team pick a message system based on “Company X uses it” is depressing. Right behind it: the team that picks the one they already know, regardless of whether it fits the workload. NATS, Kafka, and MQTT get lumped together because they all pass messages between processes. That’s like lumping trucks, sedans, and motorbikes together because they all have wheels.
They are three different tools for three different shapes of problem. Once you know the axes that matter, the decision is usually easy.
tl;dr — NATS is the low-latency nervous system for request/reply, fan-out, and loosely-coupled services. Kafka is a partitioned, replayable log optimized for ingest, ordered processing, and stream analytics. MQTT is a wire-efficient broadcast protocol for large fleets of intermittently-connected devices. The wrong one looks “slow” or “complicated” not because it is bad, but because it’s optimizing for something you don’t need.
The Axes That Actually Matter
Before comparing features, pick the axes that will make or break your system:
- Delivery guarantee: at-most-once, at-least-once, effectively-once (via dedup)
- Ordering: no ordering, partition-level ordering, global ordering
- Persistence / replay: ephemeral, durable with short retention, durable with long retention and replay
- Throughput pattern: many small messages vs few large messages; sustained high throughput vs bursty
- Client shape: services on fast reliable networks vs devices on flaky cellular links
- Operational complexity tolerance: can you run a ZooKeeper/KRaft quorum? or do you want a single binary with zero ops?
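The "effectively-once via dedup" item deserves a concrete shape: the broker delivers at-least-once, and the consumer makes redelivery harmless by tracking message IDs. A minimal sketch of the consumer side (the `handle` function and in-memory `processed_ids` set are illustrative, not any specific client API — production code would persist the IDs durably):

```python
# Effectively-once = at-least-once delivery + idempotent consumption.
# The broker may redeliver; the consumer deduplicates by message ID.

processed_ids: set[str] = set()
results: list[str] = []

def handle(msg_id: str, payload: str) -> bool:
    """Process a message at most once per ID. Returns True if work was done."""
    if msg_id in processed_ids:
        return False           # redelivery: acknowledge, do nothing
    processed_ids.add(msg_id)  # in production: a durable store, not a set
    results.append(payload)
    return True

# A redelivered message (same ID) is a no-op:
handle("m1", "charge $5")
handle("m1", "charge $5")  # duplicate delivery
assert results == ["charge $5"]
```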
Every tool makes a different bet on these. Let’s walk through.
NATS: the low-latency nervous system
NATS is a pub/sub bus with native request/reply, plus wildcards for subject hierarchies. Core NATS is fire-and-forget, at-most-once, no persistence. JetStream (built in since 2.2) adds durable streams, at-least-once delivery, and replay.
What NATS optimizes for:
- Sub-millisecond publish latency in most topologies. The design is ruthlessly minimal — TCP connection per client, topic routing, done.
- Request/reply as a first-class operation. `nc.Request(subject, data, timeout)` gives you RPC ergonomics on the message bus.
- Subject hierarchies with wildcards. `orders.*.created`, `orders.US.*`, `orders.>` — easy to model domains.
- Low operational overhead. One binary, written in Go, clustered with Raft, no external dependencies.
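The wildcard semantics are simple enough to sketch in a few lines — `*` matches exactly one dot-separated token, `>` matches one or more trailing tokens. A toy matcher showing the rules (the real routing lives in the server; this is only an illustration of the semantics):

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Match a NATS subject against a pattern with '*' (one token) and '>' (tail)."""
    p_toks = pattern.split(".")
    s_toks = subject.split(".")
    for i, p in enumerate(p_toks):
        if p == ">":                     # '>' matches one or more remaining tokens
            return i < len(s_toks)
        if i >= len(s_toks):
            return False
        if p != "*" and p != s_toks[i]:  # '*' matches exactly one token
            return False
    return len(p_toks) == len(s_toks)

assert subject_matches("orders.*.created", "orders.US.created")
assert not subject_matches("orders.*.created", "orders.US.cancelled")
assert subject_matches("orders.>", "orders.US.created")
assert not subject_matches("orders.>", "orders")  # '>' needs at least one token
```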
What NATS is not optimized for:
- Long retention. JetStream handles durable streams, but it’s not designed for months-long event logs the way Kafka is.
- Partitioned ordered processing at scale. You can do it with JetStream work queues, but the ergonomics and tooling are behind Kafka’s consumer groups.
- Stream processing frameworks. The Flink/ksqlDB/Spark ecosystem is Kafka’s home turf.
Pick NATS when the shape of your workload is “lots of services, mostly talking to each other in short exchanges, some broadcast, some work queues, and I want to stop running three different message systems.” It’s the default I reach for in modern backend stacks.
Kafka: the append-only log
Kafka is fundamentally a distributed commit log. Topics are partitioned. Each partition is an append-only ordered sequence of records. Consumers track their own offsets. Messages stick around for the configured retention (days, weeks, or forever).
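The mechanics in that paragraph fit in a toy model: an append-only list per partition, and consumers who read from whatever offset they choose. Replay is just rewinding the offset. (A deliberately simplified in-memory sketch — no replication, batching, or retention; names are illustrative.)

```python
class PartitionLog:
    """Toy model of one Kafka partition: append-only; consumers own their offsets."""

    def __init__(self) -> None:
        self.records: list[str] = []

    def append(self, record: str) -> int:
        self.records.append(record)
        return len(self.records) - 1           # offset of the new record

    def read(self, offset: int, max_records: int = 10) -> list[str]:
        return self.records[offset : offset + max_records]

log = PartitionLog()
for event in ["created", "paid", "shipped"]:
    log.append(event)

# Two consumers at different offsets see different slices of the same log:
assert log.read(0) == ["created", "paid", "shipped"]  # replay from the start
assert log.read(2) == ["shipped"]                     # caught-up consumer
```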
What Kafka optimizes for:
- Sustained high ingest. The append-only log plus zero-copy send makes Kafka handle hundreds of MB/sec per broker without breathing hard.
- Partition-level ordering. Within a partition, order is guaranteed. This is how you get “all events for user X are processed in sequence” — just key by user ID.
- Replay and reprocessing. Offset management means you can rewind a consumer to last Tuesday and replay everything. Critical for analytics, for rebuilding downstream state after a bug, for change-data-capture.
- Stream processing integration. Flink, ksqlDB, Spark Streaming, Kafka Streams — the ecosystem assumes Kafka semantics.
- Large event histories. Tiered storage (pushing older segments to S3) makes long retention cheap.
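"Just key by user ID" works because the producer maps a key deterministically to a partition, so every event for that key lands on the same ordered log. A sketch of the idea — Kafka's default partitioner uses murmur2, but any stable hash demonstrates the property:

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Deterministic key -> partition mapping. (Kafka's default partitioner uses
    murmur2; a stable stand-in hash shows the same property: same key, same
    partition, hence per-key ordering.)"""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# All events for one user land on one partition, so they stay in sequence:
p = partition_for("user-42", 12)
assert all(partition_for("user-42", 12) == p for _ in range(100))

# Different keys spread across partitions (spread, not guaranteed distinct):
spread = {partition_for(f"user-{i}", 12) for i in range(1000)}
assert len(spread) > 1
```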
What Kafka is not optimized for:
- Request/reply. The log model actively fights against it. You can hack it with correlation IDs and reply topics, but you’ll fight the framework.
- Low operational overhead. ZooKeeper was always a pain; KRaft helps but running Kafka in production is still real work.
- Low-latency small messages. A single publish round-trip is typically 5-10ms even on a hot path. That’s fine for most workloads but doesn’t compete with NATS on tight RPC loops.
- Large fan-out to thin clients. Every consumer is assumed to be a persistent process tracking offsets. Not suitable for IoT devices that connect intermittently.
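The correlation-ID hack mentioned above looks like this in miniature: every request carries an ID, and the caller watches a reply topic until its ID shows up. (In-memory dicts stand in for the topics; a real version needs a consumer loop, timeouts, a reply topic per service, and partition-aware routing — which is exactly the friction.)

```python
import uuid

reply_topic: dict[str, str] = {}  # correlation_id -> reply payload (stand-in)

def handle_request(corr_id: str, request: str) -> None:
    """The 'service': consumes a request, produces a reply keyed by corr_id."""
    reply_topic[corr_id] = request.upper()

def rpc_over_log(request: str) -> str:
    """RPC emulated over a log: tag the request, then poll for the reply."""
    corr_id = str(uuid.uuid4())
    handle_request(corr_id, request)    # stand-in for produce + remote consume
    while corr_id not in reply_topic:   # real code: a consume loop with a timeout
        pass
    return reply_topic.pop(corr_id)

assert rpc_over_log("ping") == "PING"
```

Compare this with `nc.Request(...)` doing the same thing in one call, and the "you'll fight the framework" point is concrete.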
Pick Kafka when you have event histories that matter, ordered per-key processing at scale, stream-processing pipelines downstream, or CDC integration with your databases. Also when you already have it and a new workload can reasonably ride on the existing platform.
Don’t pick Kafka because “it’s what big companies use.” Big companies have Kafka teams. You probably don’t.
MQTT: the device protocol
MQTT is a lightweight pub/sub protocol designed in the late 1990s for SCADA over satellite links — constrained bandwidth, intermittent connectivity, thousands of devices per broker. It’s a wire protocol first, infrastructure second. Popular brokers include EMQX, HiveMQ, Mosquitto, VerneMQ.
What MQTT optimizes for:
- Tiny wire overhead. A PUBLISH packet header can be as small as 2 bytes. Critical for cellular-cost-sensitive deployments.
- Intermittent connections. Persistent sessions, QoS levels 0/1/2, last-will-and-testament. Designed to survive a device being offline for hours.
- Massive broadcast fan-out. One publish to a subject with 100,000 subscribers is feasible on a modern broker.
- Constrained clients. Low CPU, low memory, simple state machine — fits on a microcontroller.
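The 2-byte figure comes from MQTT's fixed header: one control byte plus a variable-length "Remaining Length" field that costs one byte for payloads under 128 bytes. The encoding is specified as 7 bits per byte with a continuation bit, and is small enough to show whole:

```python
def encode_remaining_length(n: int) -> bytes:
    """MQTT 'Remaining Length' varint: 7 bits per byte, high bit = continuation."""
    assert 0 <= n <= 268_435_455          # spec maximum (4 encoded bytes)
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)       # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

# Fixed header = 1 control byte + this varint, so a tiny publish costs 2 bytes:
assert encode_remaining_length(0) == b"\x00"
assert encode_remaining_length(127) == b"\x7f"
assert encode_remaining_length(128) == b"\x80\x01"   # second byte kicks in at 128
```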
What MQTT is not optimized for:
- Inter-service messaging on reliable networks. You’re paying for reliability features (QoS 2, retained messages, sessions) that you don’t need between two services in the same VPC.
- Long-term persistence and replay. The protocol has retained messages but nothing like Kafka’s log model.
- Complex routing. Topic wildcards work (`+` single-level, `#` multi-level) but the matching semantics are simpler than NATS subjects.
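For comparison with the NATS rules, MQTT's topic filter matching also fits in a few lines — `+` matches one slash-separated level, `#` must be last and matches the remaining levels (including the parent itself). A simplified illustration of the spec's semantics (it ignores the special rules for `$`-prefixed topics):

```python
def topic_matches(filter_: str, topic: str) -> bool:
    """MQTT topic filter match: '+' = one level, '#' = all remaining levels."""
    f_levels = filter_.split("/")
    t_levels = topic.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":                      # must be the final level of the filter
            return True                   # matches the parent and everything below
        if i >= len(t_levels):
            return False
        if f != "+" and f != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)

assert topic_matches("sensors/+/temp", "sensors/kitchen/temp")
assert not topic_matches("sensors/+/temp", "sensors/kitchen/humidity")
assert topic_matches("sensors/#", "sensors/kitchen/temp")
```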
Pick MQTT when you have actual devices on the other end — sensors, meters, vehicles, consumer hardware. For anything server-to-server on a reliable network, MQTT is over-engineered on one axis (device resilience) and under-engineered on another (rich routing / replay).
A Decision Flow
When a team asks me which to use, the path I walk them through is usually some version of this:
```mermaid
flowchart TD
    Start([New messaging need]) --> Q1{Are your clients<br/>actual devices?}
    Q1 -->|Yes · IoT, sensors, cellular| MQTT["MQTT<br/>device pub/sub<br/>tiny wire overhead"]
    Q1 -->|No · services on reliable networks| Q2{Do you need<br/>replay of past events?}
    Q2 -->|Yes · long retention, analytics, CDC| Q3{Partitioned ordering<br/>required?}
    Q2 -->|No · pub/sub or request/reply| NATS["NATS<br/>low-latency service bus<br/>JetStream if durable"]
    Q3 -->|Yes · per-key ordering at high volume| Kafka["Kafka<br/>partitioned commit log<br/>days to months retention"]
    Q3 -->|No, but I want durable streams| NATSJet["NATS JetStream<br/>simpler ops<br/>shorter retention than Kafka"]
    classDef mqtt fill:#fef5e7,stroke:#b7791f
    classDef nats fill:#e8f4f8,stroke:#2c5282
    classDef kafka fill:#f0fff4,stroke:#2f855a
    class MQTT mqtt
    class NATS,NATSJet nats
    class Kafka kafka
```
The Matrix
If you want the one-page decision:
| | NATS | Kafka | MQTT |
|---|---|---|---|
| Primary use | Service-to-service bus | Event log, stream processing | Device pub/sub |
| Delivery default | At-most-once (JetStream: at-least-once) | At-least-once | Configurable (QoS 0/1/2) |
| Ordering | Not guaranteed (JetStream: per stream) | Per partition | Per topic per client |
| Persistence | None in core; durable with JetStream | Built-in; long retention | Retained messages only |
| Replay | JetStream only, with some friction | First-class | No |
| Latency | Sub-ms | 5-10ms | Device-bound |
| Throughput per node | Millions of msg/s | 100s of MB/s | Highly variable |
| Ops complexity | Low | High | Medium |
| Request/reply | First-class | Awkward | Not really |
| Client assumption | Reliable services | Reliable consumers | Intermittent devices |
| Good default for | Microservice mesh | Event sourcing, analytics | IoT fleets |
The Real-World Patterns
A few shapes I’ve seen work well, and the corresponding mismatch patterns that caused pain.
Works: Internal service mesh on NATS, CDC on Kafka, IoT on MQTT
A reasonable large-company pattern is all three, each doing what it’s good at:
- NATS (or NATS JetStream) for inter-service request/reply, pub/sub, work queues.
- Kafka for the event log: database CDC, audit events, analytics pipelines, anything that feeds Flink or the data warehouse.
- MQTT for actual devices in the field.
Bridging happens at defined boundaries: an MQTT-to-Kafka connector for device telemetry you want replayable. A NATS-to-Kafka shipper for events that need long retention. Services don’t cross the boundaries directly; platform infra does.
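The core of such a bridge is small: consume from one side, produce to the other, key sensibly. A toy sketch with in-memory stand-ins (`device_updates` for the MQTT side, `telemetry_log` for the Kafka topic — both names hypothetical; a real connector adds batching, retries, and ack/offset handling):

```python
import queue

# In-memory stand-ins for the two systems at the boundary (illustrative only):
device_updates: "queue.Queue[tuple[str, bytes]]" = queue.Queue()  # MQTT side
telemetry_log: list[tuple[str, bytes]] = []                       # Kafka topic

def bridge_once() -> bool:
    """Move one device message onto the replayable log, keyed by topic."""
    try:
        topic, payload = device_updates.get_nowait()
    except queue.Empty:
        return False
    telemetry_log.append((topic, payload))  # real shipper: produce with key=topic
    return True

device_updates.put(("sensors/kitchen/temp", b"21.5"))
while bridge_once():
    pass
assert telemetry_log == [("sensors/kitchen/temp", b"21.5")]
```

The point of the pattern is that only this component knows both systems; the services on either side see one protocol each.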
Mismatch: “let’s replace our RPC with Kafka”
I’ve seen this at least four times. Someone reads an event-driven-architecture book, decides RPC is old-fashioned, publishes every inter-service call through Kafka topics. What happens:
- Latency goes from ~5ms to ~30-50ms round-trip because Kafka’s commit-log design isn’t tuned for low-latency reply.
- Debugging gets painful — a request that used to be one span in Jaeger is now half a dozen topics and offsets.
- Backpressure disappears — consumers can fall arbitrarily behind, and the publisher has no idea.
Every time, the fix was “put RPC back in front for the actual synchronous call paths, keep Kafka for the async event flow.” The event log is a great thing to have. It is not a substitute for RPC.
Mismatch: “let’s standardize on MQTT for everything”
Organizations with a strong IoT background sometimes try this. MQTT is what they know. So they run inter-service communication on it too.
Problems:
- Subject matching is less expressive than NATS’s hierarchical patterns — complex routing becomes awkward.
- No persistence/replay means any design requiring “rebuild downstream state” is blocked.
- Broker clusters are tuned for device fan-out, not low-latency service-to-service, so tail latencies are higher than they need to be.
The advice I give: if you’re not sending to devices, don’t use a device protocol.
Mismatch: “we picked NATS, now we need replay”
Teams that picked NATS core for its simplicity sometimes discover six months in that they need event replay — maybe for a bug-induced reprocessing, maybe for a new downstream that needs historical data. Two fixes:
- Migrate to JetStream. Usually the right answer — it’s the same product with durable streams. The upgrade is mostly configuration.
- Add Kafka alongside for the replay use case. More operational overhead, but gives you the full Kafka tooling ecosystem.
Neither is terrible. The real lesson is to check the replay question at the design-review stage.
How to Make the Decision
“Which message system should I use” is not a tech question; it’s a workload-fit question. Answer these first:
- Who talks to whom, and how long do those conversations last?
- What’s the delivery semantics I actually need — at-most-once, at-least-once, effectively-once?
- Do I need to replay history to rebuild downstream state?
- What’s the shape of my clients — reliable services, flaky devices, or mixed?
- What’s my operational appetite — do I want one binary, or can I run a real platform team?
The three tools map onto the answers cleanly. The trouble starts when you skip the questions.
Related
- RPC vs NATS: It’s Not About Sync vs Async — It’s About Who Owns Completion — the prior question: do you even want messaging, or RPC?
- Why Your “Fail-Fast” Strategy is Killing Your Distributed System — what happens when any of these fails.