eBPF on Linux — kprobe vs fentry: Hooking Internals & What Production Observability Misses
Field notes comparing kprobe and fentry as eBPF attachment mechanisms — how each loads, where each costs you, and why the choice matters for any production AI infrastructure that wants real syscall-level observability without paying for the wrong overhead.
Status: Companion blog draft for the existing video. Long-form transcript + bridge framing TBD.
Companion assets
- Original video: eBPF on Linux: kprobe vs 'fentry' (tracing) — Hooking Internals & Assembly Analysis
- GitHub: harrison001/SentinelEdge — production-grade eBPF kernel security project (13,600+ LOC, 3,200+ eBPF programs)
TL;DR
kprobe and fentry both let you hook kernel functions from eBPF, but they install differently and cost differently. kprobe patches the probed instruction with a software breakpoint (int3 on x86, or a jump when the probe can be optimized) and traps into the kprobe machinery; fentry attaches a BPF trampoline at the __fentry__ pad the compiler reserves at function entry for ftrace and direct-call infrastructure, so entering your program costs roughly a direct call. The difference matters when you’re hooking something that fires millions of times per second — and in production AI workloads, that’s most of the syscall path.
The setup
- Linux kernel ≥ 5.5 (for fentry BPF program type support)
- bpftool / libbpf-based program
- Two BPF programs attached to the same kernel function — one as kprobe, one as fentry (a minimal sketch follows this list)
- Disassembly side-by-side
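For reference, here is a minimal BPF-side sketch of that setup: two programs on the same kernel function, one kprobe and one fentry. The target function (do_unlinkat) and the counters are illustrative stand-ins, not the code from the video; the point is the shape — the kprobe handler works from pt_regs, while the fentry handler receives BTF-typed arguments.

```c
// compare.bpf.c — minimal sketch (libbpf / CO-RE style), not the video's code.
// do_unlinkat is an assumed, illustrative target; any BTF-visible kernel
// function works. fentry needs kernel >= 5.5 and kernel BTF.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

__u64 kprobe_hits;
__u64 fentry_hits;

/* kprobe: the handler gets raw pt_regs; arguments would be pulled out via
 * PT_REGS_PARM*() (or typed BPF_KPROBE args), with no type checking by the
 * verifier. Installation rewrites bytes at the probed instruction. */
SEC("kprobe/do_unlinkat")
int BPF_KPROBE(kprobe_do_unlinkat)
{
	__sync_fetch_and_add(&kprobe_hits, 1);
	return 0;
}

/* fentry: same function, but arguments arrive typed via BTF and the program
 * runs from a BPF trampoline patched into the __fentry__ pad. */
SEC("fentry/do_unlinkat")
int BPF_PROG(fentry_do_unlinkat, int dfd, struct filename *name)
{
	__sync_fetch_and_add(&fentry_hits, 1);
	return 0;
}
```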
Debug command transcript
# TODO: paste the actual bpftool / objdump sequence from the video
# bpftool prog show
# bpftool prog dump xlated id <id>
# objdump -d /sys/kernel/btf/...
# perf stat -e cycles,instructions ./load_kprobe
# perf stat -e cycles,instructions ./load_fentry
What broke (or rather, what’s hidden)
Standard eBPF tutorials show you how to attach a probe and read the data. They almost never show:
- How the probe actually inserts — kprobe rewrites instruction bytes; fentry hooks the pre-allocated __fentry__ slot
- What the per-call cost is at the instruction level
- Why some hooks attach instantly while others fail or require BTF for type-correct argument decoding (a loader sketch for poking at this follows below)
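To reproduce the attach behavior yourself, a tiny user-space loader is enough. This is a hedged sketch against plain libbpf; the object name compare.bpf.o matches the BPF-side sketch above and is an assumption, not the video's loader. The detail to notice: the fentry program's target is resolved against kernel BTF at load time, so a kernel without /sys/kernel/btf/vmlinux fails at load, while the kprobe program has no such dependency.

```c
// loader.c — minimal sketch; assumes compare.bpf.o from the sketch above.
// Build (assumed): cc loader.c -lbpf -o loader   (libbpf >= 1.0)
#include <bpf/libbpf.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	struct bpf_object *obj;
	struct bpf_program *prog;

	obj = bpf_object__open_file("compare.bpf.o", NULL);
	if (!obj) {
		fprintf(stderr, "open failed\n");
		return 1;
	}

	/* Load is where the fentry program needs kernel BTF: the target's
	 * BTF ID is resolved here so the verifier can type-check arguments. */
	if (bpf_object__load(obj)) {
		fprintf(stderr, "load failed (missing kernel BTF for fentry?)\n");
		return 1;
	}

	/* Attach everything: kprobe via the kprobe machinery, fentry via a
	 * BPF trampoline on the same function. */
	bpf_object__for_each_program(prog, obj) {
		if (!bpf_program__attach(prog)) {
			fprintf(stderr, "attach failed: %s\n",
				bpf_program__name(prog));
			return 1;
		}
	}

	pause();	/* keep both hooks live while you run perf / objdump */
	return 0;
}
```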
What fixed it
TODO: write up the side-by-side disassembly + perf data showing kprobe overhead (~50-100ns) vs fentry overhead (~10-20ns) on hot paths, plus the BTF requirement for fentry’s argument typing.
What this teaches backend / AI infra engineers
eBPF is becoming the default “production observability without code change” tool. For AI infrastructure specifically, it’s the only way to see:
- Real syscall latency during LLM inference (network, file I/O, GPU driver entry points), illustrated by the probe sketch after this list
- Per-token streaming overhead that user-space tracing doesn’t catch
- Container boundary cost for multi-tenant inference serving
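For the syscall-latency item, the usual pattern is a paired fentry/fexit probe: timestamp on entry, compute the delta on return. The sketch below is illustrative only; the target vfs_read, the map layout, and the trace-pipe output are assumptions standing in for whatever hot path your inference server actually exercises.

```c
// read_latency.bpf.c — illustrative fentry/fexit latency probe (assumed target).
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 10240);
	__type(key, __u32);    /* thread id */
	__type(value, __u64);  /* entry timestamp, ns */
} start SEC(".maps");

/* Entry: stamp the calling thread. Runs from a BPF trampoline, so the
 * per-call overhead stays in the tens of nanoseconds. */
SEC("fentry/vfs_read")
int BPF_PROG(vfs_read_enter, struct file *file, char *buf, size_t count, loff_t *pos)
{
	__u32 tid = (__u32)bpf_get_current_pid_tgid();
	__u64 ts = bpf_ktime_get_ns();

	bpf_map_update_elem(&start, &tid, &ts, BPF_ANY);
	return 0;
}

/* Exit: same typed arguments plus the return value; compute the latency
 * and emit it (here just a trace line; a real probe would feed a histogram). */
SEC("fexit/vfs_read")
int BPF_PROG(vfs_read_exit, struct file *file, char *buf, size_t count, loff_t *pos, ssize_t ret)
{
	__u32 tid = (__u32)bpf_get_current_pid_tgid();
	__u64 *tsp = bpf_map_lookup_elem(&start, &tid);

	if (!tsp)
		return 0;

	__u64 delta_ns = bpf_ktime_get_ns() - *tsp;
	bpf_map_delete_elem(&start, &tid);
	bpf_printk("vfs_read: %llu ns, ret=%ld", delta_ns, (long)ret);
	return 0;
}
```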
The kprobe-vs-fentry choice is not academic. If you hook recvmsg() on a 100K-RPS service with kprobe instead of fentry, you’ve just added 5–10% overhead to every request. With fentry, sub-1%. Same observability, very different production cost.
The bigger lesson: observability isn’t free. Every tracer adds cost. Knowing which hook mechanism costs what is the difference between “we ran with eBPF in prod” and “we tried eBPF in prod and rolled back because tail latency exploded.”
Related work
- Video: SentinelEdge eBPF project overview
- GitHub: SentinelEdge — production patterns at scale
I occasionally advise small teams on backend reliability, Go performance, and production AI systems. Learn more: /services