eBPF on Linux — kprobe vs fentry: Hooking Internals & What Production Observability Misses
Field notes comparing kprobe and fentry as eBPF attachment mechanisms — how each loads, where each costs you, and why the choice matters for any production AI infrastructure that wants real syscall-level observability without paying for the wrong overhead.
Status: Companion blog draft for the existing video. Long-form transcript + bridge framing TBD.
Companion assets
- Original video: eBPF on Linux: kprobe vs 'fentry' (tracing) — Hooking Internals & Assembly Analysis
- GitHub: harrison001/SentinelEdge — production-grade eBPF kernel security project (13,600+ LOC, 3,200+ eBPF programs)
TL;DR
kprobe and fentry both let you hook kernel functions from eBPF, but they install differently and cost differently. kprobe patches the probed instruction with a software breakpoint (int3 on x86, or a jump when the probe can be optimized) and traps into the kprobe machinery; fentry attaches a BPF trampoline at the __fentry__ pad the compiler reserves at function entry for ftrace and direct-call infrastructure, so entering your program costs roughly a direct call. The difference matters when you’re hooking something that fires millions of times per second — and in production AI workloads, that’s most of the syscall path.
The setup
- Linux kernel ≥ 5.5 (for fentry BPF program type support)
- bpftool / libbpf-based program
- Two BPF programs attached to the same kernel function — one as kprobe, one as fentry (a minimal sketch follows this list)
- Disassembly side-by-side
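For reference, here is a minimal BPF-side sketch of that setup: two programs on the same kernel function, one kprobe and one fentry. The target function (do_unlinkat) and the counters are illustrative stand-ins, not the code from the video; the point is the shape — the kprobe handler works from pt_regs, while the fentry handler receives BTF-typed arguments.

```c
// compare.bpf.c — minimal sketch (libbpf / CO-RE style), not the video's code.
// do_unlinkat is an assumed, illustrative target; any BTF-visible kernel
// function works. fentry needs kernel >= 5.5 and kernel BTF.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

__u64 kprobe_hits;
__u64 fentry_hits;

/* kprobe: the handler gets raw pt_regs; arguments would be pulled out via
 * PT_REGS_PARM*() (or typed BPF_KPROBE args), with no type checking by the
 * verifier. Installation rewrites bytes at the probed instruction. */
SEC("kprobe/do_unlinkat")
int BPF_KPROBE(kprobe_do_unlinkat)
{
	__sync_fetch_and_add(&kprobe_hits, 1);
	return 0;
}

/* fentry: same function, but arguments arrive typed via BTF and the program
 * runs from a BPF trampoline patched into the __fentry__ pad. */
SEC("fentry/do_unlinkat")
int BPF_PROG(fentry_do_unlinkat, int dfd, struct filename *name)
{
	__sync_fetch_and_add(&fentry_hits, 1);
	return 0;
}
```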
Debug command transcript
# TODO: paste the actual bpftool / objdump sequence from the video
# bpftool prog show
# bpftool prog dump xlated id <id>
# objdump -d /sys/kernel/btf/...
# perf stat -e cycles,instructions ./load_kprobe
# perf stat -e cycles,instructions ./load_fentry
What broke (or rather, what’s hidden)
Standard eBPF tutorials show you how to attach a probe and read the data. They almost never show:
- How the probe actually inserts — kprobe rewrites instruction bytes; fentry hooks the pre-allocated __fentry__ slot
- What the per-call cost is at the instruction level
- Why some hooks attach instantly while others fail or require BTF for type-correct argument decoding (a loader sketch for poking at this follows below)
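To reproduce the attach behavior yourself, a tiny user-space loader is enough. This is a hedged sketch against plain libbpf; the object name compare.bpf.o matches the BPF-side sketch above and is an assumption, not the video's loader. The detail to notice: the fentry program's target is resolved against kernel BTF at load time, so a kernel without /sys/kernel/btf/vmlinux fails at load, while the kprobe program has no such dependency.

```c
// loader.c — minimal sketch; assumes compare.bpf.o from the sketch above.
// Build (assumed): cc loader.c -lbpf -o loader   (libbpf >= 1.0)
#include <bpf/libbpf.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	struct bpf_object *obj;
	struct bpf_program *prog;

	obj = bpf_object__open_file("compare.bpf.o", NULL);
	if (!obj) {
		fprintf(stderr, "open failed\n");
		return 1;
	}

	/* Load is where the fentry program needs kernel BTF: the target's
	 * BTF ID is resolved here so the verifier can type-check arguments. */
	if (bpf_object__load(obj)) {
		fprintf(stderr, "load failed (missing kernel BTF for fentry?)\n");
		return 1;
	}

	/* Attach everything: kprobe via the kprobe machinery, fentry via a
	 * BPF trampoline on the same function. */
	bpf_object__for_each_program(prog, obj) {
		if (!bpf_program__attach(prog)) {
			fprintf(stderr, "attach failed: %s\n",
				bpf_program__name(prog));
			return 1;
		}
	}

	pause();	/* keep both hooks live while you run perf / objdump */
	return 0;
}
```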
What fixed it
TODO: write up the side-by-side disassembly + perf data showing kprobe overhead (~50-100ns) vs fentry overhead (~10-20ns) on hot paths, plus the BTF requirement for fentry’s argument typing.
What this teaches backend / AI infra engineers
eBPF is becoming the default “production observability without code change” tool. For AI infrastructure specifically, it’s the only way to see:
- Real syscall latency during LLM inference (network, file I/O, GPU driver entry points), illustrated by the probe sketch after this list
- Per-token streaming overhead that user-space tracing doesn’t catch
- Container boundary cost for multi-tenant inference serving
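For the syscall-latency item, the usual pattern is a paired fentry/fexit probe: timestamp on entry, compute the delta on return. The sketch below is illustrative only; the target vfs_read, the map layout, and the trace-pipe output are assumptions standing in for whatever hot path your inference server actually exercises.

```c
// read_latency.bpf.c — illustrative fentry/fexit latency probe (assumed target).
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 10240);
	__type(key, __u32);    /* thread id */
	__type(value, __u64);  /* entry timestamp, ns */
} start SEC(".maps");

/* Entry: stamp the calling thread. Runs from a BPF trampoline, so the
 * per-call overhead stays in the tens of nanoseconds. */
SEC("fentry/vfs_read")
int BPF_PROG(vfs_read_enter, struct file *file, char *buf, size_t count, loff_t *pos)
{
	__u32 tid = (__u32)bpf_get_current_pid_tgid();
	__u64 ts = bpf_ktime_get_ns();

	bpf_map_update_elem(&start, &tid, &ts, BPF_ANY);
	return 0;
}

/* Exit: same typed arguments plus the return value; compute the latency
 * and emit it (here just a trace line; a real probe would feed a histogram). */
SEC("fexit/vfs_read")
int BPF_PROG(vfs_read_exit, struct file *file, char *buf, size_t count, loff_t *pos, ssize_t ret)
{
	__u32 tid = (__u32)bpf_get_current_pid_tgid();
	__u64 *tsp = bpf_map_lookup_elem(&start, &tid);

	if (!tsp)
		return 0;

	__u64 delta_ns = bpf_ktime_get_ns() - *tsp;
	bpf_map_delete_elem(&start, &tid);
	bpf_printk("vfs_read: %llu ns, ret=%ld", delta_ns, (long)ret);
	return 0;
}
```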
The kprobe-vs-fentry choice is not academic. If you hook recvmsg() on a 100K-RPS service with kprobe instead of fentry, you’ve just added 5–10% overhead to every request. With fentry, sub-1%. Same observability, very different production cost.
The bigger lesson: observability isn’t free. Every tracer adds cost. Knowing which hook mechanism costs what is the difference between “we ran with eBPF in prod” and “we tried eBPF in prod and rolled back because tail latency exploded.”
Related work
- Video: SentinelEdge eBPF project overview
- GitHub: SentinelEdge — production patterns at scale
I occasionally advise small teams on backend reliability, Go performance, and production AI systems. Learn more: /services