Recently, eBPF has been steadily heating up in the cloud-native community. After translating “What is eBPF,” and while reading “eBPF in Cloud-Native Environments,” I kept wondering what position eBPF actually holds in cloud-native environments and what role it plays. At the time, I commented that “eBPF opens up a god’s-eye view: it can see all activity on the host, while a sidecar can only observe activity within its own pod. As long as we get process isolation right, eBPF-based proxy-per-node is the best choice.” After reading this article by William Morgan [^1], it suddenly dawned on me. Below is an excerpted translation of the points in the article that I agree with - that eBPF cannot replace service mesh and sidecars. Interested readers can read William’s original article.
What is eBPF
In the past, it was impossible for an application to process network packets directly, because applications run in Linux user space and cannot access the host’s network buffers. Those buffers are managed and protected by the kernel, which must ensure process isolation: one process cannot read another’s network packets. The correct approach is for the application to request packet data through a system call (syscall), which is essentially a kernel API call: the application invokes the syscall, the kernel checks whether the application has permission to the requested packet, and if so, returns it.
With eBPF, the application no longer needs a syscall for every packet, and packets no longer have to be copied back and forth between kernel space and user space. Instead, we hand code directly to the kernel and let the kernel execute it, so the code runs at full speed with higher efficiency. eBPF also allows applications and the kernel to share memory in a secure way. In short, eBPF lets applications submit code directly to the kernel, aiming for a performance gain by bypassing system calls.
eBPF is not a silver bullet. You cannot run arbitrary programs with eBPF. In fact, what eBPF can do is very limited.
Limitations of eBPF
The limitations of eBPF also stem from the kernel. The applications running on a machine are effectively tenants competing for the system’s memory, disk, and network. The kernel’s responsibility is to isolate these applications and schedule their resources fairly, while also enforcing permissions and protecting each application from being compromised by the others.
Because we hand eBPF code directly to the kernel for execution, it bypasses the kernel’s usual security layer (the permission checks performed on syscalls), and the kernel faces direct security risks. To protect itself, the kernel requires every eBPF program to pass a verifier before it can run. But automatically verifying programs is very difficult, so the verifier may over-restrict program functionality. For example, eBPF programs cannot block, cannot contain unbounded loops, and cannot exceed a predetermined size; their complexity is also limited. The verifier evaluates all possible execution paths, and if it cannot prove the program completes within those bounds, or that every loop has an exit condition, it refuses to load the program. Many applications violate these restrictions; to run them as eBPF programs, they would either need to be rewritten to satisfy the verifier, or the kernel would need to be patched to bypass some of the verification (which may be quite difficult). That said, as kernel versions advance the verifier keeps getting smarter, the restrictions are gradually loosening, and there are also creative ways to work around them.
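To make the verifier’s restrictions concrete, below is a hypothetical XDP-style sketch (not from the original article). It is illustrative only: building it for real assumes clang with `-target bpf` plus the usual `<linux/bpf.h>` and libbpf `bpf_helpers.h` headers, which supply `SEC()` and `XDP_PASS`:

```c
/* Hypothetical sketch of an XDP program, showing what the
 * in-kernel verifier will and will not accept. */
SEC("xdp")
int count_packets(struct xdp_md *ctx)
{
    int sum = 0;

    /* Accepted: a loop with a provable upper bound. */
    for (int i = 0; i < 16; i++)
        sum += i;

    /* Rejected at load time: the verifier cannot prove termination.
     *     while (1) { }
     * Also rejected: blocking calls, and programs whose total
     * instruction count or path complexity exceeds the verifier's
     * budget. */

    return XDP_PASS; /* hand the packet on to the normal network stack */
}
```

The verifier makes this decision when the program is loaded, before a single packet is processed - which is why whole classes of programs simply cannot be expressed as eBPF.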
But overall, what eBPF programs can do is very limited. Heavyweight event handling - such as processing HTTP/2 traffic in full, or TLS handshake negotiation - cannot be completed in a pure eBPF environment. At best, eBPF can do a small part of the work and then call out to a user-space application to handle the parts that are too complex for it.
The Relationship Between eBPF and Service Mesh
Due to the various limitations of eBPF described above, Layer 7 traffic still requires user-space network proxies to complete. eBPF cannot replace service mesh. eBPF can run together with CNI (Container Network Interface) to handle Layer 3/Layer 4 traffic, while service mesh handles Layer 7 traffic.
Per-Host Proxy Mode is Worse Than Sidecar
The per-host proxy mode was used by Linkerd 1.x, an early service mesh practitioner. I started following service mesh at that time, and Linkerd 1.x even ran on the JVM! However, user experience with Linkerd 1.x proved that this mode is worse than the sidecar mode for both operations and security.
Why is the sidecar mode better than the per-host mode? Because the sidecar mode has several advantages that the per-host mode doesn’t have:
- The proxy’s resource consumption varies with the application’s load. As instance traffic increases, the sidecar consumes more resources, just like the application. If the application’s traffic is very small, the sidecar doesn’t need to consume many resources. Kubernetes’s existing mechanisms for managing resource consumption, such as resource requests and limits and OOM kill, continue to work.
- A proxy failure’s blast radius is limited to a single pod. A proxy failure is handled the same way as an application failure: Kubernetes takes care of the failed pod.
- Proxy maintenance. For example, proxy version upgrades are completed through the same mechanisms as the application itself, such as rolling updates and canary releases.
- Security boundaries are clear (and small): at the pod level. The sidecar runs in the same security context as the application instance. It’s part of the pod and has the same IP address as the application. The sidecar enforces policies and applies mTLS to traffic entering and leaving that pod, and it only needs that pod’s keys.
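The resource-accounting point above can be made concrete with a pod spec sketch: the sidecar is just another container in the pod, so it gets its own requests and limits, which Kubernetes and the kernel (via cgroups and OOM kill) enforce per pod. The image names and values below are hypothetical, for illustration only:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
  - name: app
    image: example/app:latest          # hypothetical image
    resources:
      requests: { cpu: 500m, memory: 256Mi }
      limits:   { cpu: "1",  memory: 512Mi }
  - name: proxy-sidecar
    image: example/proxy:latest        # hypothetical image
    resources:                         # the proxy has its own budget,
      requests: { cpu: 100m, memory: 64Mi }   # accounted and enforced
      limits:   { cpu: 250m, memory: 128Mi }  # within this pod only
```

If the proxy misbehaves, only this pod is OOM-killed or throttled - none of this per-pod accounting exists for a per-host proxy.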
For the per-host mode, none of these benefits exist. The proxy is completely decoupled from the application pod, handling traffic for all pods on the host. This introduces various problems:
- The proxy’s resource consumption is highly variable, depending on how many pods Kubernetes has scheduled onto that host at any given time. You can’t effectively predict a given proxy’s resource consumption, so there is a risk of the proxy crashing (this is the original text’s claim; I still have doubts about this point and hope some readers can help explain).
- Traffic contention between pods on the host. Because all traffic on the host goes through the same proxy, if one application pod has extremely high traffic and consumes all the proxy’s resources, other applications on the host are at risk of starvation.
- The proxy’s blast radius is large and constantly changing. Proxy failures and upgrades now affect a random subset of pods in a random set of applications, meaning any failure or maintenance task has unpredictable risks.
- Makes security issues more complex. Taking TLS as an example, the proxy on the host must contain keys for all applications on that host, making it a new attack vector vulnerable to confused deputy issues - any CVE or vulnerability in the proxy is a potential key leak risk.
In short, the sidecar mode preserves container-level isolation - the kernel can enforce all of its security protections and fair multi-tenant scheduling at the container level. Container isolation still works perfectly, while the per-host mode breaks all of this and re-introduces contention-based multi-tenancy problems.
Of course, per-host isn’t without merits. The mode’s biggest advantage is that it can reduce the number of proxies by orders of magnitude and reduce network hops, which also reduces resource consumption and network latency. But compared to the operations and security issues this mode brings, these advantages are secondary. We can also make up for the sidecar mode’s shortcomings in this area through continuous optimization of sidecars, while the per-host mode’s flaws are fatal.
Ultimately, it all comes back to the contention-based multi-tenancy problem. So could we use existing kernel mechanisms to make the per-host proxy support multi-tenancy - for example, by refactoring Envoy into a multi-tenant mode? Although this is theoretically feasible, the workload is huge, and Matt Klein doesn’t think it’s worth doing [^2]; it’s better to use containers to implement tenant isolation. And even if a per-host proxy did support multi-tenancy, the blast-radius and security issues would still remain.
Summary
With or without eBPF, for the foreseeable future service mesh will be built on sidecar proxies running in user space (proxyless mode aside). Although the sidecar mode has its own drawbacks, it remains the best option for handling the complexity of cloud-native networking while preserving container isolation and its operational advantages. Will eBPF’s capabilities eventually grow to handle Layer 7 traffic and thus replace service mesh and sidecars? Perhaps, but that day may be far off.
