This article will introduce you to the four plane deployment modes of the Istio service mesh, analyzing their strengths and weaknesses to offer suggestions based on their performance, reliability, and security.
A service mesh is an infrastructure layer that typically employs application proxies to facilitate various functionalities. Taking Istio as an example, it enables users to programmatically manage traffic aware of applications, ensure robust observability, and maintain solid security within the network. Istio ensures resilience in cloud-native and distributed systems, enabling modern enterprises to maintain their workloads across various platforms while ensuring connectivity and protection. Its capabilities include zero-trust security, policy management, access control for security and governance, as well as canary deployments, A/B testing, load balancing, and fault recovery for network functions. It also provides observability across all network traffic. Istio is unrestricted by any single cluster, network, or runtime and can incorporate services running on Kubernetes or virtual machines within a single mesh, whether across multi-cloud, hybrid, or on-premises environments. Its design is scalable and supported by an extensive ecosystem.
The architecture of a service mesh is divided into the control plane and the data plane. In the case of Istio, istiod
acts as the control plane, while the data plane offers two deployment modes: sidecar or ambient.
In fact, there are more than these two modes for deploying the service mesh data plane. Including Istio’s proxyless gRPC service mesh and the Cilium service mesh, there are a total of four deployment modes.
The following table compares the service mesh data plane deployment modes across several dimensions.
Data plane modes | Platform security Threat assessment, risk | Resource Efficiency – infra/resource consumption, etc. | Manageability – upgrades, vulnerabilities, etc. | Performance – Latency, etc. |
---|---|---|---|---|
Sidecar mode: L4 and L7 Proxy per Service Instance | High security, as each service instance has an independent proxy, reducing the attack surface. Risk management depends on control plane configuration. | Higher resource consumption, as each instance requires an independent proxy. | Centralized management and configuration required, upgrades are relatively complex, but can be simplified through the control plane. | May increase latency as requests need to be forwarded through the proxy. |
Ambient mode: Shared L4 – L7 per Service Model | Designed for security with ztunnel for local routing. However, shared proxies can introduce risks, and its overall security maturity is still evolving. | Higher efficiency as multiple services share the same proxy. | Relatively simple management, but may face vulnerabilities due to the shared proxy. | Good performance with local routing, but may incur cross-AZ costs with waypoint proxies. |
Cilium mesh mode: Shared L4 and L7 Model | Moderate security with a focus on eBPF and fine-grained access control. However, there are known issues with identity and trust models. | Efficiency due to kernel-level processing, reducing infrastructure expenses. | Management is more complex, needing to handle configurations for multiple services. | Variable performance; certain scenarios might introduce significant latency. |
gRPC mode: L4 and L7 Part of the Application Model | While gRPC integrates proxy functions within the application, theoretically reducing the attack surface, the application’s complexity and variability can actually expand it. The security of the gRPC mode depends on specific use cases and needs careful evaluation of potential threats and attack surfaces. | Higher efficiency because the proxy is implemented inline in the same process as the app. | Complex management, regular updates and maintenance of application layer proxy required. | Superior performance with low latency, suitable for real-time applications. |
You can see a more visual comparison of these four modes in terms of cost and security from the diagram below:
These four deployment modes are differentiated based on how proxies are associated with service instances.
The following diagram illustrates potential locations for proxies in different deployment modes of the service mesh data plane.
The diagram below shows the communication paths in sidecar mode where Application 1 accesses Application 2 on the same node and Application 3 across nodes.
This is the most common deployment mode for service meshes and was the initial mode supported by Istio. Each service instance is accompanied by a proxy (such as Envoy), which handles all inbound and outbound network communications, including L4 and L7 layers.
The diagram below illustrates the communication paths in ambient mode where Application 1 accesses Application 2 on the same node and Application 3 across nodes.
In this mode, a shared L4 proxy on each node serves all service instances on the same physical host, while each service account has a dedicated L7 proxy.
The diagram below displays the communication paths in Cilium mesh mode where Application 1 accesses Application 2 on the same node and Application 3 across nodes.
This mode is a middle ground between fully independent and fully shared setups, with each node having a shared L7 proxy. However, there are known issues with identities and trust models. The Cilium service mesh, which uses eBPF, allows for network policies without a proxy through kernel programs.
Note: This model is not the data plane of Istio.
In the gRPC mode, no external proxies are deployed; instead, proxy functions are directly integrated into the application using the RPC framework, leading to significant intrusion into the application. The service mesh control plane uses a set of discovery APIs known as xDS APIs to dynamically configure the application. The gRPC client libraries within the application provide extensive support for the xDS APIs. With this capability, the service mesh control plane can program L4 and L7 proxy functions directly within this library inside the service container.
The diagram below illustrates how, in Istio’s gRPC mode, the control plane communicates with the application.
In this mode, when a gRPC service communicates with the control plane, a traditional Sidecar proxy is not needed; instead, a specific agent is used for initialization and communication with the control plane. This design reduces resource consumption and deployment complexity while still enabling functions such as service discovery and traffic management.
As previously introduced, several factors influence the choice of a service mesh data plane deployment mode:
When considering the deployment modes of the service mesh data plane, maturity is a key factor. The maturity level of each mode affects its reliability and support in production environments:
If your business has high security requirements, such as in the financial or healthcare sectors, then the Sidecar Mode might be the best choice. This mode provides strong security by ensuring each service instance has its own independent proxy, thus maximizing service isolation. For those exploring newer models like Ambient Mode, it’s essential to understand that while ztunnel aims for secure local routing, the model’s overall security strategy is still evolving.
In resource-constrained environments, deploying a separate proxy for each service instance may not be practical. In such cases, consider the gRPC Mode or Ambient Mode. gRPC Mode is particularly suitable for organizations that already use gRPC extensively and are willing to handle complex networking functions internally within the application. The Ambient Mode, on the other hand, uses a shared proxy to reduce resource consumption.
For applications requiring high performance and low latency, the gRPC Mode provides optimal performance because it eliminates the additional network hops introduced by traditional proxies. However, it’s important to note that the gRPC Mode is still experimental and may not support all features of Istio. Consider your service mesh functionality needs accordingly.
Each data plane mode has distinct characteristics affecting network overhead. Sidecar mode, with locality-aware routing, reduces cross-zone traffic but adds network hops, increasing latency and compute use. Ambient mode uses ztunnels for local routing but may incur cross-AZ costs with waypoint proxies. Cilium mode places proxies on the same node as applications, potentially reducing inter-node traffic but could introduce more latency. gRPC mode integrates RPC framework into the application, minimizing network hops and overhead, ideal for high-performance, low-latency needs.
Management complexity is also a significant consideration when choosing a service mesh data plane mode. Sidecar Mode and gRPC Mode may require more complex configurations and maintenance, while the Ambient Mode might offer a more streamlined management experience in some deployment environments. Cilium Mode could require complex management due to its reliance on eBPF and multiple configuration points.
Choosing the right service mesh data plane deployment mode depends on specific factors including maturity, security, resource constraints, performance, and management complexity. Here’s a quick guide:
The best choice will align with your application requirements, security policies, and technical familiarity. It’s essential to understand each mode’s strengths and limitations to make an informed decision that balances benefits, risks, and costs.
This blog was initially published at tetrate.io.
Last updated on Oct 8, 2024