Service Mesh Data Plane Deployment Modes Explanation

This article introduces the four data plane deployment modes of service meshes, analyzes their advantages and disadvantages, and provides recommendations based on performance, reliability, and security.

Copyright
This is an original article by Jimmy Song. You may repost it, but please credit this source: https://jimmysong.io/en/blog/service-mesh-data-plane-deployment-modes/

Using Istio as the primary example, this article walks through the four data plane deployment modes of a service mesh, weighing their strengths and weaknesses to offer suggestions based on performance, reliability, and security.

What is a Service Mesh?

A service mesh is an infrastructure layer that typically uses application proxies to provide its functionality. Taking Istio as an example, it lets users programmatically manage application-aware traffic, gain robust observability, and maintain strong security within the network. Istio provides resilience for cloud-native and distributed systems, enabling enterprises to run workloads across various platforms while keeping them connected and protected. Its capabilities include zero-trust security, policy management, and access control for security and governance; canary deployments, A/B testing, load balancing, and fault recovery for networking; and observability across all network traffic. Istio is not restricted to any single cluster, network, or runtime: it can incorporate services running on Kubernetes or virtual machines into a single mesh, whether in multi-cloud, hybrid, or on-premises environments. Its design is scalable and supported by an extensive ecosystem.

The architecture of a service mesh is divided into the control plane and the data plane. In the case of Istio, istiod acts as the control plane, while the data plane offers two deployment modes: sidecar or ambient.

Figure: Architecture of the Istio Service Mesh (Source: istio.io)

In fact, there are more than these two modes for deploying the service mesh data plane. Including Istio’s proxyless gRPC service mesh and the Cilium service mesh, there are a total of four deployment modes.

Data Plane Deployment Modes

The following table compares the service mesh data plane deployment modes across several dimensions.

| Data plane mode | Platform security (threat assessment, risk) | Resource efficiency (infra/resource consumption, etc.) | Manageability (upgrades, vulnerabilities, etc.) | Performance (latency, etc.) |
|---|---|---|---|---|
| Sidecar mode: L4 and L7 Proxy per Service Instance | High security, as each service instance has an independent proxy, reducing the attack surface. Risk management depends on control plane configuration. | Higher resource consumption, as each instance requires an independent proxy. | Centralized management and configuration required; upgrades are relatively complex, but can be simplified through the control plane. | May increase latency, as requests need to be forwarded through the proxy. |
| Ambient mode: Shared L4 – L7 per Service Model | Designed for security with ztunnel for local routing. However, shared proxies can introduce risks, and its overall security maturity is still evolving. | Higher efficiency, as multiple services share the same proxy. | Relatively simple management, but may face vulnerabilities due to the shared proxy. | Good performance with local routing, but may incur cross-AZ costs with waypoint proxies. |
| Cilium mesh mode: Shared L4 and L7 Model | Moderate security with a focus on eBPF and fine-grained access control. However, there are known issues with identity and trust models. | Efficient due to kernel-level processing, reducing infrastructure expenses. | Management is more complex, needing to handle configurations for multiple services. | Variable performance; certain scenarios might introduce significant latency. |
| gRPC mode: L4 and L7 Part of the Application Model | While gRPC integrates proxy functions within the application, theoretically reducing the attack surface, the application's complexity and variability can actually expand it. The security of the gRPC mode depends on specific use cases and needs careful evaluation of potential threats and attack surfaces. | Higher efficiency, because the proxy is implemented inline in the same process as the app. | Complex management; regular updates and maintenance of the application-layer proxy are required. | Superior performance with low latency, suitable for real-time applications. |

Table: Comparison of Four Service Mesh Deployment Modes

The diagram below gives a more visual comparison of these four modes in terms of cost and security:

Figure: Comparison of Service Mesh Deployment Modes

These four deployment modes are differentiated based on how proxies are associated with service instances.

The following diagram illustrates potential locations for proxies in different deployment modes of the service mesh data plane.

Figure: Potential Locations of Proxies in the Data Plane

  • Sidecar Mode: The proxy is in the same Pod as the application container.
  • Ambient Mode: The L4 proxy is on the same node as the application container, while the L7 proxy may not be on the same node.
  • Cilium Mode: The L4 and L7 proxies are combined and located on the same node as the application container.
  • gRPC Mode: The gRPC framework is integrated into the application and deployed within the same container.
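
The placement rules in the list above can be written down as a small lookup table. The following is only an illustrative sketch: the names `Location`, `ProxyPlacement`, `PLACEMENTS`, and `l7_may_leave_node` are hypothetical and do not come from Istio or Cilium.

```python
from dataclasses import dataclass
from enum import Enum


class Location(Enum):
    IN_APP = "inside the application container"
    SAME_POD = "the same Pod as the application"
    SAME_NODE = "the same node as the application"
    ANY_NODE = "possibly a different node"


@dataclass(frozen=True)
class ProxyPlacement:
    l4: Location
    l7: Location


# Placement per mode, transcribed from the list above.
PLACEMENTS = {
    "sidecar": ProxyPlacement(l4=Location.SAME_POD, l7=Location.SAME_POD),
    "ambient": ProxyPlacement(l4=Location.SAME_NODE, l7=Location.ANY_NODE),
    "cilium": ProxyPlacement(l4=Location.SAME_NODE, l7=Location.SAME_NODE),
    "grpc": ProxyPlacement(l4=Location.IN_APP, l7=Location.IN_APP),
}


def l7_may_leave_node(mode: str) -> bool:
    """Return True if L7 processing may happen off the application's node."""
    return PLACEMENTS[mode].l7 is Location.ANY_NODE


for mode, placement in PLACEMENTS.items():
    print(f"{mode}: L4 in {placement.l4.value}, L7 in {placement.l7.value}")
```

Only ambient mode returns True from `l7_may_leave_node`, which is the property behind the cross-AZ cost mentioned in the comparison table.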

Sidecar Mode: L4 and L7 Proxy per Service Instance

The diagram below shows the communication paths in sidecar mode where Application 1 accesses Application 2 on the same node and Application 3 across nodes.

Figure: Sidecar Mode: L4 and L7 Proxy per Service Instance

This is the most common deployment mode for service meshes and was the initial mode supported by Istio. Each service instance is accompanied by a proxy (such as Envoy), which handles all inbound and outbound network communications, including L4 and L7 layers.

  • Advantages: High security, as each service instance is isolated, reducing potential attack surfaces.
  • Disadvantages: High resource consumption, as each service instance requires a separate proxy, increasing infrastructure costs.
  • Maturity: The Istio sidecar mode is production-ready. It has undergone extensive testing and is widely used in real-world environments.

Ambient Mode: Shared L4 – L7 per Service Model

The diagram below illustrates the communication paths in ambient mode where Application 1 accesses Application 2 on the same node and Application 3 across nodes.

Figure: Ambient Mode: Node-shared L4 Proxy, Service Account-shared L7 Proxy

In this mode, a shared L4 proxy (ztunnel) on each node serves all service instances on that node, while each service account has a dedicated L7 proxy (waypoint proxy).

  • Advantages: Lower costs, as the proxy is shared among multiple services.
  • Disadvantages: Although the ztunnel component is designed for security, shared proxies can introduce risks. The security maturity of this model is still evolving.
  • Maturity: The Istio ambient mode is currently in the beta stage; there are no large-scale, production-level best practices yet, and it does not support multi-cluster deployments.
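
In Istio, both the sidecar and ambient modes are typically enabled per namespace through a label: `istio-injection=enabled` for sidecar injection and `istio.io/dataplane-mode=ambient` for ambient mode. The sketch below builds such a Namespace manifest as a Python dictionary so the two can be compared side by side. The helper `namespace_manifest` and the namespace name `demo` are hypothetical, and the exact labels should be checked against the Istio version in use.

```python
import json

# Namespace labels that opt workloads into each Istio data plane mode.
MODE_LABELS = {
    "sidecar": {"istio-injection": "enabled"},
    "ambient": {"istio.io/dataplane-mode": "ambient"},
}


def namespace_manifest(name: str, mode: str) -> dict:
    """Build a Kubernetes Namespace manifest that opts into the given mode."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": dict(MODE_LABELS[mode])},
    }


# "demo" is a hypothetical namespace name used only for illustration.
print(json.dumps(namespace_manifest("demo", "ambient"), indent=2))
```

The difference is where the proxy ends up: the sidecar label causes a proxy container to be injected into every new Pod, while the ambient label leaves Pods unchanged and redirects their traffic to the node's shared L4 proxy.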

Cilium Mesh Mode: Shared L4 and L7 Model

The diagram below displays the communication paths in Cilium mesh mode where Application 1 accesses Application 2 on the same node and Application 3 across nodes.

Figure: Cilium Mesh Mode: Shared L4 and L7 Proxies

This mode is a middle ground between fully independent and fully shared setups, with each node having a shared L7 proxy. The Cilium service mesh uses eBPF kernel programs to enforce network policies without a proxy. However, there are known issues with its identity and trust models.

  • Advantages: Kernel-level efficiency can reduce infrastructure costs in specific scenarios.
  • Disadvantages: Management is more complex, and certain scenarios may result in increased latency.
  • Maturity: Cilium mesh manages L4 traffic directly through eBPF and configures the Envoy proxy on each node to control L7 traffic via CRDs (such as CiliumEnvoyConfig). However, there are concerns about its security due to inconsistent identity models.

Note: This model is not the data plane of Istio.

gRPC Mode: L4 and L7 Part of the Application Model

In the gRPC mode, no external proxies are deployed; instead, proxy functions are integrated directly into the application through the RPC framework, which makes this mode highly intrusive to the application. The service mesh control plane uses a set of discovery APIs known as xDS APIs to dynamically configure the application. The gRPC client libraries within the application provide extensive support for the xDS APIs. With this capability, the service mesh control plane can program L4 and L7 proxy functions directly within this library inside the service container.

The diagram below illustrates how, in Istio’s gRPC mode, the control plane communicates with the application.

Figure: gRPC Mode: L4 and L7 Proxies Integrated into the Application

In this mode, when a gRPC service communicates with the control plane, a traditional Sidecar proxy is not needed; instead, a specific agent is used for initialization and communication with the control plane. This design reduces resource consumption and deployment complexity while still enabling functions such as service discovery and traffic management.

  • Advantages: High performance, as the proxy is tightly integrated with the application, reducing network hops and additional overhead.
  • Disadvantages: High complexity, as complex network processing functions need to be implemented within the application, which may increase development costs.
  • Security Considerations: The security of this model is debated. While integrating proxy functions within the application theoretically reduces the external attack surface, the application’s diversity and complexity could expand the overall attack surface. Therefore, when considering the security of the gRPC mode, it is crucial to carefully analyze the security threat model and attack risks in specific use cases.
  • Maturity: The gRPC mode in Istio is still in the experimental stage.
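
To make the application-side integration more concrete, here is a minimal sketch of what a proxyless gRPC client typically needs: a target URI with the `xds:///` scheme, so that the gRPC library resolves the service through xDS, and a bootstrap file advertised through the `GRPC_XDS_BOOTSTRAP` environment variable. The helper names, the service name, and the port are hypothetical, and the pod annotation shown is the one Istio's proxyless gRPC documentation uses to request the lightweight agent; verify it against your Istio version.

```python
import os


def xds_target(service: str, port: int) -> str:
    """Build a gRPC target URI that selects the xDS resolver."""
    return f"xds:///{service}:{port}"


def bootstrap_configured(env=os.environ) -> bool:
    """Check whether an xDS bootstrap file is advertised to the gRPC library."""
    return bool(env.get("GRPC_XDS_BOOTSTRAP"))


# Pod template annotation asking Istio to inject the lightweight agent
# instead of a full sidecar proxy (assumption: check your Istio version).
GRPC_AGENT_ANNOTATION = {"inject.istio.io/templates": "grpc-agent"}

# "echo.demo.svc.cluster.local" and port 7070 are hypothetical.
target = xds_target("echo.demo.svc.cluster.local", 7070)
print(target)
print("bootstrap configured:", bootstrap_configured())
# With grpcio installed and a bootstrap file in place, the client would then
# open a channel with something like: grpc.insecure_channel(target)
```

The application code changes very little, but the mesh behavior now depends on the gRPC library version compiled into every service, which is the source of the management complexity noted above.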

Which Mode Should I Use?

As previously introduced, several factors influence the choice of a service mesh data plane deployment mode:

  • Maturity
  • Enterprise security needs
  • Resource constraints
  • Performance requirements
  • Network overhead
  • Tolerance for management complexity

Maturity

When considering the deployment modes of the service mesh data plane, maturity is a key factor. The maturity level of each mode affects its reliability and support in production environments:

  • Sidecar Mode: This is the most mature service mesh deployment mode, widely adopted in production environments and well-supported.
  • Ambient Mode: While this mode offers some cost and performance advantages, it is still in the early stages and may lack mature best practices and broad ecosystem support.
  • Cilium Mesh Mode: As a relatively new option, it offers unique technological advantages, especially in scenarios using eBPF. However, concerns about its security model and identity management suggest it may not be as mature or reliable as other modes.
  • gRPC Mode: Despite excellent performance, the complexity and intrusiveness of this mode mean it may require more custom development and is still in the experimental stage.

Enterprise Security Needs

If your business has high security requirements, such as in the financial or healthcare sectors, then the Sidecar Mode might be the best choice. This mode provides strong security by ensuring each service instance has its own independent proxy, thus maximizing service isolation. For those exploring newer models like Ambient Mode, it’s essential to understand that while ztunnel aims for secure local routing, the model’s overall security strategy is still evolving.

Resource Constraints

In resource-constrained environments, deploying a separate proxy for each service instance may not be practical. In such cases, consider the gRPC Mode or Ambient Mode. gRPC Mode is particularly suitable for organizations that already use gRPC extensively and are willing to handle complex networking functions internally within the application. The Ambient Mode, on the other hand, uses a shared proxy to reduce resource consumption.
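
A rough way to see the resource difference is to count proxy instances for a given cluster shape. The sketch below is a back-of-the-envelope model only: the function `proxy_count` and the cluster numbers are hypothetical, it assumes exactly one L7 proxy per service account in ambient mode and one shared proxy per node in Cilium mode, and it ignores the fact that proxies differ in size and that the gRPC mode still spends CPU and memory inside the application.

```python
def proxy_count(mode: str, pods: int, nodes: int, service_accounts: int) -> int:
    """Estimate how many proxy instances a mode runs for a given cluster shape."""
    if mode == "sidecar":
        return pods                      # one proxy per service instance
    if mode == "ambient":
        return nodes + service_accounts  # node-shared L4 plus per-service-account L7
    if mode == "cilium":
        return nodes                     # one shared proxy per node
    if mode == "grpc":
        return 0                         # proxy logic lives inside the application
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical cluster: 200 pods on 10 nodes, using 20 service accounts.
for mode in ("sidecar", "ambient", "cilium", "grpc"):
    print(mode, proxy_count(mode, pods=200, nodes=10, service_accounts=20))
```

For this hypothetical cluster the model yields 200 proxies in sidecar mode against 30 in ambient mode, which illustrates why shared proxies are attractive when resources are tight.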

Performance Requirements

For applications requiring high performance and low latency, the gRPC Mode provides optimal performance because it eliminates the additional network hops introduced by traditional proxies. However, it’s important to note that the gRPC Mode is still experimental and may not support all features of Istio. Consider your service mesh functionality needs accordingly.

Network Overhead

Each data plane mode has distinct characteristics that affect network overhead. Sidecar mode, with locality-aware routing, reduces cross-zone traffic but adds network hops, increasing latency and compute usage. Ambient mode uses ztunnel for local routing but may incur cross-AZ costs when traffic passes through waypoint proxies. Cilium mode places proxies on the same node as the applications, which can reduce inter-node traffic but may introduce more latency. gRPC mode integrates the RPC framework into the application, minimizing network hops and overhead, which makes it well suited to high-performance, low-latency needs.

Tolerance for Management Complexity

Management complexity is also a significant consideration when choosing a service mesh data plane mode. Sidecar Mode and gRPC Mode may require more complex configurations and maintenance, while the Ambient Mode might offer a more streamlined management experience in some deployment environments. Cilium Mode could require complex management due to its reliance on eBPF and multiple configuration points.

Conclusion

Choosing the right service mesh data plane deployment mode depends on specific factors including maturity, security, resource constraints, performance, and management complexity. Here’s a quick guide:

  • Sidecar Mode: Best for high security needs, offering the most isolation.
  • gRPC Mode: Suitable for environments with high-performance demands where gRPC is already in use.
  • Ambient Mode: Good for cost-effectiveness and lower isolation needs, but the security model is evolving.
  • Cilium Mesh Mode: Could be good for infrastructures utilizing eBPF technology, but consider security and management complexity.
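
The quick guide above can also be read as a simple decision procedure. The following sketch encodes it as a function; the parameter names and, in particular, the order in which the checks are applied are assumptions made for illustration, not a rule stated by Istio or Cilium.

```python
def recommend_mode(
    needs_max_isolation: bool,
    uses_grpc_everywhere: bool,
    needs_lowest_latency: bool,
    cost_sensitive: bool,
    runs_ebpf_stack: bool,
) -> str:
    """Map coarse requirements to a data plane mode, following the quick guide."""
    if needs_max_isolation:
        return "sidecar"   # strongest isolation, most mature
    if uses_grpc_everywhere and needs_lowest_latency:
        return "grpc"      # in-process proxying, still experimental
    if cost_sensitive:
        return "ambient"   # shared proxies, evolving security model
    if runs_ebpf_stack:
        return "cilium"    # eBPF-based, weigh identity and management concerns
    return "sidecar"       # fall back to the most mature mode


# Example: a cost-sensitive team without strict isolation requirements.
print(recommend_mode(False, False, False, True, False))
```

Security is checked first here because the article treats isolation as the deciding factor for regulated sectors; a team with different priorities would reorder the checks.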

The best choice will align with your application requirements, security policies, and technical familiarity. It’s essential to understand each mode’s strengths and limitations to make an informed decision that balances benefits, risks, and costs.

This blog was initially published at tetrate.io.

Last updated on Oct 8, 2024