
Beyond Sidecar:

In-depth Analysis of Istio Ambient Mode Traffic Mechanisms and Cost Efficiency

Jimmy Song
Developer Advocate at Tetrate

March 15, 2025

Agenda

  1. Why Focus on Ambient Mode?
  2. Core Concepts of Istio Ambient Mode
  3. Traffic Flow and Key Mechanisms
  4. Ambient Mode vs. Sidecar Mode
  5. Summary & Q&A

1. Why Focus on Ambient Mode?

Challenges of Service Meshes

  • Resource overhead and operational complexity caused by Sidecar proxies
  • Upgrading or restarting Sidecar proxies often requires restarting all injected Pods
  • Increasing demand for high performance and cost efficiency

Key Question: Can we retain the core capabilities of a service mesh (security, observability, traffic control) while reducing intrusion and extra resource consumption for each Pod?

Four Deployment Models of Service Meshes

Proxy Locations

The Birth of Ambient Mode

  • Istio's new architecture removes Sidecars and leverages ztunnel + Waypoint Proxy to simplify the data plane.
  • Reduces resource consumption and complexity.
  • Still supports mTLS and policy controls with optional L7 proxying.

Comparison of Deployment Models

Model | Security | Efficiency | Manageability | Performance
Sidecar Mode | High security, isolated proxy | High resource usage | Centralized management but complex | Increased latency
Ambient Mode | Security via ztunnel, still evolving | More efficient, shared proxy | Simpler management, some risks | Good, but may have cross-AZ costs
Cilium Mesh | Moderate security, based on eBPF | Kernel-level efficiency | Complex configuration | Variable, may increase latency
gRPC | Integrated security, app-based | High efficiency | Complex updates | Low latency, ideal for real-time use cases

Deployment Model Quadrant

2. Core Concepts of Istio Ambient Mode

Core Components of Ambient Mode

  • ztunnel (L4):
    • Node-level proxy
    • Handles transparent traffic interception and mTLS encryption
    • Most traffic only needs L4 forwarding
  • Waypoint Proxy (L7):
    • Deployed as needed (Namespace / Service / Pod level)
    • Handles HTTP/gRPC functions (auth, routing, observability)
  • Istio CNI
    • Replaces istio-init container, intercepts outbound traffic from Pods

Overall Architecture of Ambient Mode

Istio Ambient Architecture

Waypoint Proxy Deployment Strategies

  • Namespace Level (Default): Applies to all workloads in a namespace
  • Service Level: Only for specific critical services needing L7
  • Pod Level: More fine-grained control
  • Cross-Namespace: Use Gateway resources for sharing
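The relationship between these levels can be sketched as a lookup that prefers the most specific setting. This is a minimal illustration, not Istio's implementation. It assumes the waypoint is selected through the `istio.io/use-waypoint` label, that a more specific label overrides a broader one, and that the value `none` opts a workload out. The function name `resolve_waypoint` and the waypoint names in the usage note are hypothetical.

```python
from typing import Optional

# Label assumed in this sketch to bind a Namespace, Service, or Pod to a waypoint.
USE_WAYPOINT_LABEL = "istio.io/use-waypoint"


def resolve_waypoint(
    namespace_labels: dict[str, str],
    service_labels: dict[str, str],
    pod_labels: dict[str, str],
) -> Optional[str]:
    """Return the waypoint name for a workload, or None for L4-only handling.

    Assumption: the most specific label wins (Pod, then Service, then
    Namespace), and the value "none" opts out of a broader setting.
    """
    for labels in (pod_labels, service_labels, namespace_labels):
        value = labels.get(USE_WAYPOINT_LABEL)
        if value is not None:
            return None if value == "none" else value
    # No label at any level: traffic stays on the ztunnel-only L4 path.
    return None
```

For example, a namespace labeled with a hypothetical waypoint `ns-wp` and a Service labeled `svc-wp` resolves to `svc-wp`, while a workload with no labels at any level resolves to `None` and is handled by ztunnel alone.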

Istio CNI

  • Traffic Interception: Replaces istio-init container, making installation cleaner.
  • Supports Two Modes: Compatible with Sidecar Mode and Ambient Mode.
  • Non-Privileged Mode Compatibility: Allows Pods to run in a non-privileged mode, enhancing security.
  • CNI Chaining: Extends CNI configuration at the node level.
  • Pod Internal Traffic Redirection (Ambient Mode):
    • Uses iptables REDIRECT rules inside Pod network namespaces.
    • Creates internal sockets to intercept and proxy traffic.

Resolving Conflicts with Kubernetes CNI

Istio CNI Plugin Execution Steps

Istio CNI Plugin Operation

How the Istio CNI Plugin Works

3. Traffic Flow and Key Mechanisms

Transparent Traffic Interception

  • Istio CNI injects iptables rules into the Pod network namespace
  • Pod outbound traffic is redirected to ztunnel
  • ztunnel decides whether to forward traffic to Waypoint Proxy for L7 processing
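The forwarding decision described above can be modeled as a function that lists the hops a request takes. This is a simplified sketch for reasoning about the data path, not ztunnel's actual logic. The `Destination` type, the `traffic_path` function, and the hop naming scheme are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Destination:
    pod: str
    node: str
    waypoint: Optional[str] = None  # None means no L7 processing is required


def traffic_path(source_pod: str, source_node: str, dest: Destination) -> list[str]:
    """List the hops from the source Pod to the destination Pod."""
    # Outbound traffic is always redirected to the ztunnel on the source node.
    hops = [source_pod, f"ztunnel@{source_node}"]
    if dest.waypoint is not None:
        # L7 policy applies: detour through the Waypoint Proxy.
        hops.append(f"waypoint:{dest.waypoint}")
    dest_ztunnel = f"ztunnel@{dest.node}"
    if hops[-1] != dest_ztunnel:
        # The destination ztunnel hop is skipped only when both Pods share
        # a node and no waypoint sits in between.
        hops.append(dest_ztunnel)
    hops.append(dest.pod)
    return hops
```

Calling `traffic_path("pod-a", "node-1", Destination("pod-b", "node-2"))` yields the cross-node L4 path through two ztunnels. Setting `waypoint` on the destination inserts the extra L7 hop between them.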

HBONE Protocol

  • Uses HTTP/2 + CONNECT to establish secure tunnels
  • Implements mTLS encryption and multiplexing
  • Reduces extra connection overhead, simplifies proxy handling

HBONE

HBONE Example

ztunnel A → ztunnel B:

  • Source: ztunnel_A_IP:52368
  • Destination: Node_B_IP:15008
:method: CONNECT
:scheme: https
:authority: Pod_B_IP:9080
:path: /api/v1/users?id=123
x-envoy-original-dst-host: Pod_B_IP:9080
x-forwarded-proto: hbone
x-istio-attributes: ...
x-istio-auth-userinfo: ...
...
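The two layers of addressing in the example above (an outer connection to port 15008 and an inner `:authority` naming the real target) can be sketched as follows. This is an illustrative model only and opens no connections. It includes just the `:method` and `:authority` pseudo-headers, which are the two an HTTP/2 CONNECT request needs, and leaves out the Istio-specific headers. The names `HboneTunnel` and `build_hbone_tunnel` are hypothetical.

```python
from dataclasses import dataclass

HBONE_PORT = 15008  # tunnel port shown in the example above


@dataclass(frozen=True)
class HboneTunnel:
    outer_destination: tuple[str, int]       # where the mTLS + HTTP/2 connection is opened
    connect_headers: list[tuple[str, str]]   # pseudo-headers of the CONNECT request


def build_hbone_tunnel(tunnel_endpoint_ip: str, target_ip: str, target_port: int) -> HboneTunnel:
    """Describe an HBONE tunnel without performing any network I/O."""
    return HboneTunnel(
        # The outer connection always targets the HBONE port on the tunnel endpoint.
        outer_destination=(tunnel_endpoint_ip, HBONE_PORT),
        connect_headers=[
            (":method", "CONNECT"),
            # The original destination travels inside the tunnel request.
            (":authority", f"{target_ip}:{target_port}"),
        ],
    )
```

Because the original destination is carried in `:authority`, many application connections to different targets can share one outer connection, which is where the multiplexing benefit comes from.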

Encrypted Traffic on the Same Node


Encrypted Traffic Across Nodes (L4)


Encrypted Traffic Across Nodes (L7)


Fallback Traffic (Preventing Traffic Leakage)

Traffic from Non-Mesh Pods

Differences Between L4 and L7 Traffic

Traffic Type | Processing Location | Example Scenarios
L4 | ztunnel (transparent forwarding) | TCP-level traffic, no application-layer policies needed
L7 | ztunnel → Waypoint Proxy | HTTP/gRPC requiring authentication, circuit breaking, routing, etc.

4. Ambient Mode vs. Sidecar Mode

Limitations of Ambient Mode

The following features are supported in Sidecar Mode but not yet in Ambient Mode:

  • Fine-grained customization for individual Pods (lost when Sidecar and Ambient Mode are used together)
  • Multi-cluster installation
  • Multi-network support
  • Virtual machine support

Feature and Difference Comparison

Aspect | Sidecar Mode | Ambient Mode
Proxy Location | Each Pod has an Envoy Sidecar | Node-level ztunnel + optional Waypoint Proxy
Resource Overhead | Consumes CPU/memory in every Pod | Lower, as proxies are shared at node or namespace level
Operational Complexity | Injecting/upgrading Sidecars requires restarting or rolling updates of all Pods | Easier deployment/upgrades, only ztunnel/Waypoint needs updating
Performance | Good isolation per Pod but overall higher overhead | Better L4 performance, L7 requires an additional hop
Feature Completeness | Mature, supports multi-cluster, VM, hybrid networks | Still evolving, some advanced features (multi-network, VM) not fully supported yet
Typical Use Cases | Strict isolation, fine-grained traffic control | Large-scale clusters needing lightweight management
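The resource-overhead row comes down to how many proxies each model runs: one per Pod for Sidecar Mode, one ztunnel per node plus any Waypoint Proxies for Ambient Mode. A back-of-the-envelope sketch of that arithmetic follows. The function name is hypothetical, and the per-proxy memory figures are inputs you must measure yourself. No real benchmark numbers are implied.

```python
def memory_estimate_mib(
    pods: int,
    nodes: int,
    waypoints: int,
    sidecar_mib: float,
    ztunnel_mib: float,
    waypoint_mib: float,
) -> dict[str, float]:
    """Estimate total proxy memory for each model from per-proxy figures.

    All per-proxy values are caller-supplied assumptions, not measurements.
    """
    return {
        # Sidecar Mode: one Envoy proxy per Pod.
        "sidecar": pods * sidecar_mib,
        # Ambient Mode: one ztunnel per node plus the deployed waypoints.
        "ambient": nodes * ztunnel_mib + waypoints * waypoint_mib,
    }
```

With placeholder inputs such as 1000 Pods, 20 nodes, and 5 waypoints, the Sidecar total scales with the Pod count while the Ambient total scales with nodes and waypoints. That gap is why the saving grows with Pod density.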

Deployment Recommendations

  • If you already have a Sidecar-based architecture with mature features, it is best to continue using Sidecar.
  • If you prioritize resource savings and simplified operations, and most traffic is L4, consider adopting Ambient Mode.
  • If some applications still require Sidecar, a hybrid deployment can be considered (but requires extra planning).

5. Summary & Q&A

Key Takeaways

  1. Ambient Mode: Reduces proxy burden per Pod, lowers operational cost
  2. ztunnel + Waypoint architecture: Only uses Waypoint when L7 functions are needed
  3. Although Ambient Mode is GA, be aware of its multi-cluster / VM / multi-network limitations; best practices are still evolving
  4. Use cases: Large-scale clusters with heavy L4 traffic + teams prioritizing resource and management efficiency

Q & A

Thank you for listening.