Envoy Jaeger Tracing Example Explained

This article provides a detailed analysis of the official Envoy example jaeger-tracing, explaining its construction process, related files, and functionality. It aims to supplement the Envoy documentation on Jaeger tracing. The example deploys services and Envoy within the same container, with Envoy handling service discovery, tracing, and network logic, showcasing the architectural concept of service mesh. This design not only helps understand distributed tracing implementation but also lays the foundation for learning service mesh technologies like Istio.

1. Architecture Overview

This example uses docker-compose to launch a microservice environment with multiple containers. Instead of the traditional Sidecar model, it adopts an entry proxy combined with internal routing proxies.

Core Architecture

  1. An entry Envoy proxy: Named front-envoy, it serves as the unified entry point for all external requests and initiates tracing.

  2. Two backend services: service1 and service2. These service containers run an Envoy proxy internally, which handles inter-service routing and propagates tracing context rather than generating tracing data.

  3. A Jaeger All-in-One instance: Responsible for receiving, storing, and visualizing tracing data sent by front-envoy.

Request and Tracing Flow

An external request first reaches front-envoy, which initiates tracing, generates a Span, and adds tracing headers before routing the request to service1. The Envoy inside service1 propagates the tracing headers and routes the request to service2. Finally, all Span data is sent to Jaeger by front-envoy.

Request and Tracing Flow Diagram

2. File Functionality Breakdown

Let’s analyze the purpose of each file.

docker-compose.yaml

This is the “orchestrator” of the example, defining four core services: jaeger, service1, service2, and front-envoy.

  • front-envoy: Uses envoy.yaml as its configuration and exposes port 10000.
  • service1 & service2: Backend services. Each mounts service1-envoy-jaeger.yaml and service2-envoy-jaeger.yaml respectively for their internal Envoy processes.
  • jaeger: Jaeger service for receiving and visualizing tracing data.
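
For orientation, a simplified sketch of what such a docker-compose.yaml could look like is shown below. The build paths, mount targets, and the COLLECTOR_ZIPKIN_HOST_PORT setting are assumptions for illustration and may differ from the official file.

# Simplified docker-compose.yaml sketch (illustrative; the official example may differ)
services:
  front-envoy:
    build: ../shared/envoy                       # shared Envoy-only image (assumed path)
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml       # entry-proxy configuration
    ports:
      - "10000:10000"                            # the only port exposed to the host
  service1:
    build: ../shared/python                      # shared Python + Envoy image (assumed path)
    volumes:
      - ./service1-envoy-jaeger.yaml:/etc/envoy/envoy.yaml
  service2:
    build: ../shared/python
    volumes:
      - ./service2-envoy-jaeger.yaml:/etc/envoy/envoy.yaml
  jaeger:
    image: jaegertracing/all-in-one              # UI on 16686, Zipkin-compatible collector on 9411
    environment:
      - COLLECTOR_ZIPKIN_HOST_PORT=:9411         # enable the Zipkin endpoint (assumed variable)
    ports:
      - "16686:16686"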

envoy.yaml (front-envoy configuration)

This is the core configuration file for the entry proxy front-envoy.

  • Key Configurations:
    • tracing: Enables tracing; it is here that front-envoy initiates new traces for incoming requests.
    • provider: Specifies the tracing service provider as zipkin (compatible with Jaeger).
    • collector_cluster: Points to the jaeger service for sending tracing data.
    • route_config: Defines routing rules, routing all incoming requests (/) to the service1 cluster.
    • clusters: Defines upstream clusters, including service1 and jaeger.
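
As a minimal sketch (the virtual host and route names are assumptions, since they are not spelled out above), the routing part of envoy.yaml could look roughly like this; the tracing and clusters blocks are shown in section 6.

# Illustrative route_config for front-envoy: send every request to the service1 cluster
route_config:
  name: local_route
  virtual_hosts:
  - name: backend
    domains: ["*"]            # match any Host header
    routes:
    - match:
        prefix: "/"           # all incoming paths
      route:
        cluster: service1     # forward to the service1 upstream cluster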

service1-envoy-jaeger.yaml (service1 internal Envoy configuration)

This configuration file is loaded by the Envoy inside the service1 container.

  • Purpose: Purely for routing.
  • Key Configurations:
    • It does not have a tracing block, so it does not initiate tracing; it only propagates the tracing headers received from front-envoy.
    • route_config: Routes all incoming requests to the service2 cluster.

service2-envoy-jaeger.yaml (service2 internal Envoy configuration)

Similar to service1, this configuration is loaded by the Envoy inside the service2 container.

  • Purpose: Endpoint routing.
  • Key Configurations:
    • Also does not have a tracing block.
    • route_config: Routes requests to service_log (considered the endpoint in this example).

verify.sh

An automated script to verify if the example is working correctly.

  • Execution Flow:
    1. curl localhost:10000/: Sends a request to front-envoy.
    2. Waits a few seconds for front-envoy to asynchronously send tracing data to Jaeger.
    3. Queries Jaeger API (http://localhost:16686/api/traces?service=front-proxy) for tracing data generated by the front-proxy service (defined in envoy.yaml) and verifies its completeness.

3. Build and Run Process

  1. Start all services:
docker-compose up --build -d
  2. Send a test request:
./verify.sh
  3. View tracing in Jaeger UI:
  • Open Jaeger UI in your browser: http://localhost:16686
  • Select front-proxy from the “Service” dropdown menu in the top left corner.
  • Click the “Find Traces” button.
  • Click on a trace record to view a complete call chain with multiple Spans, clearly showing the flow from front-envoy to service1 and then to service2.

4. Internal Processes in Containers

Below is an analysis of the core processes running inside each container.

jaeger-tracing-front-envoy-1

  • Container Role: Entry Envoy proxy.
  • Running Process:
    • envoy: The only core process in this container. The docker-entrypoint script starts the Envoy proxy, which loads the envoy.yaml configuration file, listens on port 10000, processes all incoming requests, and initiates and reports tracing data.

jaeger-tracing-service1-1

  • Container Role: Backend application service service1.
  • Running Processes:
    • envoy (background process): The container’s startup script (/usr/local/bin/start_service.sh) first starts an Envoy proxy process in the background. This Envoy process loads the service1-envoy-jaeger.yaml configuration and routes incoming requests to service2.
    • python (foreground process): The script then starts a Python application server (based on aiohttp), which listens on an internal port (e.g., 8000) and processes requests forwarded by its internal Envoy proxy.

jaeger-tracing-service2-1

  • Container Role: Backend application service service2.
  • Running Processes:
    • envoy (background process): Similar to service1, the container’s startup script first starts an Envoy proxy in the background, loading the service2-envoy-jaeger.yaml configuration.
    • python (foreground process): The script then starts an aiohttp Python application server as the business logic for service2.

jaeger-tracing-jaeger-1

  • Container Role: Jaeger distributed tracing system.
  • Running Process:
    • /go/bin/all-in-one: This is the official Jaeger “all-in-one” executable. This single process contains all the core components of Jaeger, making it easy to deploy in development and testing environments:
      • Jaeger Agent: Listens for Span data sent over UDP in the Jaeger Thrift format.
      • Jaeger Collector: Receives data from the Agent or directly from clients (in this example via the Zipkin-compatible HTTP endpoint on port 9411), then validates, processes, and stores it.
      • Jaeger Query: Provides an API for querying and retrieving tracing data.
      • Jaeger UI: Offers a web interface (on port 16686) for visualizing tracing data.

5. In-depth Discussion: Why Does the Service Container Need Envoy?

This is a crucial question that touches on the essence of the “Service Mesh” architecture. In simple terms, the purpose of running an Envoy process inside the service container is to decouple the complex network communication logic from the application code.

Even though in this example, front-envoy handles the “initiation” of tracing, the Envoy processes inside service1 and service2 still play a critical role:

1. Service Discovery

  • Problem: How does the Python code in service1 know where service2 is? In a dynamic container environment, IP addresses change, and hardcoding addresses is not feasible.
  • Solution: The Python code in service1 is configured to send all outbound requests to its own container’s Envoy proxy (usually sent to localhost). The internal Envoy proxy, based on its configuration file (service1-envoy-jaeger.yaml), knows the logical name of the service2 service and resolves it to the correct container address via Docker’s internal DNS.
  • Benefit: The application code becomes extremely simple; it doesn’t need to care about the network topology, just the logical name of the next service.
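
A minimal sketch of how service1’s internal Envoy might declare the service2 cluster follows; the port and load-balancing policy are assumptions, and the structure mirrors the service1 cluster shown in section 6.

# Illustrative cluster in service1-envoy-jaeger.yaml: "service2" is a logical name,
# resolved at runtime through Docker's embedded DNS
clusters:
- name: service2
  type: STRICT_DNS                    # resolve the hostname via standard DNS queries
  lb_policy: ROUND_ROBIN
  load_assignment:
    cluster_name: service2
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: service2       # the docker-compose service name
              port_value: 8000        # assumed listener port inside the service2 container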

2. Trace Context Propagation

  • Problem: front-envoy initiates tracing and generates tracing headers (e.g., x-b3-traceid). If service1 directly calls service2, who ensures these tracing headers are passed correctly?
  • Solution: The Envoy proxy in service1, upon receiving the request, automatically recognizes these tracing headers and ensures they are included when forwarding the request to service2.
  • Benefit: This guarantees that the distributed tracing chain is unbroken. Developers do not need to write any logic in the Python code to extract and re-inject tracing headers. Envoy handles all of this transparently.

3. Unified Network Control Layer

  • Problem: If you want to add more complex network policies to service-to-service calls, such as retries, timeouts, circuit breaking, or traffic encryption (mTLS), where should that logic live?
  • Solution: All of this can be declared in the Envoy configuration file (see the sketch after this list). You don’t need to re-implement this complex logic in every service written in Python, Java, Go, etc.
  • Benefits:
    • Language-agnostic: No matter what language your service is written in, the network behavior is controlled by Envoy.
    • Simplified Application: Application developers can focus on business logic rather than handling complex network failure scenarios.
    • Centralized Management: Operations personnel can adjust the network policies of the entire system by modifying the Envoy configuration without changing or redeploying any application code.
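
For example, a retry policy, a timeout, and a circuit-breaking threshold could be declared like this, purely in Envoy configuration; these values are illustrative and are not part of the example’s actual config files.

# Illustrative resilience settings, expressed declaratively without touching application code
routes:
- match:
    prefix: "/"
  route:
    cluster: service2
    timeout: 3s                     # per-request timeout
    retry_policy:
      retry_on: "5xx,connect-failure"
      num_retries: 2                # retry failed calls up to twice

clusters:
- name: service2
  connect_timeout: 1s
  type: STRICT_DNS
  circuit_breakers:
    thresholds:
    - max_connections: 100          # basic circuit-breaking limits
      max_pending_requests: 50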

Summary

In this example, the Envoy process inside the service container, while seemingly just doing simple routing, is actually building the foundation of a micro-sized but fully functional “service mesh”. It acts as an intelligent, configurable local network proxy, decoupling the service itself (Python application) from the complex communication (network) between services.

6. Dockerfile and Envoy Configuration Deep Dive

Dockerfile Analysis (Based on Common Practices)

The Dockerfile in this example is located in the ../shared/ directory, reflecting the idea of image reuse.

../shared/envoy/Dockerfile (For front-envoy)

This Dockerfile is meant to build a pure Envoy proxy image.

# Use the official Envoy image as the base
FROM envoyproxy/envoy:v1.23-latest 
# (v1.23-latest is an example version)

# Copy the docker-entrypoint.sh script into the container and grant execute permissions
COPY docker-entrypoint.sh /
RUN chmod +x /docker-entrypoint.sh

# Set the command to be executed when the container starts
ENTRYPOINT ["/docker-entrypoint.sh"]

# Default command, can be overridden by command in docker-compose.yaml
CMD ["/usr/local/bin/envoy", "-c", "/etc/envoy/envoy.yaml"]
  • Core: It is based on the official envoyproxy/envoy image, which already contains the compiled Envoy binary.
  • ENTRYPOINT: Uses a custom docker-entrypoint.sh script as the entry point, usually to perform some pre-processing tasks (like waiting for other services to be ready) before starting Envoy.

../shared/python/Dockerfile (For service1 and service2)

This Dockerfile is more complex as it needs to package both the Python application and the Envoy proxy into the same image.

# Use a base image with Python environment
FROM python:3.9-slim

# Install Envoy
# (This would include a step to obtain the Envoy binary; one common approach is
# copying it from the official image rather than downloading it at build time)
# For example:
# COPY --from=envoyproxy/envoy:v1.23-latest /usr/local/bin/envoy /usr/local/bin/envoy

# Install Python dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy application code and startup script
COPY app.py /
COPY start_service.sh /
RUN chmod +x /start_service.sh

# Set the command to be executed when the container starts
CMD ["/start_service.sh"]
  • Multi-stage Build: It first ensures a Python environment and then installs Envoy.
  • Packaging: It copies the Python application (app.py), dependencies (requirements.txt), and startup script (start_service.sh) into the image.
  • CMD: The container executes the start_service.sh script on startup, which is responsible for starting the background envoy process and then the foreground python application process.

Key Fields in Envoy Configuration

envoy.yaml (front-envoy)

# ...
tracing:
  provider:
    name: envoy.tracers.zipkin
    typed_config:
      "@type": type.googleapis.com/envoy.config.trace.v3.ZipkinConfig
      collector_cluster: jaeger
      collector_endpoint: "/api/v2/spans"
      shared_span_context: false
      collector_endpoint_version: HTTP_JSON
# ...
clusters:
- name: service1
  type: STRICT_DNS
  lb_policy: ROUND_ROBIN
  load_assignment:
    cluster_name: service1
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: service1
              port_value: 8000

  • tracing:
    • provider: Defines the tracer. envoy.tracers.zipkin is a built-in Envoy tracer compatible with the Zipkin protocol, which Jaeger supports.
    • collector_cluster: Key Connection. It tells Envoy to send tracing data to the upstream cluster named jaeger.
    • collector_endpoint: Specifies the API path of the Jaeger Collector that receives the data.
  • clusters:
    • type: STRICT_DNS: Indicates that Envoy will use standard DNS queries to discover the backend addresses in this cluster. In the Docker Compose environment, service names (like service1) are automatically registered in Docker’s internal DNS.
    • address: service1: Envoy will DNS query the host named service1, and Docker DNS will resolve it to the IP address of the jaeger-tracing-service1-1 container.
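
For the collector_cluster: jaeger reference to resolve, envoy.yaml also defines a jaeger cluster. A sketch consistent with the snippet above (the port matches the Zipkin-compatible endpoint mentioned in section 4) might look like this.

# Illustrative jaeger cluster that collector_cluster: jaeger points to
- name: jaeger
  type: STRICT_DNS
  lb_policy: ROUND_ROBIN
  load_assignment:
    cluster_name: jaeger
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: jaeger         # docker-compose service name of the all-in-one container
              port_value: 9411        # Zipkin-compatible HTTP endpoint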

service1-envoy-jaeger.yaml

# ...
route_config:
  name: local_route
  virtual_hosts:
  - name: backend
    domains:
    - "*"
    routes:
    - match:
        prefix: "/"
      route:
        cluster: service2
      decorator:
        operation: checkStock

  • decorator:
    • operation: checkStock: This is a crucial field. It gives a business name checkStock to this routing operation (i.e., the call to service2). When you view tracing data in the Jaeger UI, the Span representing this step will be named checkStock instead of a vague HTTP URL. This greatly enhances the readability of tracing data.
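
For comparison, the route in service2-envoy-jaeger.yaml can name its own hop the same way; the operation name below is an assumption for illustration, not taken from the example.

# Illustrative decorator on service2's route; the name becomes the Span name in Jaeger
routes:
- match:
    prefix: "/"
  route:
    cluster: service_log
  decorator:
    operation: checkAvailability   # assumed operation name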

7. End-to-End Tracing Process Detailed Explanation: The Complete Lifecycle of a Request

To fully understand how it works, let’s follow an external request and see how the tracing information is generated, propagated, and finally displayed.

Request Tracing Process

Step 1: The Birth of Tracing (Initiation)

  1. External Request: A client (e.g., curl) sends a regular HTTP request to http://localhost:10000/. This request does not contain any tracing information.
  2. Decision by front-envoy: The request reaches front-envoy. With tracing configured in envoy.yaml, Envoy checks the request headers.
  3. Generate Tracing Context: front-envoy detects the absence of incoming tracing headers (like x-b3-traceid) and determines that this is the start of a new request. It will:
    • Generate a globally unique Trace ID. This ID will span the entire request chain.
    • Generate a Span ID for the first operation it will perform (routing the request to service1).
    • Package these IDs and other information (collectively called “tracing context”) into standard HTTP headers (such as x-b3-traceid, x-b3-spanid, x-b3-sampled, etc.).
  4. Inject Tracing Headers: front-envoy injects these newly generated tracing headers into the request before forwarding it to service1.

Thus, a distributed trace is born.
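
For illustration, the injected headers could look like the following; the values are fabricated, since real IDs are generated per request.

# Illustrative tracing headers added by front-envoy (all values are made-up examples)
x-request-id: 7c1d3e4a-2a61-4f5b-9f0e-1c2d3e4f5a6b    # Envoy's per-request ID
x-b3-traceid: 80f198ee56343ba864fe8b2a57d3eff7        # shared by every Span in the trace
x-b3-spanid: e457b5a2e4d86bd1                         # ID of the current Span
x-b3-sampled: "1"                                     # marks the trace as sampled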

Step 2: Context Propagation

  1. Arrival at service1: The request, now with tracing headers, arrives at the service1 container and is received by its internal envoy process.
  2. Transparent Proxying: The envoy of service1 checks the request headers, finds the tracing context, and since its own configuration (service1-envoy-jaeger.yaml) does not have a tracing block, it does not attempt to initiate a new trace or modify the trace IDs. Its duty is to propagate.
  3. Forward to Application: The envoy of service1 forwards the request (along with all tracing headers) to the python application listening within the same container.
  4. Internal Service Call: After processing the business logic, the python application in service1 needs to call service2. It sends the request to its local envoy.
  5. Propagation Continues: The envoy of service1 receives this outbound request and continues to add the tracing headers to the request being sent to service2. This process is completely transparent to the Python application.
  6. Arrival at service2: The request arrives at service2, and its internal envoy similarly and transparently propagates the tracing headers to the python application.

Step 3: Data Collection and Aggregation

  1. Span Generation: Throughout the request-response chain, only front-envoy actively generates tracing data. It creates Spans at several key points:
    • Entry Span: When it receives an external request and prepares to forward it to service1.
    • Exit Span: When service1 calls service2 through it.
    • And it records information like duration and HTTP status code at each step, completing the corresponding Span.
  2. Asynchronous Reporting: front-envoy does not block the request to report tracing data. It places the completed Spans in a buffer and asynchronously sends them to the jaeger cluster (defined by collector_cluster: jaeger).
  3. Jaeger Collector: The Collector component of Jaeger receives these Span data over the Zipkin-compatible protocol on port 9411.
  4. Aggregation: The Collector groups the Spans with the same Trace ID together as belonging to the same request chain. It uses the Span ID and Parent Span ID fields of each Span to assemble them into a parent-child tree structure (Trace).
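
Conceptually, the assembled result can be pictured like this; it is a simplified illustration, not an actual Jaeger API payload, and the IDs and names are made up (checkStock comes from the decorator described in section 6).

# Conceptual view of one assembled trace (not a real Jaeger response)
traceId: 80f198ee56343ba864fe8b2a57d3eff7
spans:
- spanId: a1b2c3d4e5f60718
  parentSpanId: null                 # root Span: the ingress at front-envoy
  operation: ingress
- spanId: 1122334455667788
  parentSpanId: a1b2c3d4e5f60718     # child Span assembled via the parent/child relationship
  operation: checkStock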

Step 4: Visualization

  1. Querying: When you open the Jaeger UI in your browser and query for the front-proxy service, the Jaeger Query component retrieves the complete Trace data from the storage.
  2. Rendering: The Jaeger UI renders this tree-structured Trace data into an intuitive Flame Graph or Gantt chart.
    • You can see the root Span representing the entry operation at front-envoy.
    • Its child Span for the call to service1.
    • service1’s child Span for the call to service2.
    • The length of each Span represents its duration, helping to easily identify performance bottlenecks in the chain.
    • Clicking on each Span reveals detailed tags (like http.method, http.status_code) attached by Envoy, and the business operation name checkStock defined by decorator.operation.

Through this complete lifecycle, Envoy and Jaeger collaborate to provide a powerful and clear distributed tracing solution with minimal intrusion to the application code.

8. Conclusion

This article, by dissecting the official Envoy Jaeger tracing example, has explored the implementation principles and architectural design of distributed tracing in depth. Starting from the architecture overview, we saw how docker-compose launches a multi-container microservice environment and how the Envoy proxies propagate the tracing context. We then analyzed what each configuration file does, in particular how Envoy's routing and tracing configuration passes tracing information between services. Finally, we walked through the complete lifecycle of a request, from the birth of a trace and the propagation of its context, to the collection and aggregation of data, and finally its visualization, demonstrating how effectively Envoy and Jaeger collaborate in distributed tracing.

This design not only helps in understanding how distributed tracing is implemented, but also lays the foundation for learning service mesh technologies such as Istio. Through this example, readers can better grasp trace context propagation and data visualization in a service mesh, building a solid foundation for observability in real-world applications.

If you have any questions about distributed tracing, service meshes, or Envoy, feel free to leave a comment. I hope this article helps with your learning and practice!
