This article provides a detailed analysis of the official Envoy example jaeger-tracing, explaining its construction, related files, and functionality. It aims to supplement the Envoy documentation on Jaeger tracing. The example deploys each service and an Envoy proxy within the same container, with Envoy handling service discovery, tracing, and network logic, showcasing the architectural concept of a service mesh. This design not only helps in understanding how distributed tracing is implemented but also lays the foundation for learning service mesh technologies like Istio.
1. Architecture Overview
This example uses docker-compose to launch a microservice environment with multiple containers. Instead of the traditional Sidecar model, it adopts an entry proxy combined with internal routing proxies.
Core Architecture
- An entry Envoy proxy: named `front-envoy`, it serves as the unified entry point for all external requests and initiates tracing.
- Two backend services: `service1` and `service2`. Each service container runs an Envoy proxy internally, which handles inter-service routing and propagates tracing context rather than generating tracing data.
- A Jaeger All-in-One instance: responsible for receiving, storing, and visualizing the tracing data sent by `front-envoy`.
Request and Tracing Flow
An external request first reaches front-envoy. front-envoy initiates tracing, generates a Span, and adds tracing headers before routing the request to service1. The Envoy inside service1 propagates tracing headers and routes the request to service2. Finally, all Span data is sent to Jaeger by front-envoy.
2. File Functionality Breakdown
Let’s analyze the purpose of each file.
docker-compose.yaml
This is the “orchestrator” of the example, defining four core services: jaeger, service1, service2, and front-envoy.
- `front-envoy`: uses `envoy.yaml` as its configuration and exposes port `10000`.
- `service1` & `service2`: backend services. Each mounts `service1-envoy-jaeger.yaml` or `service2-envoy-jaeger.yaml` respectively for its internal Envoy process.
- `jaeger`: the Jaeger service for receiving and visualizing tracing data.
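The wiring described above can be sketched as a minimal docker-compose.yaml. This is an illustrative reconstruction rather than the exact file from the example repository: the build paths, image tag, and volume mappings are assumptions based on the descriptions in this article.

```yaml
services:
  front-envoy:
    build: ../shared/envoy                # hypothetical path mirroring the shared-Dockerfile layout
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml
    ports:
      - "10000:10000"                     # the single external entry point

  service1:
    build: ../shared/python
    volumes:
      - ./service1-envoy-jaeger.yaml:/etc/envoy/envoy.yaml

  service2:
    build: ../shared/python
    volumes:
      - ./service2-envoy-jaeger.yaml:/etc/envoy/envoy.yaml

  jaeger:
    image: jaegertracing/all-in-one       # single-process Jaeger for dev/test
    ports:
      - "16686:16686"                     # Jaeger UI
```

Note that `service1` and `service2` expose no ports to the host: all external traffic must pass through `front-envoy`.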
envoy.yaml (front-envoy configuration)
This is the core configuration file for the entry proxy front-envoy.
- Key Configurations:
  - `tracing`: enables and initiates tracing.
    - `provider`: specifies the tracer as `zipkin` (a protocol Jaeger understands).
    - `collector_cluster`: points to the `jaeger` cluster for sending tracing data.
  - `route_config`: defines routing rules, sending all incoming requests (`/`) to the `service1` cluster.
  - `clusters`: defines upstream clusters, including `service1` and `jaeger`.
service1-envoy-jaeger.yaml (service1 internal Envoy configuration)
This configuration file is loaded by the Envoy inside the service1 container.
- Purpose: purely for routing.
- Key Configurations:
  - It does not have a `tracing` block, so it does not initiate tracing; it only propagates tracing headers arriving from upstream (`front-envoy`).
  - `route_config`: routes all incoming requests to the `service2` cluster.
service2-envoy-jaeger.yaml (service2 internal Envoy configuration)
Similar to service1, this configuration is loaded by the Envoy inside the service2 container.
- Purpose: endpoint routing.
- Key Configurations:
  - It also lacks a `tracing` block.
  - `route_config`: routes requests to `service_log` (treated as the endpoint of the chain in this example).
verify.sh
An automated script to verify if the example is working correctly.
- Execution Flow:
  1. `curl localhost:10000/`: sends a request to `front-envoy`.
  2. Waits a few seconds for `front-envoy` to asynchronously ship its tracing data to Jaeger.
  3. Queries the Jaeger API (`http://localhost:16686/api/traces?service=front-proxy`) for tracing data generated by the `front-proxy` service (the service name defined in `envoy.yaml`) and verifies its completeness.
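The final completeness check can be approximated in a few lines of Python. This is a hedged sketch, not the actual verify.sh logic: the `trace_looks_complete` helper is invented here, and the sample payload merely mimics the shape of Jaeger's `/api/traces` response (`{"data": [{"spans": [...]}]}`).

```python
def trace_looks_complete(payload: dict, min_spans: int = 1) -> bool:
    """Return True if the Jaeger API payload contains at least one trace
    whose span count meets the expected minimum."""
    traces = payload.get("data") or []
    return any(len(trace.get("spans", [])) >= min_spans for trace in traces)

# Hypothetical response, shaped like /api/traces?service=front-proxy
sample = {
    "data": [
        {"traceID": "abc123", "spans": [
            {"spanID": "s1", "operationName": "ingress"},
            {"spanID": "s2", "operationName": "checkStock"},
        ]}
    ]
}

print(trace_looks_complete(sample, min_spans=2))  # True: both spans arrived
```

In the real script the payload would come from fetching the query URL above after the test request has been sent.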
3. Build and Run Process
- Start all services: `docker-compose up --build -d`
- Send a test request: `./verify.sh`
- View tracing in the Jaeger UI:
  1. Open the Jaeger UI in your browser: http://localhost:16686
  2. Select `front-proxy` from the "Service" dropdown menu in the top-left corner.
  3. Click the "Find Traces" button.
  4. Click on a trace record to view a complete call chain with multiple Spans, clearly showing the flow from `front-envoy` to `service1` and then to `service2`.
4. Internal Processes in Containers
Below is an analysis of the core processes running inside each container.
jaeger-tracing-front-envoy-1
- Container Role: Entry Envoy proxy.
- Running Process:
  - `envoy`: the only core process in this container. The `docker-entrypoint` script starts the Envoy proxy, which loads the `envoy.yaml` configuration, listens on port `10000`, processes all incoming requests, and initiates and reports tracing data.
jaeger-tracing-service1-1
- Container Role: backend application service `service1`.
- Running Processes:
  - `envoy` (background process): the container's startup script (`/usr/local/bin/start_service.sh`) first starts an Envoy proxy in the background. This Envoy loads the `service1-envoy-jaeger.yaml` configuration and routes incoming requests to `service2`.
  - `python` (foreground process): the script then starts a Python application server (based on `aiohttp`), which listens on an internal port (e.g., `8000`) and processes requests forwarded by its internal Envoy proxy.
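The two-process startup can be sketched as a start_service.sh along these lines. This is a reconstruction of the behavior described above, not the script from the repository; the paths and flags are assumptions.

```shell
#!/bin/sh
set -e

# Start the sidecar Envoy in the background with the service-specific config.
envoy -c /etc/envoy/envoy.yaml &

# Run the aiohttp application in the foreground so the container's
# lifecycle is tied to the business process.
exec python3 /app.py
```

Running the app in the foreground via `exec` means the container stops if the application crashes, which is the conventional Docker pattern.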
jaeger-tracing-service2-1
- Container Role: backend application service `service2`.
- Running Processes:
  - `envoy` (background process): as with `service1`, the container's startup script first starts an Envoy proxy in the background, loading the `service2-envoy-jaeger.yaml` configuration.
  - `python` (foreground process): the script then starts an `aiohttp` Python application server as the business logic for `service2`.
jaeger-tracing-jaeger-1
- Container Role: Jaeger distributed tracing system.
- Running Process:
  - `/go/bin/all-in-one`: the official Jaeger "all-in-one" executable. This single process bundles all of Jaeger's core components, making it easy to deploy in development and testing environments:
    - Jaeger Collector: receives Span data (here over the Zipkin-compatible endpoint on port `9411`), validates, processes, and stores it.
    - Jaeger Agent: listens for Spans sent by Jaeger client libraries over UDP (not exercised in this Zipkin-based example).
    - Jaeger Query: provides an API for querying and retrieving tracing data.
    - Jaeger UI: offers a web interface (on port `16686`) for visualizing tracing data.
5. In-depth Discussion: Why Does the Service Container Need Envoy?
This is a crucial question that touches on the essence of the “Service Mesh” architecture. In simple terms, running an Envoy process inside the service container is to decouple the complex network communication logic from the application code.
Even though in this example, front-envoy handles the “initiation” of tracing, the Envoy processes inside service1 and service2 still play a critical role:
1. Service Discovery
- Problem: how does the Python code in `service1` know where `service2` is? In a dynamic container environment IP addresses change, so hardcoding addresses is not feasible.
- Solution: the Python code in `service1` is configured to send all outbound requests to its own container's Envoy proxy (usually via `localhost`). That Envoy, based on its configuration file (`service1-envoy-jaeger.yaml`), knows the logical name of the `service2` service and resolves it to the correct container address via Docker's internal DNS.
- Benefit: the application code stays extremely simple; it doesn't need to care about the network topology, only the logical name of the next service.
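From the application's point of view, "calling service2" collapses to hitting the local proxy. A minimal sketch of that idea, assuming a hypothetical egress listener on port 9000 (the real port would come from the service's Envoy listener config):

```python
def outbound_url(path: str, local_proxy_port: int = 9000) -> str:
    """The app never needs service2's address: every outbound call goes to
    the sidecar Envoy on localhost, which performs the actual routing."""
    return f"http://localhost:{local_proxy_port}{path}"

print(outbound_url("/trace/2"))  # http://localhost:9000/trace/2
```

The address of the next hop lives entirely in the Envoy configuration; the application carries no knowledge of the topology.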
2. Trace Context Propagation
- Problem: `front-envoy` initiates tracing and generates tracing headers (e.g., `x-b3-traceid`). When `service1` calls `service2`, who ensures these tracing headers are passed along correctly?
- Solution: the application in `service1` copies the tracing headers from its inbound request onto the outbound call (per the Envoy documentation, propagating trace context between an inbound and an outbound request is the one responsibility left to the application), and the Envoy proxy in `service1` then carries them, unmodified, to `service2`.
- Benefit: the distributed trace stays unbroken. Beyond forwarding a handful of headers, the Python code needs no tracing logic: span creation, timing, and reporting are all handled transparently by Envoy.
3. Unified Network Control Layer
- Problem: if you want richer network policies for service-to-service calls, such as retries, timeouts, circuit breaking, or traffic encryption (mTLS), where should they be implemented?
- Solution: all of this can be declared in the Envoy configuration file. You don't need to re-implement the same logic in every service written in Python, Java, Go, and so on.
- Benefits:
- Language-agnostic: No matter what language your service is written in, the network behavior is controlled by Envoy.
- Simplified Application: Application developers can focus on business logic rather than handling complex network failure scenarios.
- Centralized Management: Operations personnel can adjust the network policies of the entire system by modifying the Envoy configuration without changing or redeploying any application code.
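As a taste of what "declarative network policy" looks like, a route in any of these configs could gain a timeout and retries with a few lines. This fragment is illustrative only and is not part of the example's actual files:

```yaml
route_config:
  virtual_hosts:
  - name: backend
    domains: ["*"]
    routes:
    - match: { prefix: "/" }
      route:
        cluster: service2
        timeout: 2s                     # fail fast instead of hanging
        retry_policy:
          retry_on: "5xx,connect-failure"
          num_retries: 3                # retries happen transparently to the app
```

The Python application is unchanged; the operator edits YAML and reloads Envoy.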
Summary
In this example, the Envoy process inside the service container, while seemingly just doing simple routing, is actually building the foundation of a micro-sized but fully functional “service mesh”. It acts as an intelligent, configurable local network proxy, decoupling the service itself (Python application) from the complex communication (network) between services.
6. Dockerfile and Envoy Configuration Deep Dive
Dockerfile Analysis (Based on Common Practices)
The Dockerfile in this example is located in the ../shared/ directory, reflecting the idea of image reuse.
../shared/envoy/Dockerfile (For front-envoy)
This Dockerfile is meant to build a pure Envoy proxy image.
```dockerfile
# Use the official Envoy image as the base
FROM envoyproxy/envoy:v1.23-latest
# (v1.23-latest is an example version)

# Copy the docker-entrypoint.sh script into the container and grant execute permissions
COPY docker-entrypoint.sh /
RUN chmod +x /docker-entrypoint.sh

# Set the command to be executed when the container starts
ENTRYPOINT ["/docker-entrypoint.sh"]

# Default command, can be overridden by command in docker-compose.yaml
CMD ["/usr/local/bin/envoy", "-c", "/etc/envoy/envoy.yaml"]
```
- Core: it is based on the official `envoyproxy/envoy` image, which already contains the compiled Envoy binary.
- `ENTRYPOINT`: uses a custom `docker-entrypoint.sh` script as the entry point, usually to perform pre-processing tasks (like waiting for other services to become ready) before starting Envoy.
../shared/python/Dockerfile (For service1 and service2)
This Dockerfile is more complex as it needs to package both the Python application and the Envoy proxy into the same image.
```dockerfile
# Use a base image with a Python environment
FROM python:3.9-slim

# Install Envoy
# (This would include steps to download and install the Envoy binary from the internet)
# For example:
# RUN apt-get update && apt-get install -y curl
# RUN curl -L https://getenvoy.io/cli | bash -s -- -b /usr/local/bin
# RUN getenvoy fetch envoy:v1.23-latest --path /usr/local/bin/envoy

# Install Python dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy application code and startup script
COPY app.py /
COPY start_service.sh /
RUN chmod +x /start_service.sh

# Set the command to be executed when the container starts
CMD ["/start_service.sh"]
```
- Single image, two runtimes: it starts from a Python base image and then installs the Envoy binary into it (a single-stage build, despite combining two runtimes).
- Packaging: it copies the Python application (`app.py`), its dependencies (`requirements.txt`), and the startup script (`start_service.sh`) into the image.
- `CMD`: on startup the container runs `start_service.sh`, which starts the background `envoy` process and then the foreground `python` application process.
Key Fields in Envoy Configuration
envoy.yaml (front-envoy)
```yaml
# ...
tracing:
  provider:
    name: envoy.tracers.zipkin
    typed_config:
      "@type": type.googleapis.com/envoy.config.trace.v3.ZipkinConfig
      collector_cluster: jaeger
      collector_endpoint: "/api/v2/spans"
      shared_span_context: false
      collector_endpoint_version: HTTP_JSON
# ...
clusters:
- name: service1
  type: STRICT_DNS
  lb_policy: ROUND_ROBIN
  load_assignment:
    cluster_name: service1
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: service1
              port_value: 8000
```
- `tracing.provider`: defines the tracer. `envoy.tracers.zipkin` is a built-in Envoy tracer speaking the Zipkin protocol, which Jaeger supports.
- `collector_cluster`: the key connection. It tells Envoy to send tracing data to the upstream cluster named `jaeger`.
- `collector_endpoint`: the API path of the Jaeger Collector that receives the data.
- `clusters` with `type: STRICT_DNS`: Envoy uses standard DNS queries to discover the backend addresses in a cluster. In the Docker Compose environment, service names (like `service1`) are automatically registered in Docker's internal DNS.
- `address: service1`: Envoy issues a DNS query for the host named `service1`, and Docker DNS resolves it to the IP address of the `jaeger-tracing-service1-1` container.
service1-envoy-jaeger.yaml
```yaml
# ...
route_config:
  name: local_route
  virtual_hosts:
  - name: backend
    domains:
    - "*"
    routes:
    - match:
        prefix: "/"
      route:
        cluster: service2
      decorator:
        operation: checkStock
```
- `decorator.operation: checkStock`: a crucial field. It gives this routing operation (the call to `service2`) a business name, `checkStock`. When you view tracing data in the Jaeger UI, the Span for this step is named `checkStock` instead of a vague HTTP URL, which greatly improves the readability of tracing data.
7. End-to-End Tracing Process Detailed Explanation: The Complete Lifecycle of a Request
To fully understand how it works, let’s follow an external request and see how the tracing information is generated, propagated, and finally displayed.
Step 1: The Birth of Tracing (Initiation)
1. External request: a client (e.g., `curl`) sends a regular HTTP request to `http://localhost:10000/`. This request contains no tracing information.
2. Decision by `front-envoy`: the request reaches `front-envoy`. With tracing configured in `envoy.yaml`, Envoy inspects the request headers.
3. Generate tracing context: `front-envoy` detects the absence of incoming tracing headers (like `x-b3-traceid`) and treats this as the start of a new trace. It:
   - generates a globally unique Trace ID, which will span the entire request chain;
   - generates a Span ID for the first operation it performs (routing the request to `service1`);
   - packages these IDs and other information (collectively the "tracing context") into standard HTTP headers (`x-b3-traceid`, `x-b3-spanid`, `x-b3-sampled`, etc.).
4. Inject tracing headers: `front-envoy` injects the newly generated tracing headers into the request before forwarding it to `service1`.
Thus, a distributed trace is born.
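The B3 context born here is nothing more than a set of plain HTTP headers. A minimal sketch of what the ingress proxy injects (the `new_b3_headers` helper is illustrative; real IDs come from Envoy's tracer, not application code):

```python
import secrets

def new_b3_headers() -> dict:
    """Build a fresh B3 tracing context, as an ingress proxy would when
    a request arrives carrying no tracing headers."""
    return {
        "x-b3-traceid": secrets.token_hex(16),  # 128-bit trace ID, hex-encoded
        "x-b3-spanid": secrets.token_hex(8),    # 64-bit span ID for the first hop
        "x-b3-sampled": "1",                    # mark the trace for collection
    }

headers = new_b3_headers()
print(headers["x-b3-traceid"])  # 32 hex characters, shared by every hop downstream
```

Every downstream hop keeps `x-b3-traceid` unchanged and mints a new `x-b3-spanid` per operation, which is what lets Jaeger stitch the Spans back together.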
Step 2: Context Propagation
1. Arrival at `service1`: the request, now carrying tracing headers, arrives at the `service1` container and is received by its internal `envoy` process.
2. Transparent proxying: `service1`'s `envoy` inspects the request headers and finds the tracing context; since its own configuration (`service1-envoy-jaeger.yaml`) has no `tracing` block, it does not initiate a new trace or modify the trace IDs. Its duty is to propagate.
3. Forward to application: the `envoy` forwards the request (along with all tracing headers) to the `python` application listening within the same container.
4. Internal service call: after processing its business logic, the `python` application in `service1` calls `service2` by sending the request to its local `envoy`. Note that the application must copy the tracing headers from the inbound request onto this outbound call; Envoy alone cannot correlate the two requests.
5. Propagation continues: `service1`'s `envoy` forwards the outbound request, tracing headers included, to `service2`.
6. Arrival at `service2`: the request arrives at `service2`, whose internal `envoy` likewise passes the tracing headers through to its `python` application.
Step 3: Data Collection and Aggregation
1. Span generation: throughout the request-response chain, only `front-envoy` actively generates tracing data. It creates Spans at several key points:
   - an entry Span, when it receives the external request and prepares to forward it to `service1`;
   - an exit Span, when `service1` calls `service2` through it;
   - at each step it records information like duration and HTTP status code, completing the corresponding Span.
2. Asynchronous reporting: `front-envoy` does not block the request to report tracing data. It places completed Spans in a buffer and asynchronously sends them to the `jaeger` cluster (defined by `collector_cluster: jaeger`).
3. Jaeger Collector: the Collector component receives the Span data over the Zipkin-compatible protocol on port `9411`.
4. Aggregation: the Collector groups Spans sharing a Trace ID as one request chain, using each Span's `Span ID` and `Parent Span ID` fields to assemble them into a parent-child tree structure (the Trace).
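The aggregation step can be illustrated in a few lines of Python: group spans by ID, then link each child to its parent via `parentSpanId`. The span dicts below are hypothetical, shaped loosely like Zipkin/Jaeger span records, and the helper is a sketch of the idea rather than Jaeger's implementation:

```python
def build_trace_tree(spans):
    """Index spans by ID and attach each one to its parent,
    returning the list of root spans (those without a parent)."""
    by_id = {s["spanId"]: {**s, "children": []} for s in spans}
    roots = []
    for span in by_id.values():
        parent = by_id.get(span.get("parentSpanId"))
        if parent:
            parent["children"].append(span)
        else:
            roots.append(span)
    return roots

# Hypothetical spans from one trace: ingress -> service1 -> service2
spans = [
    {"spanId": "a", "name": "ingress"},
    {"spanId": "b", "parentSpanId": "a", "name": "service1"},
    {"spanId": "c", "parentSpanId": "b", "name": "checkStock"},
]

tree = build_trace_tree(spans)
print(tree[0]["name"])  # "ingress", the root of the call chain
```

The resulting tree is exactly what the Jaeger UI renders as a waterfall of nested Spans.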
Step 4: Visualization
1. Querying: when you open the Jaeger UI in your browser and query for the `front-proxy` service, the Jaeger Query component retrieves the complete Trace data from storage.
2. Rendering: the Jaeger UI renders the tree-structured Trace data as an intuitive flame graph or Gantt chart:
   - you can see the root Span representing the entry operation at `front-envoy`;
   - its child Span for the call to `service1`;
   - `service1`'s child Span for the call to `service2`;
   - the length of each Span represents its duration, making performance bottlenecks in the chain easy to identify;
   - clicking on a Span reveals detailed tags (like `http.method`, `http.status_code`) attached by Envoy, as well as the business operation name `checkStock` defined by `decorator.operation`.
Through this complete lifecycle, Envoy and Jaeger collaborate to provide a powerful and clear distributed tracing solution with minimal intrusion to the application code.
8. Conclusion
This article, by dissecting the official Envoy Jaeger tracing example, has explored the implementation principles and architectural design of distributed tracing in depth. Starting from the architecture overview, we saw how docker-compose launches a multi-container microservice environment and how Envoy proxies propagate the tracing context. We then analyzed the role of each configuration file, in particular how Envoy's routing and tracing configuration passes tracing information between services. Finally, we walked through the complete lifecycle of a request, from the birth of a trace and the propagation of its context, through data collection and aggregation, to visualization, demonstrating how powerfully Envoy and Jaeger cooperate in distributed tracing.

This design not only helps in understanding how distributed tracing is implemented, but also lays the groundwork for learning service mesh technologies such as Istio. Through this example, readers can better grasp trace context propagation and data visualization in a service mesh, building a solid foundation for observability in real applications.

If you have any questions about distributed tracing, service meshes, or Envoy, feel free to leave a comment. I hope this article helps with your learning and practice!