This article provides a detailed analysis of the official Envoy example `jaeger-tracing`, explaining its construction process, related files, and functionality. It aims to supplement the Envoy documentation on Jaeger tracing. The example deploys services and Envoy within the same container, with Envoy handling service discovery, tracing, and network logic, showcasing the architectural concept of a service mesh. This design not only helps in understanding how distributed tracing is implemented but also lays a foundation for learning service-mesh technologies such as Istio.
1. Architecture Overview
This example uses `docker-compose` to launch a microservice environment with multiple containers. Instead of the traditional sidecar model, it adopts an entry proxy combined with internal routing proxies.
Core Architecture
- An entry Envoy proxy: Named `front-envoy`, it serves as the unified entry point for all external requests and initiates tracing.
- Two backend services: `service1` and `service2`. Each service container runs an Envoy proxy internally, which handles inter-service routing and propagates tracing context rather than generating tracing data.
- A Jaeger All-in-One instance: Responsible for receiving, storing, and visualizing the tracing data sent by `front-envoy`.
Request and Tracing Flow
An external request first reaches `front-envoy`. `front-envoy` initiates tracing, generates a Span, and adds tracing headers before routing the request to `service1`. The Envoy inside `service1` propagates the tracing headers and routes the request to `service2`. Finally, all Span data is sent to Jaeger by `front-envoy`.
2. File Functionality Breakdown
Let’s analyze the purpose of each file.
docker-compose.yaml
This is the “orchestrator” of the example, defining four core services: `jaeger`, `service1`, `service2`, and `front-envoy`.
- `front-envoy`: Uses `envoy.yaml` as its configuration and exposes port `10000`.
- `service1` & `service2`: Backend services. Each mounts `service1-envoy-jaeger.yaml` or `service2-envoy-jaeger.yaml` respectively for its internal Envoy process.
- `jaeger`: The Jaeger service for receiving and visualizing tracing data.
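As a point of reference, a minimal sketch of what such a `docker-compose.yaml` could look like follows. This is not the exact file from the repository; the build contexts, image tag, environment variable, and port mappings are assumptions for illustration:

```yaml
# Minimal sketch -- service names match the example; build contexts,
# image tag, and port mappings are illustrative assumptions.
services:
  front-envoy:
    build:
      context: ../shared/envoy           # shared Envoy image (see section 6)
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml
    ports:
      - "10000:10000"                    # the single public entry point

  service1:
    build:
      context: ../shared/python          # Python app + Envoy in one image
    volumes:
      - ./service1-envoy-jaeger.yaml:/etc/envoy/envoy.yaml

  service2:
    build:
      context: ../shared/python
    volumes:
      - ./service2-envoy-jaeger.yaml:/etc/envoy/envoy.yaml

  jaeger:
    image: jaegertracing/all-in-one:latest
    environment:
      COLLECTOR_ZIPKIN_HOST_PORT: ":9411"   # accept Zipkin-format spans
    ports:
      - "16686:16686"                    # Jaeger UI (see section 3)
```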
envoy.yaml (front-envoy configuration)
This is the core configuration file for the entry proxy `front-envoy`.
- Key Configurations:
  - `tracing`: Enables and initiates tracing.
  - `provider`: Specifies the tracing provider as `zipkin` (a protocol Jaeger is compatible with).
  - `collector_cluster`: Points to the `jaeger` cluster for sending tracing data.
  - `route_config`: Defines routing rules, routing all incoming requests (`/`) to the `service1` cluster.
  - `clusters`: Defines upstream clusters, including `service1` and `jaeger`.
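To show where these blocks sit inside the configuration, here is an abbreviated sketch of the listener section; beyond the fields named above, the details are assumptions (the `tracing` and `route_config` contents are shown in section 6):

```yaml
# Abbreviated sketch: where tracing and route_config nest inside the
# HTTP connection manager. Details beyond the fields discussed above
# are assumptions.
static_resources:
  listeners:
    - address:
        socket_address: { address: 0.0.0.0, port_value: 10000 }
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                generate_request_id: true   # ensures a request ID exists for tracing
                tracing: {}                 # zipkin provider config, see section 6
                route_config: {}            # routes "/" to service1, see section 6
```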
service1-envoy-jaeger.yaml (service1 internal Envoy configuration)
This configuration file is loaded by the Envoy inside the `service1` container.
- Purpose: Purely for routing.
- Key Configurations:
  - It does not have a `tracing` block, so it does not initiate tracing but propagates the tracing headers received from upstream (`front-envoy`).
  - `route_config`: Routes all incoming requests to the `service2` cluster.
service2-envoy-jaeger.yaml (service2 internal Envoy configuration)
Similar to `service1`, this configuration is loaded by the Envoy inside the `service2` container.
- Purpose: Endpoint routing.
- Key Configurations:
  - It also does not have a `tracing` block.
  - `route_config`: Routes requests to `service_log` (considered the endpoint in this example).
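A minimal sketch of what that terminal route could look like, mirroring the `service1` config shown in section 6 (the virtual-host name is illustrative):

```yaml
# Sketch: service2's terminal route. The structure mirrors the
# service1 config in section 6; the virtual-host name is illustrative.
route_config:
  name: local_route
  virtual_hosts:
    - name: backend
      domains: ["*"]
      routes:
        - match:
            prefix: "/"
          route:
            cluster: service_log   # the last hop in this example
```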
verify.sh
An automated script that verifies the example is working correctly.
- Execution Flow:
  - `curl localhost:10000/`: Sends a request to `front-envoy`.
  - Waits a few seconds for `front-envoy` to asynchronously send tracing data to Jaeger.
  - Queries the Jaeger API (`http://localhost:16686/api/traces?service=front-proxy`) for tracing data generated by the `front-proxy` service (the service name defined in `envoy.yaml`) and verifies its completeness.
3. Build and Run Process
- Start all services: `docker-compose up --build -d`
- Send a test request: `./verify.sh`
- View tracing in the Jaeger UI:
  - Open the Jaeger UI in your browser: http://localhost:16686
  - Select `front-proxy` from the “Service” dropdown menu in the top-left corner.
  - Click the “Find Traces” button.
  - Click on a trace record to view the complete call chain with multiple Spans, clearly showing the flow from `front-envoy` to `service1` and then to `service2`.
4. Internal Processes in Containers
Below is an analysis of the core processes running inside each container.
jaeger-tracing-front-envoy-1
- Container Role: Entry Envoy proxy.
- Running Process:
  - `envoy`: The only core process in this container. The `docker-entrypoint` script starts the Envoy proxy, which loads the `envoy.yaml` configuration file, listens on port `10000`, processes all incoming requests, and initiates and reports tracing data.
jaeger-tracing-service1-1
- Container Role: Backend application service `service1`.
- Running Processes:
  - `envoy` (background process): The container's startup script (`/usr/local/bin/start_service.sh`) first starts an Envoy proxy process in the background. This Envoy process loads the `service1-envoy-jaeger.yaml` configuration and routes incoming requests to `service2`.
  - `python` (foreground process): The script then starts a Python application server (based on `aiohttp`), which listens on an internal port (e.g., `8000`) and processes requests forwarded by its internal Envoy proxy.
jaeger-tracing-service2-1
- Container Role: Backend application service `service2`.
- Running Processes:
  - `envoy` (background process): Similar to `service1`, the container's startup script first starts an Envoy proxy in the background, loading the `service2-envoy-jaeger.yaml` configuration.
  - `python` (foreground process): The script then starts an `aiohttp` Python application server as the business logic for `service2`.
jaeger-tracing-jaeger-1
- Container Role: Jaeger distributed tracing system.
- Running Process:
  - `/go/bin/all-in-one`: The official Jaeger “all-in-one” executable. This single process contains all of Jaeger's core components, making it easy to deploy in development and testing environments:
    - Jaeger Agent: Listens for and receives Span data sent over UDP by Jaeger client libraries.
    - Jaeger Collector: Receives data from the Agent (and, in this example, Zipkin-format spans over HTTP on port `9411`), then validates, processes, and stores it.
    - Jaeger Query: Provides an API for querying and retrieving tracing data.
    - Jaeger UI: Offers a web interface (on port `16686`) for visualizing tracing data.
5. In-depth Discussion: Why Does the Service Container Need Envoy?
This is a crucial question that touches on the essence of the “Service Mesh” architecture. In simple terms, running an Envoy process inside each service container decouples the complex network communication logic from the application code.
Even though `front-envoy` handles the initiation of tracing in this example, the Envoy processes inside `service1` and `service2` still play a critical role:
1. Service Discovery
- Problem: How does the Python code in `service1` know where `service2` is? In a dynamic container environment, IP addresses change, and hardcoding addresses is not feasible.
- Solution: The Python code in `service1` is configured to send all outbound requests to its own container's Envoy proxy (usually via `localhost`). Based on its configuration file (`service1-envoy-jaeger.yaml`), the internal Envoy proxy knows the logical name of the `service2` service and resolves it to the correct container address via Docker's internal DNS, as the cluster sketch below shows.
- Benefit: The application code becomes extremely simple; it doesn't need to care about the network topology, just the logical name of the next service.
2. Trace Context Propagation
- Problem: `front-envoy` initiates tracing and generates tracing headers (e.g., `x-b3-traceid`). If `service1` directly calls `service2`, who ensures these tracing headers are passed along correctly?
- Solution: Upon receiving the request, the Envoy proxy in `service1` automatically recognizes these tracing headers and ensures they are included when forwarding the request to `service2`.
- Benefit: This guarantees that the distributed tracing chain is unbroken. Developers do not need to write any logic in the Python code to extract and re-inject tracing headers; Envoy handles all of this transparently.
3. Unified Network Control Layer
- Problem: If you want to add more complex network policies to service-to-service calls, such as retries, timeouts, circuit breaking, or traffic encryption (mTLS), where should they be implemented?
- Solution: All of this can be done declaratively in the Envoy configuration file, as the sketch below illustrates. You don't need to re-implement this logic in every service written in Python, Java, Go, etc.
- Benefits:
  - Language-agnostic: No matter what language your service is written in, its network behavior is controlled by Envoy.
  - Simplified Application: Application developers can focus on business logic rather than handling complex network failure scenarios.
  - Centralized Management: Operations personnel can adjust the network policies of the entire system by modifying the Envoy configuration, without changing or redeploying any application code.
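For instance, a retry and timeout policy can be attached to a route declaratively. A minimal sketch, with values that are illustrative rather than taken from this example's configs:

```yaml
# Sketch: per-route resilience policy declared in Envoy config
# instead of application code -- the values are illustrative.
routes:
  - match:
      prefix: "/"
    route:
      cluster: service2
      timeout: 2s                       # fail the call if it takes too long
      retry_policy:
        retry_on: 5xx,connect-failure   # conditions that trigger a retry
        num_retries: 3
        per_try_timeout: 0.5s
```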
Summary
In this example, the Envoy process inside each service container, while seemingly doing only simple routing, is actually laying the foundation of a miniature but fully functional “service mesh”. It acts as an intelligent, configurable local network proxy, decoupling the service itself (the Python application) from the communication between services.
6. Dockerfile and Envoy Configuration Deep Dive
Dockerfile Analysis (Based on Common Practices)
The `Dockerfile`s in this example are located in the `../shared/` directory, reflecting the idea of image reuse.
../shared/envoy/Dockerfile (For front-envoy)
This `Dockerfile` builds a pure Envoy proxy image.
```dockerfile
# Use the official Envoy image as the base
# (v1.23-latest is an example version)
FROM envoyproxy/envoy:v1.23-latest

# Copy the docker-entrypoint.sh script into the container and grant execute permissions
COPY docker-entrypoint.sh /
RUN chmod +x /docker-entrypoint.sh

# Set the command to be executed when the container starts
ENTRYPOINT ["/docker-entrypoint.sh"]

# Default command, can be overridden by `command` in docker-compose.yaml
CMD ["/usr/local/bin/envoy", "-c", "/etc/envoy/envoy.yaml"]
```
- Core: It is based on the official `envoyproxy/envoy` image, which already contains the compiled Envoy binary.
- `ENTRYPOINT`: Uses a custom `docker-entrypoint.sh` script as the entry point, usually to perform pre-processing tasks (like waiting for other services to be ready) before starting Envoy.
../shared/python/Dockerfile (For service1 and service2)
This `Dockerfile` is more complex, as it needs to package both the Python application and the Envoy proxy into the same image.
```dockerfile
# Use a base image with a Python environment
FROM python:3.9-slim

# Install Envoy
# (This would include steps to download and install the Envoy binary, for example:)
# RUN apt-get update && apt-get install -y curl
# RUN curl -L https://getenvoy.io/cli | bash -s -- -b /usr/local/bin
# RUN getenvoy fetch envoy:v1.23-latest --path /usr/local/bin/envoy

# Install Python dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy application code and startup script
COPY app.py /
COPY start_service.sh /
RUN chmod +x /start_service.sh

# Set the command to be executed when the container starts
CMD ["/start_service.sh"]
```
- Combined Image: It starts from a Python base image and installs Envoy into the same image (a single image containing both, rather than a true multi-stage build).
- Packaging: It copies the Python application (`app.py`), dependencies (`requirements.txt`), and startup script (`start_service.sh`) into the image.
- `CMD`: On startup, the container executes the `start_service.sh` script, which is responsible for starting the background `envoy` process and then the foreground `python` application process.
Key Fields in Envoy Configuration
envoy.yaml (front-envoy)
```yaml
# ...
tracing:
  provider:
    name: envoy.tracers.zipkin
    typed_config:
      "@type": type.googleapis.com/envoy.config.trace.v3.ZipkinConfig
      collector_cluster: jaeger
      collector_endpoint: "/api/v2/spans"
      shared_span_context: false
      collector_endpoint_version: HTTP_JSON
# ...
clusters:
  - name: service1
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service1
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: service1
                    port_value: 8000
```
- `tracing`:
  - `provider`: Defines the tracer. `envoy.tracers.zipkin` is a built-in Envoy tracer compatible with the Zipkin protocol, which Jaeger supports.
  - `collector_cluster`: The key connection. It tells Envoy to send tracing data to the upstream cluster named `jaeger`.
  - `collector_endpoint`: Specifies the API path of the Jaeger Collector that receives the data.
- `clusters`:
  - `type: STRICT_DNS`: Indicates that Envoy will use standard DNS queries to discover the backend addresses in this cluster. In the Docker Compose environment, service names (like `service1`) are automatically registered in Docker's internal DNS.
  - `address: service1`: Envoy performs a DNS query for the host named `service1`, which Docker DNS resolves to the IP address of the `jaeger-tracing-service1-1` container.
service1-envoy-jaeger.yaml
```yaml
# ...
route_config:
  name: local_route
  virtual_hosts:
    - name: backend
      domains:
        - "*"
      routes:
        - match:
            prefix: "/"
          route:
            cluster: service2
          decorator:
            operation: checkStock
```
- `decorator`:
  - `operation: checkStock`: A crucial field. It gives a business name, `checkStock`, to this routing operation (i.e., the call to `service2`). When you view tracing data in the Jaeger UI, the Span representing this step is named `checkStock` instead of a vague HTTP URL, which greatly enhances the readability of the tracing data.
7. The End-to-End Tracing Process: The Complete Lifecycle of a Request
To fully understand how it works, let’s follow an external request and see how the tracing information is generated, propagated, and finally displayed.
Step 1: The Birth of Tracing (Initiation)
- External Request: A client (e.g., `curl`) sends a regular HTTP request to `http://localhost:10000/`. This request does not contain any tracing information.
- Decision by `front-envoy`: The request reaches `front-envoy`. With tracing configured in `envoy.yaml`, Envoy checks the request headers.
- Generate Tracing Context: `front-envoy` detects the absence of incoming tracing headers (like `x-b3-traceid`) and determines that this is the start of a new request chain. It will:
  - Generate a globally unique Trace ID. This ID spans the entire request chain.
  - Generate a Span ID for the first operation it performs (routing the request to `service1`).
  - Package these IDs and other information (collectively, the “tracing context”) into standard HTTP headers (such as `x-b3-traceid`, `x-b3-spanid`, `x-b3-sampled`, etc.).
- Inject Tracing Headers: `front-envoy` injects these newly generated tracing headers into the request before forwarding it to `service1`.
Thus, a distributed trace is born.
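For illustration, the injected B3 headers could look like the following; the values here are made up:

```yaml
# Illustrative B3 headers injected by front-envoy -- values are made up.
x-b3-traceid: 4bf92f3577b34da6a3ce929d0e0e4736   # shared by every span in the chain
x-b3-spanid: 00f067aa0ba902b7                    # identifies this hop's span
x-b3-sampled: "1"                                # the trace is being recorded
```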
Step 2: Context Propagation
- Arrival at `service1`: The request, now carrying tracing headers, arrives at the `service1` container and is received by its internal `envoy` process.
- Transparent Proxying: The `envoy` of `service1` checks the request headers and finds the tracing context. Since its own configuration (`service1-envoy-jaeger.yaml`) does not have a `tracing` block, it does not attempt to initiate a new trace or modify the trace IDs; its duty is to propagate.
- Forward to Application: The `envoy` of `service1` forwards the request (along with all tracing headers) to the `python` application listening within the same container.
- Internal Service Call: After processing the business logic, the `python` application in `service1` needs to call `service2`. It sends the request to its local `envoy`.
- Propagation Continues: The `envoy` of `service1` receives this outbound request and carries the tracing headers forward on the request sent to `service2`. This process is completely transparent to the Python application.
- Arrival at `service2`: The request arrives at `service2`, whose internal `envoy` likewise transparently propagates the tracing headers to its `python` application.
Step 3: Data Collection and Aggregation
- Span Generation: Throughout the request-response chain, only `front-envoy` actively generates tracing data. It creates Spans at several key points:
  - Entry Span: when it receives the external request and prepares to forward it to `service1`.
  - Exit Span: when `service1` calls `service2` through it.
  - At each step it records information such as duration and HTTP status code, completing the corresponding Span.
- Asynchronous Reporting: `front-envoy` does not block the request to report tracing data. It places completed Spans in a buffer and asynchronously sends them to the `jaeger` cluster (defined by `collector_cluster: jaeger`).
- Jaeger Collector: The Collector component of Jaeger receives the Span data over the Zipkin-compatible protocol on port `9411`.
- Aggregation: The Collector groups Spans with the same Trace ID together as belonging to the same request chain, and uses each Span's `Span ID` and `Parent Span ID` fields to assemble them into a parent-child tree structure (a Trace), as sketched below.
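A simplified, made-up illustration of how these ID fields stitch Spans into a tree (the operation names are illustrative, except `checkStock` from the `decorator` shown in section 6):

```yaml
# Made-up example: three spans sharing one Trace ID, linked into a
# tree through parent_span_id. Operation names are illustrative.
trace_id: 4bf92f3577b34da6a3ce929d0e0e4736
spans:
  - span_id: aaaa000000000001
    parent_span_id: null              # root span: ingress at front-envoy
    operation: ingress
  - span_id: aaaa000000000002
    parent_span_id: aaaa000000000001  # the hop to service1
    operation: egress service1
  - span_id: aaaa000000000003
    parent_span_id: aaaa000000000002  # the hop to service2
    operation: checkStock             # named by decorator.operation
```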
Step 4: Visualization
- Querying: When you open the Jaeger UI in your browser and query for the `front-proxy` service, the Jaeger Query component retrieves the complete Trace data from storage.
- Rendering: The Jaeger UI renders this tree-structured Trace data into an intuitive flame graph or Gantt chart:
  - You can see the root Span representing the entry operation at `front-envoy`;
  - its child Span for the call to `service1`;
  - and `service1`'s child Span for the call to `service2`.
  - The length of each Span represents its duration, making it easy to identify performance bottlenecks in the chain.
  - Clicking on a Span reveals detailed tags attached by Envoy (like `http.method` and `http.status_code`), as well as the business operation name `checkStock` defined by `decorator.operation`.
Through this complete lifecycle, Envoy and Jaeger collaborate to provide a powerful and clear distributed tracing solution with minimal intrusion into the application code.
8. Conclusion
By dissecting the official Envoy Jaeger tracing example, this article has explored the implementation principles and architectural design of distributed tracing in depth. We started with an architecture overview, seeing how `docker-compose` launches a multi-container microservice environment and how Envoy proxies propagate the tracing context. We then analyzed the function of each configuration file, in particular how Envoy's routing and tracing configuration passes tracing information between services. Finally, we walked through the complete lifecycle of a request, from the birth of a trace, through context propagation, data collection, and aggregation, to visualization, demonstrating how Envoy and Jaeger work together to deliver distributed tracing.
This design not only helps in understanding how distributed tracing is implemented, but also lays the groundwork for studying service-mesh technologies such as Istio. Through this example, readers can better grasp trace-context propagation and data visualization in a service mesh, building a solid foundation for observability in real-world applications.
If you have any questions about distributed tracing, service meshes, or Envoy, feel free to leave a comment for discussion. I hope this article helps with your learning and practice!