Distributed tracing powered by PurePath® technology
Distributed tracing is a method of observing requests as they propagate across distributed systems and microservices, generating high-quality data about those requests, and making that data available for analysis. It does this by tagging a request with a unique identifier and collecting data on every interaction with every service the request touches. Distributed tracing is essential for monitoring, debugging, and optimizing distributed software architectures, especially dynamic microservices architectures, because it helps teams understand more quickly how each microservice is performing.
Dynatrace has been pioneering distributed tracing since 2006 with the patented PurePath® technology. PurePath® technology combines distributed tracing information with additional insights such as user experience information, logs, metrics, topology information, metadata, and even code-level profiling information to provide the highest level of data fidelity and granularity. In its latest iteration, PurePath® technology was opened up to integrate OpenTelemetry tracing signals into the traces that OneAgent captures fully automatically.
How does distributed tracing work?
Each activity triggered by a request, called a segment or span, is recorded as it moves through and across services. The information collected includes a name, start and end timestamps, and other attributes. When one activity (a "parent" span) triggers further work, that work is recorded as a "child" span. The distributed trace places these spans in their correct order.
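The ordering step described above can be sketched in a few lines of Python. This is a minimal illustration only, not how Dynatrace or OpenTelemetry assemble traces internally: the Span class, its field names, and the sample data are assumptions made for the example. Spans are linked through their parent identifier and siblings are sorted by start time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Hypothetical span record; field names are chosen for this example only.
    span_id: str
    parent_id: Optional[str]  # None marks the first activity of the request
    name: str
    start: float  # start timestamp in seconds
    end: float    # end timestamp in seconds


def order_spans(spans):
    """Return (depth, span) pairs with every parent listed before its children."""
    children = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)
    for siblings in children.values():
        siblings.sort(key=lambda s: s.start)

    ordered = []

    def visit(parent_id, depth):
        for span in children.get(parent_id, []):
            ordered.append((depth, span))
            visit(span.span_id, depth + 1)

    visit(None, 0)
    return ordered


# Spans arrive in arbitrary order, as they might from different services.
recorded = [
    Span("c", "b", "SELECT cart", 0.020, 0.030),
    Span("a", None, "GET /checkout", 0.000, 0.100),
    Span("d", "a", "POST /payment", 0.050, 0.090),
    Span("b", "a", "GET /cart", 0.010, 0.040),
]

for depth, span in order_spans(recorded):
    print("  " * depth + span.name)
```

The indentation of the printed names mirrors the parent/child nesting that a trace view typically shows.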
Applications need to be instrumented to produce trace data and to propagate a unique identifier for a specific request. This can be done automatically, without any code changes (for example, via OneAgent or existing OpenTelemetry instrumentation libraries), or manually by using an instrumentation SDK (for example, OpenTelemetry).
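To make the idea of manual instrumentation concrete, here is a toy sketch that uses only the Python standard library. It is not the OpenTelemetry SDK or the OneAgent SDK: start_span, finished_spans, and the record layout are hypothetical names invented for this example. It shows the two jobs instrumentation has to do, which are recording a span around a unit of work and handing the trace identifier on to nested work.

```python
import contextvars
import secrets
import time
from contextlib import contextmanager

# Holds the (trace_id, span_id) of the currently active span, if any.
_current = contextvars.ContextVar("current_span", default=None)
finished_spans = []  # stand-in for an exporter that would ship spans to a backend


@contextmanager
def start_span(name):
    parent = _current.get()
    # A nested span reuses the trace ID of its parent; a new request gets a fresh one.
    trace_id = parent[0] if parent else secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    token = _current.set((trace_id, span_id))
    start = time.time()
    try:
        yield span_id
    finally:
        _current.reset(token)
        finished_spans.append({
            "trace_id": trace_id,
            "span_id": span_id,
            "parent_id": parent[1] if parent else None,
            "name": name,
            "start": start,
            "end": time.time(),
        })


def load_cart():
    with start_span("load_cart"):
        pass  # real work would happen here


with start_span("checkout"):
    load_cart()

for span in finished_spans:
    print(span["name"], "parent:", span["parent_id"])
```

Automatic instrumentation does the same kind of wrapping on your behalf, typically by hooking into frameworks and libraries, so that application code does not have to change.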
How is the trace data structured?
Each trace is made up of semantically different elements that together make it possible to interpret and understand the collected data.
A span represents a single operation and is the main building block of a trace. A span contains an operation name, start and end timestamps, a list of attributes as key-value pairs, and the identifier of its parent span. Each span is typed with a span kind that describes useful information, such as whether the span represents a server-side operation, a remote call to an external system, or an interaction with a messaging system. A span without a parent span is called a "root span" and indicates the start of a trace.
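The span fields listed above map naturally onto a small data structure. The following is a sketch under stated assumptions: the class and field names are made up for illustration, the span kinds follow the five kinds defined by OpenTelemetry (internal, server, client, producer, consumer), and the attribute keys in the sample data are only examples.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class SpanKind(Enum):
    INTERNAL = "internal"
    SERVER = "server"      # server-side handling of an incoming request
    CLIENT = "client"      # outgoing remote call to another system
    PRODUCER = "producer"  # message sent to a messaging system
    CONSUMER = "consumer"  # message received from a messaging system


@dataclass
class Span:
    name: str
    span_id: str
    parent_span_id: Optional[str]
    start_ns: int
    end_ns: int
    kind: SpanKind = SpanKind.INTERNAL
    attributes: Dict[str, object] = field(default_factory=dict)

    @property
    def is_root(self) -> bool:
        # A span without a parent marks the start of a trace.
        return self.parent_span_id is None

    @property
    def duration_ms(self) -> float:
        return (self.end_ns - self.start_ns) / 1_000_000


root = Span("GET /checkout", "a1", None, 0, 120_000_000,
            SpanKind.SERVER, {"http.request.method": "GET"})
db = Span("SELECT cart", "b2", "a1", 10_000_000, 35_000_000,
          SpanKind.CLIENT, {"db.system": "postgresql"})

for span in (root, db):
    print(span.name, span.kind.value, "root" if span.is_root else "child")
```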
The span context is needed to put all spans and events in context with each other. The span context allows a child span to relate to the trace and its parent span. Therefore, the context needs to be propagated within a service (across different threads) but also across services and process boundaries. This typically happens via HTTP headers (like the W3C trace context) or via unique IDs in messaging systems.
- W3C trace context consists of two HTTP headers, traceparent and tracestate, which are supported for HTTP and gRPC requests
- x-dynatrace header for HTTP requests
- dtdTraceTagInfo custom property for Java-based messaging services and messages produced by IIB/ACE placed in the
- A unique key for message queues (based on message properties)
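As an illustration of header-based propagation, the sketch below parses and re-creates a W3C traceparent header value, which has the layout version-traceid-parentid-flags in lowercase hex. It is a simplified example that accepts only the version 00 layout and ignores tracestate; the helper names parse_traceparent and format_traceparent are assumptions for this example, not functions from any library.

```python
import re
from typing import NamedTuple, Optional

# Version 00 layout: "00-<32 hex trace ID>-<16 hex parent ID>-<2 hex flags>"
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class SpanContext(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Return the propagated context, or None if the header is unusable."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero identifiers are defined as invalid.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    # The lowest flag bit signals that the caller sampled this trace.
    return SpanContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def format_traceparent(ctx: SpanContext) -> str:
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{ctx.parent_id}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
ctx = parse_traceparent(incoming)
print(ctx)

# For an outgoing call, the current span's ID becomes the parent ID.
outgoing = format_traceparent(ctx._replace(parent_id="b7ad6b7169203331"))
print(outgoing)
```

A receiving service that parses the header this way can attach its own spans to the caller's trace, because the trace ID stays the same while the parent ID changes on every hop.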
A trace traverses at least one service and, in a modern microservice environment, typically multiple services. On horizontally scaled services, each span is processed on a specific service instance. Services are determined and named based on available resource attributes like service.name or properties that are collected along with the spans.
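The following sketch shows one way spans could be grouped by service and service instance from their resource attributes. It is an illustration, not the Dynatrace service detection logic: service.name and service.instance.id are OpenTelemetry resource attribute names, while the fallback values and the span layout are assumptions made for the example.

```python
from collections import defaultdict


def service_of(resource: dict) -> str:
    # Fallback name is an assumption for spans that carry no service.name.
    return resource.get("service.name") or "unknown_service"


def group_by_service(spans):
    """Map service name -> instance ID -> list of span names."""
    groups = defaultdict(lambda: defaultdict(list))
    for span in spans:
        resource = span["resource"]
        instance = resource.get("service.instance.id", "default")
        groups[service_of(resource)][instance].append(span["name"])
    return {service: dict(instances) for service, instances in groups.items()}


spans = [
    {"name": "GET /cart", "resource": {"service.name": "CartService", "service.instance.id": "cart-1"}},
    {"name": "GET /cart", "resource": {"service.name": "CartService", "service.instance.id": "cart-2"}},
    {"name": "POST /checkout", "resource": {"service.name": "Checkout", "service.instance.id": "co-1"}},
    {"name": "cleanup", "resource": {}},
]

grouped = group_by_service(spans)
for service, instances in grouped.items():
    print(service, instances)
```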
A service call aggregates key information, such as error codes, topology information, and request and span attributes, that is related to a specific trace and service combination. This key information is available for service analyses.
The available service information on the spans is used to put a trace into the correct topological context. This makes it possible to understand topological relations such as the data centers, hosts, Kubernetes namespaces, pods, or processes related to a specific span. This information is an important input for the Dynatrace Davis® AI. In the example above, this information allows Davis AI to understand that the CartService and the Checkout service are calling the same database.
Attributes are key/value pairs that provide details about a span, service (call), or resource, such as response codes, HTTP methods, and URLs. Attributes allow you to group, query, find, and analyze your traces and spans.
- OneAgent automatically collects a number of attributes such as HTTP method, URL, response codes, topology data, and details about the underlying technologies. Additionally, you can add custom request attributes that OneAgent will capture and associate with a span without any local configuration or code changes.
- OpenTelemetry defines standard attributes that should be present on spans. Additionally, user-defined attributes can be added to help you understand and analyze your system.
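To illustrate how attributes support finding and grouping spans, here is a small sketch over in-memory data. The helper names and the sample spans are assumptions for this example. The http.* keys follow OpenTelemetry naming, and customer.tier stands in for a hypothetical user-defined attribute.

```python
from collections import Counter

spans = [
    {"name": "GET /cart", "attributes": {
        "http.request.method": "GET", "http.response.status_code": 200, "customer.tier": "gold"}},
    {"name": "POST /checkout", "attributes": {
        "http.request.method": "POST", "http.response.status_code": 500, "customer.tier": "gold"}},
    {"name": "POST /checkout", "attributes": {
        "http.request.method": "POST", "http.response.status_code": 200, "customer.tier": "free"}},
]


def find_spans(spans, wanted):
    """Return the spans whose attributes contain every key/value pair in wanted."""
    return [
        span for span in spans
        if all(span["attributes"].get(key) == value for key, value in wanted.items())
    ]


def count_by(spans, key):
    """Count spans per value of a single attribute."""
    return Counter(span["attributes"].get(key) for span in spans)


failed_posts = find_spans(spans, {"http.request.method": "POST",
                                  "http.response.status_code": 500})
print([span["name"] for span in failed_posts])
print(count_by(spans, "customer.tier"))
```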
Logs that are contextualized with the related trace and span information are automatically put into context and shown in the trace perspective.
- OneAgent can automatically contextualize log entries that are produced by prominent log frameworks.
- In OpenTelemetry, several existing instrumentation libraries can automatically contextualize log entries. Additionally, OpenTelemetry can be used to contextualize log entries manually if no automated way is available.
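Manual log contextualization can be sketched with Python's standard logging module: a logging filter copies the identifiers of the active span onto every log record so that the formatter can print them. This is an illustration only. The current_span variable, the field names trace_id and span_id, and the log line format are assumptions for this example, not a format required by OneAgent or OpenTelemetry.

```python
import contextvars
import io
import logging

# Stand-in for the tracing library's notion of the "currently active span".
current_span = contextvars.ContextVar("current_span", default=None)


class TraceContextFilter(logging.Filter):
    """Attach trace_id and span_id to each record; use "-" when no span is active."""

    def filter(self, record):
        ctx = current_span.get()
        record.trace_id = ctx["trace_id"] if ctx else "-"
        record.span_id = ctx["span_id"] if ctx else "-"
        return True  # never drop the record, only enrich it


stream = io.StringIO()  # captured in memory so the result can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"))
handler.addFilter(TraceContextFilter())

logger = logging.getLogger("checkout")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

current_span.set({"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
                  "span_id": "00f067aa0ba902b7"})
logger.info("payment accepted")

print(stream.getvalue(), end="")
```

With the identifiers present on each log line, a backend can join log entries to the span that was active when they were written.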
Code-level profiling information
OneAgent can continuously capture profiling information for spans and for the whole service. This enables you to understand how the duration of a span is affected by CPU time, network time, or time spent waiting for other threads, and to see which code was executed in the context of a span. This data is available in the single trace view under Code level and can be analyzed in an aggregated way via the Method hotspot analysis.