Traces
Traces represent single execution flows in your application or service. They start and end at defined points in your code and record relevant details, such as timings, events, attributes, executions of sub-routines, as well as custom data. The data is saved in a hierarchical tree structure reflecting the execution flow of that code logic.
While a trace typically represents the standalone invocation of some business logic (for example, a request), it primarily serves as a container for spans and ties them together under an overall identifier (the trace ID). Spans reflect the individual steps of that business logic. For example, consider the following trace of a web request.
This is a trace of a REST call to a /cart endpoint. Our trace starts when we receive the request from the client, follows along the entire flow of execution, and eventually completes when the response is provided back to the client.
Note the trace's distributed nature (tracing is often also referred to as distributed tracing). Traces do not stop at service boundaries but work across monolithic services and processes. Context propagation plays an important role here.
Web requests are one of the common use cases, but tracing is not limited to the web and can be employed in any other scenario where you can define a start and end point in your business logic, such as scheduled jobs.
Spans
While traces define the global scope of an execution flow, spans represent the individual tasks performed within that scope.
Revisiting our previous /cart example, let's drill down a bit more into our trace and take a closer look at its spans.
- The top half of the picture below shows the request flow and components involved in the request.
- The bottom half of the picture shows the trace and its spans, each marking the start and finish of the separate steps and calls of our checkout process.
Note the span hierarchy. On the top, we have our root span (in black), from which the child spans drill down into individual calls.
Spans support the following metadata:
- Name: describes the operation
- Start and finish timestamps: when the operation started and when it finished
- Parent span ID: the ID of the parent span; none if the span is the root span
- Attributes: a set of key-value pairs containing additional, custom information
- Events: a set of custom-defined events, each containing a name, a timestamp, and attributes
- Links: a list of links to related spans, either within the same trace or across traces. For details, see the OpenTelemetry documentation on links between spans.
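To make the metadata and the span hierarchy more tangible, here is a minimal sketch in plain Python. It does not use the OpenTelemetry SDK; the `Span` and `Event` classes, the `render_tree` helper, the span IDs, and the timings are all hypothetical and only mirror the fields listed above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    name: str
    timestamp_ns: int
    attributes: dict = field(default_factory=dict)


@dataclass
class Span:
    name: str
    span_id: str
    parent_span_id: Optional[str]  # None marks the root span
    start_ns: int
    end_ns: int
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)
    links: list = field(default_factory=list)  # references to related spans


def render_tree(spans, parent_id=None, depth=0):
    """Return indented lines that show the parent/child hierarchy."""
    lines = []
    children = [s for s in spans if s.parent_span_id == parent_id]
    for span in sorted(children, key=lambda s: s.start_ns):
        duration_ms = (span.end_ns - span.start_ns) / 1_000_000
        lines.append(f"{'  ' * depth}{span.name} ({duration_ms:.0f} ms)")
        lines.extend(render_tree(spans, span.span_id, depth + 1))
    return lines


# Hypothetical spans for the /cart request (made-up IDs and timings)
spans = [
    Span("GET /cart", "a1", None, 0, 50_000_000, {"http.request.method": "GET"}),
    Span("load cart", "b2", "a1", 5_000_000, 30_000_000),
    Span("query database", "c3", "b2", 10_000_000, 25_000_000),
]
tree = render_tree(spans)
print("\n".join(tree))
```

The root span is the only one without a parent span ID; every other span is placed under the span whose ID it references.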
Span kinds
Every span has two characteristics that express the basic nature of the span's underlying operation and can be configured with the span kind:
- Whether the operation is synchronous or asynchronous. If synchronous, the parent operation typically waits for the child operation to complete.
- Whether a network call is involved.
To configure these two characteristics, OpenTelemetry provides the following four span kinds:
Span kind | Operation type | Network request | Description |
---|---|---|---|
CLIENT | Synchronous | Outbound | For spans that initiate a network request |
SERVER | Synchronous | Inbound | For spans that receive network requests |
PRODUCER | Asynchronous | Possibly outbound | For spans that generate data and publish it asynchronously (for example, via a message queue system) |
CONSUMER | Asynchronous | Possibly inbound | For spans that receive data asynchronously (for example, from a message queue system) |
A fifth span kind, INTERNAL, leaves both characteristics undefined and is set by default if no span kind is provided.
For details, see the OpenTelemetry SpanKind specification.
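The mapping between the two characteristics and the span kinds can be sketched as a simple lookup. This is an illustration only: the `SpanKind` enum below is a local stand-in rather than the SDK's type, and `choose_span_kind` is a hypothetical helper that does not exist in OpenTelemetry.

```python
from enum import Enum
from typing import Optional


class SpanKind(Enum):
    INTERNAL = "INTERNAL"
    CLIENT = "CLIENT"
    SERVER = "SERVER"
    PRODUCER = "PRODUCER"
    CONSUMER = "CONSUMER"


def choose_span_kind(synchronous: Optional[bool] = None,
                     direction: Optional[str] = None) -> SpanKind:
    """Pick a span kind from the two characteristics.

    direction is "outbound", "inbound", or None if it is undefined.
    Anything that does not match a known combination falls back to INTERNAL.
    """
    table = {
        (True, "outbound"): SpanKind.CLIENT,
        (True, "inbound"): SpanKind.SERVER,
        (False, "outbound"): SpanKind.PRODUCER,
        (False, "inbound"): SpanKind.CONSUMER,
    }
    return table.get((synchronous, direction), SpanKind.INTERNAL)


print(choose_span_kind(True, "outbound").value)
print(choose_span_kind().value)
```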
Span status
Whether the underlying operation was executed successfully can be expressed via a span's status. For this purpose, each span has a StatusCode field, which defaults to Unset and can be set to one of two values (in order of priority):
- Ok: if the operation was successful
- Error: if the operation encountered an error
If an error status is set, you can also optionally provide a descriptive message in the span field Description.
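The priority between the status values can be sketched as follows. The `SpanStatus` class is a hypothetical stand-in, not the SDK's implementation, and it assumes the precedence rules as we read them in the OpenTelemetry specification: Ok is final once set, attempts to set Unset are ignored, and a description is only kept together with Error.

```python
from enum import Enum
from typing import Optional


class StatusCode(Enum):
    UNSET = 0
    OK = 1
    ERROR = 2


class SpanStatus:
    def __init__(self) -> None:
        self.code = StatusCode.UNSET
        self.description: Optional[str] = None

    def set(self, code: StatusCode, description: Optional[str] = None) -> None:
        # Assumption: Ok is final, and resetting to Unset is ignored
        if self.code is StatusCode.OK or code is StatusCode.UNSET:
            return
        self.code = code
        # Assumption: a description is only meaningful for Error
        self.description = description if code is StatusCode.ERROR else None


status = SpanStatus()
status.set(StatusCode.ERROR, "payment service timed out")
print(status.code.name, status.description)
status.set(StatusCode.OK)
status.set(StatusCode.ERROR, "ignored, because Ok is final")
print(status.code.name, status.description)
```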
Attributes and events
As mentioned earlier, spans can also contain attributes and events.
- Attributes are a list of key-value pairs, whose value format is very similar to JSON and accepts strings, numbers, Booleans, and arrays.
  When naming attributes, follow OpenTelemetry's semantic conventions and naming guidelines to ensure cross-platform compatibility.
- Events denote particular points in time throughout your span. These can be any events for which you can define a name, a timestamp, and event-specific attributes (in addition to span attributes).
Attributes and events are directly linked to their respective spans and serve as a method of annotation in the backend. In the Dynatrace UI, they are displayed as part of the span details.
For details, see the OpenTelemetry documentation on attributes and events.
For security purposes, Dynatrace does not show attribute values by default.
Please make sure all relevant attributes are part of the allowlist. For details, see Dynatrace configuration.
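The attribute value types described earlier in this section can be checked with a small helper before a value is attached to a span. This is a sketch with a hypothetical function name; the requirement that arrays contain only one primitive type is our reading of the OpenTelemetry specification and should be treated as an assumption here.

```python
PRIMITIVES = (str, bool, int, float)


def is_valid_attribute_value(value) -> bool:
    """Return True for strings, numbers, Booleans, and arrays of one such type."""
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, (list, tuple)):
        # Assumption: arrays must be homogeneous and contain primitives only
        all_primitive = all(isinstance(item, PRIMITIVES) for item in value)
        return all_primitive and len({type(item) for item in value}) <= 1
    return False


candidates = {
    "http.request.method": "GET",
    "cart.item_count": 3,
    "cart.item_ids": ["sku-1", "sku-2"],
    "cart.raw": {"nested": "objects are not accepted"},
}
for key, value in candidates.items():
    print(key, is_valid_attribute_value(value))
```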
Context
The span context is a set of unique identifiers representing the request that each span is a part of.
- New spans are created to represent the work being done by the components and services involved in an application.
- The span context is passed throughout the trace via context propagation.
In contrast to the state of the span, the context represents the information that identifies a span within the trace and is transferred to child spans across process boundaries. A context can contain:
- A trace ID: identifies the trace itself. It is a unique 16-byte array with at least one non-zero byte. All spans within that trace carry the same trace ID.
- A span ID: identifies the span itself. It is a unique 8-byte array with at least one non-zero byte. When the context is passed to a child span, the parent's span ID becomes the child's parent span ID.
- Trace flags: contain details about the trace, such as whether it is sampled. They are present in all traces.
- Trace state: carries vendor-specific trace identification data in key-value pairs, so that multiple tracing systems can participate in the same trace.
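The relationship between these identifiers can be sketched in a few lines of plain Python. The `SpanContext` class and the `start_root` and `start_child` helpers are hypothetical names for illustration and not the OpenTelemetry API; the default flag value assumes that bit 0 means "sampled".

```python
import secrets
from dataclasses import dataclass


def random_nonzero_bytes(length: int) -> bytes:
    """Return random bytes, retrying in the unlikely all-zero case."""
    while True:
        candidate = secrets.token_bytes(length)
        if any(candidate):
            return candidate


@dataclass(frozen=True)
class SpanContext:
    trace_id: bytes        # 16 bytes, shared by every span in the trace
    span_id: bytes         # 8 bytes, unique per span
    trace_flags: int = 1   # assumption: bit 0 set means "sampled"
    trace_state: str = ""  # vendor-specific key-value pairs


def start_root() -> SpanContext:
    return SpanContext(random_nonzero_bytes(16), random_nonzero_bytes(8))


def start_child(parent: SpanContext):
    """Return the child's context and its parent span ID."""
    child = SpanContext(parent.trace_id, random_nonzero_bytes(8),
                        parent.trace_flags, parent.trace_state)
    return child, parent.span_id


root = start_root()
child, parent_span_id = start_child(root)
print(root.trace_id.hex(), child.span_id.hex(), parent_span_id.hex())
```

The child keeps the trace ID, flags, and state of its parent and only receives a fresh span ID.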
Context propagation
Context propagation is a crucial element of tracing in OpenTelemetry and is particularly important for distributed tracing.
Distributed tracing takes place in a truly distributed manner, where tracing is not centrally managed but each involved service only knows about its own spans. Given this level of isolation, and even though the information in a single trace may originate from various services, we still need to be able to tie together all these pieces of information and maintain a coherent trace state in the analytics backend. That's the problem that context propagation solves.
With context propagation, each service passes on the current state of its trace when it calls another (instrumented) service. With this information, the next service can build on top of the existing trace and continue with its own spans and telemetry information.
On the technical side, context propagation does not provide the trace in its entirety, but focuses on the IDs of the trace and the current span. With these two pieces of information, the other side can continue the trace and use the provided span ID as the parent of its own spans.
While the underlying concept is not exclusive to the web, the W3C Trace Context specification focuses on HTTP and defines the traceparent and tracestate headers as the principal means of context transport.
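As an illustration of what travels between services, the following sketch writes and reads a traceparent header by hand. The `inject` and `extract` functions are hypothetical helpers, not the OpenTelemetry propagator API, and the sketch only handles version 00 of the header format (version, trace ID, parent span ID, and trace flags, separated by hyphens). The IDs are example values.

```python
import re

# version "00" - 32 hex chars trace ID - 16 hex chars span ID - 2 hex chars flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers: dict, trace_id: str, span_id: str, sampled: bool = True) -> None:
    """Caller side: add the current trace ID and span ID to outgoing headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers: dict):
    """Callee side: return the remote context, or None if it is missing or invalid."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if not match:
        return None
    trace_id, parent_span_id, flags = match.groups()
    if set(trace_id) == {"0"} or set(parent_span_id) == {"0"}:
        return None  # all-zero IDs are not valid
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


outgoing = {}
inject(outgoing, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
remote = extract(outgoing)
print(outgoing["traceparent"])
print(remote)
```

The receiving service reuses the extracted trace ID and uses the extracted span ID as the parent of its own spans.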
Baggage
Baggage is an interprocess communication mechanism that allows your services to exchange information via context propagation in a standardized key-value format.
Because it focuses on data exchange between the involved services, and not on the telemetry data itself, it is of less relevance for the analytics backend, so there is no specific backend support required or foreseen.
Nonetheless, baggage can be tremendously useful within its own scope. To learn more about when and how to use baggage, see the OpenTelemetry baggage documentation.
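To give an idea of the key-value format, here is a rough sketch of how baggage entries could be serialized into and parsed from a single header value. It assumes the comma-separated key=value layout of the W3C baggage header with percent-encoded values, ignores optional per-entry properties, and uses hypothetical helper names and example keys.

```python
from urllib.parse import quote, unquote


def encode_baggage(entries: dict) -> str:
    """Serialize entries as comma-separated key=value pairs.

    Assumption: keys are plain tokens, so only values are percent-encoded.
    """
    return ",".join(f"{key}={quote(str(value), safe='')}"
                    for key, value in entries.items())


def decode_baggage(header: str) -> dict:
    """Parse a header value back into a dict, dropping optional properties."""
    entries = {}
    for member in header.split(","):
        member = member.strip()
        if not member:
            continue
        key_value = member.split(";", 1)[0]  # ignore ";property" suffixes
        key, _, value = key_value.partition("=")
        entries[key.strip()] = unquote(value.strip())
    return entries


header = encode_baggage({"user.tier": "gold", "region": "eu west"})
print(header)
print(decode_baggage(header))
```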