OpenTelemetry is an observability framework for cloud-native software used for instrumenting components and for creating and managing telemetry data (traces, metrics, and logs).
OpenTelemetry provides vendor-agnostic instrumentation libraries, end-to-end implementations, open-standard semantic conventions, a way to connect the collected data (context propagation), and more. However, OpenTelemetry does not provide a backend: storing, analyzing, and visualizing the telemetry data is left to other tools.
Nowadays, huge distributed and polyglot architectures are everywhere. They tend to be complex, and technologies can be challenging to maintain. Still, availability and performance issues need to be resolved quickly. End-to-end visibility across application boundaries, even including components you do not own (for example, cloud services and third parties), is crucial. This is where observability becomes important.
Observability is the ability to understand what's happening inside a system based on knowledge of the external data it produces, such as logs, metrics, and traces.
Observability relies on telemetry data, which can be provided by Dynatrace as well as by open-source projects such as OpenTelemetry. OpenTelemetry is a Cloud Native Computing Foundation (CNCF) incubating project with the goal of providing a unified set of vendor-agnostic libraries/APIs, SDKs, and other tools. One of its key contributors is Dynatrace.
Using OpenTelemetry, IT teams can instrument their applications and generate, collect, and export telemetry data to analyze and understand software performance and behavior.
Just as Kubernetes has become the de facto standard for container orchestration, OpenTelemetry is now the de facto standard for adding observability to cloud-native applications. This means that companies do not need to spend valuable time developing a mechanism for collecting application telemetry, and can instead focus on their primary products.
Three pillars of observability
- Logs: An application's recording of an event, including timestamps and other data about the nature of the event.
- Metrics: An application's recording of a data point. A metric event consists of the measurement, the time it was captured, and associated metadata (labels).
- Traces: The progression of a single request as it is handled by various services throughout an application.
A trace represents a single request as it is being handled by numerous services within an application.
Distributed tracing is a method of observing requests as they propagate across distributed systems and microservices, generating high-quality data about those requests and making it available for analysis. It shows the trace as it follows the flow of execution and can help teams understand how each involved service is performing.
Spans are created by the tracer and represent a single operation within the trace.
- New spans are created to represent operations on each individual service and component involved.
- The span itself contains context (metadata), which is a set of unique identifiers representing the request that each span is a part of.
- The span context is passed throughout the trace via context propagation.
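The relationship between a trace, its spans, and the propagated span context can be sketched in a few lines of Python. This is a simplified illustration that uses only the standard library; it is not the OpenTelemetry API, and the names `SpanContext`, `Span`, and `start_span` as well as the operation names are assumptions made for the example.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical stand-in for the identifiers carried by a span."""
    trace_id: str  # shared by every span that belongs to the same request
    span_id: str   # unique to a single operation


@dataclass
class Span:
    name: str
    context: SpanContext
    parent_span_id: Optional[str] = None


def start_span(name: str, parent: Optional[SpanContext] = None) -> Span:
    """Start a root span, or a child span when a parent context is given."""
    # A child inherits the trace ID; a root span generates a new one.
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    context = SpanContext(trace_id=trace_id, span_id=secrets.token_hex(8))
    return Span(name, context, parent.span_id if parent else None)


# Service A receives the request and starts the trace.
root = start_span("GET /checkout")
# Only the span context travels to service B, which continues the same trace.
child = start_span("charge-card", parent=root.context)

print(child.context.trace_id == root.context.trace_id)
print(child.parent_span_id == root.context.span_id)
```

Because both spans share one trace ID and the child records its parent's span ID, a backend can later reassemble the individual operations into a single request flow.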
OpenTelemetry metrics allow recording of either raw measurements or measurements with predefined aggregation and attributes.
- Raw measurements allow you to decide which aggregation algorithm to use for the recorded metrics.
- Predefined aggregation and attributes allow you to collect values like CPU and memory usage, as well as simple things such as queue length.
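The difference between the two recording styles can be illustrated with a small standard-library sketch. It does not use the OpenTelemetry metrics API; `RawRecorder`, `LastValueGauge`, and the attribute names are hypothetical and exist only to show where the aggregation decision is made.

```python
from statistics import mean
from typing import Dict, List, Tuple


class RawRecorder:
    """Keeps every measurement so the aggregation can be chosen later."""

    def __init__(self) -> None:
        self.values: List[float] = []

    def record(self, value: float) -> None:
        self.values.append(value)


class LastValueGauge:
    """Predefined aggregation: keeps only the latest value per attribute set."""

    def __init__(self) -> None:
        self.latest: Dict[Tuple[Tuple[str, str], ...], float] = {}

    def set(self, value: float, attributes: Dict[str, str]) -> None:
        # Attributes are sorted so that equal sets map to the same key.
        key = tuple(sorted(attributes.items()))
        self.latest[key] = value


# Raw measurements: the consumer decides how to aggregate at read time.
latency_ms = RawRecorder()
for value in (12.0, 48.0, 30.0):
    latency_ms.record(value)
average_latency = mean(latency_ms.values)
worst_latency = max(latency_ms.values)

# Predefined aggregation: only the current queue length per queue is kept.
queue_length = LastValueGauge()
queue_length.set(7, {"queue": "orders"})
queue_length.set(3, {"queue": "orders"})
queue_length.set(5, {"queue": "emails"})

print(average_latency, worst_latency)
print(queue_length.latest)
```

Raw recording costs more memory but keeps every option open; a predefined aggregation is cheaper and suits values where only the current state matters.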
OpenTelemetry logs allow for the representation of logs from numerous sources, including application log files and system logs.
Since most programming languages have built-in logging capabilities or use logging libraries, logs carry the largest legacy of all observability signals. Because of this legacy, OpenTelemetry has to balance competing goals:
- Support existing logs and logging libraries
- Provide improvements and integration
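One way to picture how both goals can be met is to leave an existing logging library untouched and enrich its records with trace identifiers. The sketch below uses only Python's standard `logging` module; `TraceContextFilter`, the logger name, and the hard-coded IDs are assumptions for illustration and not part of OpenTelemetry.

```python
import io
import logging


class TraceContextFilter(logging.Filter):
    """Hypothetical filter that attaches trace identifiers to each record."""

    def __init__(self, trace_id: str, span_id: str) -> None:
        super().__init__()
        self.trace_id = trace_id
        self.span_id = span_id

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = self.trace_id
        record.span_id = self.span_id
        return True  # never drop the record, only enrich it


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"
))
# In a real setup the IDs would come from the active span, not constants.
handler.addFilter(
    TraceContextFilter("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
)

logger = logging.getLogger("checkout")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

# Application code keeps calling the logging library exactly as before.
logger.info("payment accepted")
output = stream.getvalue()
print(output, end="")
```

The application's logging calls stay the same, while each log line gains the identifiers needed to correlate it with a trace.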
The OpenTelemetry architecture revolves around a few key components, some of which can be implemented flexibly. The main components include:
- Cross-language specifications
  - The API is used to instrument code to generate telemetry data.
  - The SDK is an implementation of the API; it is used to configure OpenTelemetry for specific environments.
  - The data specification defines the OpenTelemetry Protocol (OTLP) and semantic conventions.
- Tools to collect, transform, and export telemetry data (for example, collector)
  - A collector is a proxy that can receive, process, and export telemetry data, as per the official OpenTelemetry documentation. It can receive data in various formats and send data to one or more backends. A collector's components (receivers, processors, and exporters) are defined in their own configuration sections and enabled in the service/pipelines section.
- Automatic instrumentation and contrib packages
- Per-language SDKs (which can replace vendor-specific SDKs)
  - These SDKs allow you to generate telemetry data with the language of your choice and export that data to a preferred backend. They also enable automatic instrumentation.
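The receive, process, and export flow of a collector can be modeled as a simple pipeline. The following is a conceptual sketch in plain Python, not collector code or configuration; `run_pipeline`, `add_environment`, the record fields, and the two in-memory "backends" are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def run_pipeline(
    received: Iterable[Record],
    processors: List[Callable[[Record], Record]],
    exporters: List[Callable[[List[Record]], None]],
) -> None:
    """Pass each received record through all processors, then export the batch."""
    batch: List[Record] = []
    for record in received:
        for process in processors:
            record = process(record)
        batch.append(record)
    # The same processed batch can be sent to one or more backends.
    for export in exporters:
        export(batch)


def add_environment(record: Record) -> Record:
    """Example processor: enrich every record with an extra attribute."""
    return {**record, "environment": "staging"}


# Two lists stand in for two different backends.
backend_a: List[Record] = []
backend_b: List[Record] = []

run_pipeline(
    received=[{"name": "GET /checkout", "duration_ms": 42}],
    processors=[add_environment],
    exporters=[backend_a.extend, backend_b.extend],
)

print(backend_a)
```

Keeping receivers, processors, and exporters as separate stages is what lets the same telemetry be reshaped once and delivered to several destinations.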
OpenTelemetry is a merger of two industry-recognized projects: OpenTracing and OpenCensus.
Factors that drove this merger:
- With two competing solutions on the market, it would be difficult for companies to achieve end-to-end visibility in components that they do not own.
- Data portability and maintenance would also suffer.
OpenTelemetry solves these issues by being the sole specification that everybody agrees on.
Standardization is also achieved by implementing the W3C Trace Context as the default trace-propagation mechanism. Adding specific HTTP headers (traceparent and tracestate) to the transaction enables tracing across all tiers of your application, providing vendor-agnostic data.
The W3C specifies the headers to use to propagate trace context and therefore reduce problems with lost context. Dynatrace has made significant contributions to the W3C project.
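To make the header mechanism concrete, here is a minimal sketch of writing and reading the `traceparent` header, whose version 00 layout is `00-<trace-id>-<parent-id>-<trace-flags>`. The helper names `inject` and `extract` are assumptions for this example; the sketch handles only version 00 and ignores the companion `tracestate` header, so it is not a complete implementation of the specification.

```python
import re
from typing import Dict, Optional, Tuple

# Version 00: 32 hex chars of trace ID, 16 of parent (span) ID, 2 of flags.
TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def inject(headers: Dict[str, str], trace_id: str, span_id: str,
           sampled: bool = True) -> None:
    """Write the trace context into outgoing HTTP headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers: Dict[str, str]) -> Optional[Tuple[str, str, bool]]:
    """Read (trace_id, span_id, sampled) from incoming headers, or None."""
    match = TRACEPARENT.fullmatch(headers.get("traceparent", ""))
    if not match:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero identifiers are invalid and must not be propagated.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return trace_id, span_id, bool(int(flags, 16) & 0x01)


outgoing: Dict[str, str] = {}
inject(outgoing, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
print(outgoing["traceparent"])
print(extract(outgoing))
```

A service that receives the header extracts the trace ID, creates its own span with a new span ID, and injects the updated header into any requests it makes downstream, which is how the context survives across tiers.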