OpenCensus, OpenTracing, and OpenTelemetry. These are just a few of the open-source technologies you may encounter as you research observability solutions for managing complex multicloud IT environments and the services that run on them. In fact, these technologies have become so prevalent that anybody who doesn’t know the full scope of the topic may be afraid to ask. If you happen to fall into this category, fear not: this post has you covered!
Of these open-source observability tools, one stands out. Let’s take a closer look and clear up any confusion about what it is, the telemetry data it covers, and the benefits it can provide.
What is OpenTelemetry?
OpenTelemetry (also referred to as OTel) is an open-source observability framework made up of a collection of tools, APIs, and SDKs. OTel enables IT teams to instrument, generate, collect, and export telemetry data for analysis and to understand software performance and behavior.
To appreciate what OTel does, it helps to understand observability. Loosely defined, observability is the ability to understand what’s happening inside a system from the external data it produces, usually logs, metrics, and traces.
Having a common format for how observability data is collected and sent is where OpenTelemetry comes into play. As a Cloud Native Computing Foundation (CNCF) incubating project, OTel aims to provide unified sets of vendor-agnostic libraries and APIs — mainly for collecting data and transferring it somewhere. Since the project’s start, many vendors, including Dynatrace, have come on board to help make rich data collection easier and more consumable.
To understand why observability and OTel’s approach to it are so important, let’s take a deeper look at telemetry data itself, and how it can help organizations transform how they do business.
What is telemetry data?
Capturing data is critical to understanding how your applications and infrastructure are performing at any given time. This information is gathered from remote, often inaccessible points within your ecosystem and processed by some sort of tool or equipment. Monitoring begins here. The data is incredibly plentiful and difficult to store over long periods due to capacity limitations — a reason why private and public cloud storage services have been a boon to DevOps teams.
Logs, metrics, and traces make up the bulk of all telemetry data.
Logs are important because you’ll naturally want an event-based record of any notable anomalies across the system. Structured, unstructured, or in plain text, these readable files can tell you the results of any transaction involving an endpoint within your multicloud environment. However, not all logs are inherently reviewable — a problem that’s given rise to external log analysis tools.
Metrics are numerical data points represented as counts or measures that are often calculated or aggregated over a period of time. Metrics originate from several sources including infrastructure, hosts, and third-party sources. While logs aren’t always accessible, most metrics tend to be reachable via query. Timestamps, values, and even event names can preemptively uncover a growing problem that needs remediation.
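To make "aggregated over a period of time" a little more concrete, here is a minimal sketch that buckets raw measurements into fixed time windows and keeps a count and a sum per window. The `aggregate` function, the 60-second window, and the sample latencies are all assumptions made for illustration; none of them come from OTel itself.

```python
from collections import defaultdict


def aggregate(points, window_seconds=60):
    """Group (timestamp, value) points into fixed windows; keep count and sum per window."""
    buckets = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for timestamp, value in points:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[window_start]["count"] += 1
        buckets[window_start]["sum"] += value
    return dict(sorted(buckets.items()))


# Hypothetical request latencies in milliseconds, keyed by seconds since start.
points = [(0, 120.0), (15, 80.0), (61, 200.0), (119, 100.0), (130, 50.0)]
summary = aggregate(points)
```

Each key in `summary` is the start of a window, so a sudden jump in the sum or count for one window is the kind of signal that can point to a growing problem.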
A trace follows a process (for example, an API request or other system activity) from start to finish, showing how services connect. Keeping watch over this pathway is critical to understanding how your ecosystem works, whether it’s working effectively, and whether any troubleshooting is necessary. Span data is a hallmark of tracing, and it includes information such as unique identifiers, operation names, timestamps, logs, events, and indexes.
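The span fields listed above can be pictured as a small record type. The sketch below is a simplified stand-in, not the OTel SDK's span class: the `Span` dataclass and its method names are hypothetical. The ID sizes do follow OTel's convention of 16-byte trace IDs and 8-byte span IDs.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    name: str                      # operation name
    trace_id: str                  # shared by every span in the same trace
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))
    parent_id: Optional[str] = None
    start_ns: int = field(default_factory=time.time_ns)
    end_ns: Optional[int] = None
    events: list = field(default_factory=list)

    def child(self, name):
        """Create a span in the same trace that points back to this one."""
        return Span(name=name, trace_id=self.trace_id, parent_id=self.span_id)

    def add_event(self, message):
        """Attach a timestamped event to the span."""
        self.events.append((time.time_ns(), message))

    def end(self):
        self.end_ns = time.time_ns()


# Hypothetical request: one root span with a single child operation.
root = Span(name="GET /checkout", trace_id=secrets.token_hex(16))
db = root.child("SELECT orders")
db.add_event("cache miss")
db.end()
root.end()
```

Because the child carries the same `trace_id` and records the root's `span_id` as its `parent_id`, a backend can later stitch the two back into one end-to-end trace.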
How does OpenTelemetry work?
OTel is a framework for collecting telemetry data and exporting it to a target system. Because the CNCF project is open source and vendor-agnostic, the end goal is to make data collection more system-agnostic than it currently is. But how is that data generated?
The data life cycle has multiple steps from start to finish. Here are the steps the solution takes, and the data it generates along the way:
- Instruments your code with APIs, telling system components what metrics to gather and how to gather them
- Pools the data using SDKs, and transports it for processing and exporting
- Breaks down the data, samples it, filters it to reduce noise or errors, and enriches it using multi-source contextualization
- Converts and exports the data
- Conducts more filtering in time-based batches, then moves the data onward to a predetermined backend
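The life cycle above can be sketched as a chain of small functions. This is a toy model under stated assumptions, not how the OTel SDK or collector is implemented: every function name, the "keep every second record" sampling rule, the `service.name` value, and the sample records are hypothetical, and the backend is just a Python list.

```python
import json


def sample(records, keep_every=2):
    """Keep one record out of every `keep_every` (simple deterministic sampling)."""
    return [r for i, r in enumerate(records) if i % keep_every == 0]


def drop_noise(records):
    """Filter out health-check records, which rarely carry useful signal."""
    return [r for r in records if r["name"] != "GET /health"]


def enrich(records, resource):
    """Attach shared context, such as the service name, to every record."""
    return [{**r, **resource} for r in records]


def batches(records, size):
    """Yield the records in fixed-size groups."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def export(batch, backend):
    """Convert a batch to JSON and hand it to the backend (here, a plain list)."""
    backend.append(json.dumps(batch))


# Records as they might come out of instrumented code.
raw = [
    {"name": "GET /cart", "duration_ms": 12},
    {"name": "GET /cart", "duration_ms": 11},
    {"name": "GET /health", "duration_ms": 1},
    {"name": "POST /order", "duration_ms": 48},
    {"name": "POST /order", "duration_ms": 52},
    {"name": "GET /health", "duration_ms": 1},
    {"name": "GET /cart", "duration_ms": 9},
    {"name": "GET /cart", "duration_ms": 14},
]

backend = []
processed = enrich(drop_noise(sample(raw)), {"service.name": "shop"})
for batch in batches(processed, size=2):
    export(batch, backend)
```

In this run, eight raw records become three enriched records, which reach the backend as two batches.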
Ingestion is critical to gathering the data we care most about. There are two principal ways to go about this:
- Local ingestion. This occurs once data is safely stored within a local cache. This is common in on-premises or hybrid deployments, where time series data and tags are transmitted to the cloud. Cloud databases excel at storing large volumes of information for later reference, and this data often has business value or privacy restrictions.
- Span ingestion. We can also ingest trace data in span format. Depending on the vendor, this data may be ingested either directly or indirectly. Spans are typically indexed and consist of both root spans and child spans. This data is valuable because it contains key metadata, event information, and more.
These methods are pivotal to the entire pipeline, as the process cannot work without tapping into this information.
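To illustrate how root spans and child spans relate once they have been ingested, the sketch below rebuilds the call hierarchy from a flat list of spans. The dictionary layout, the `build_tree` and `render` helpers, and the span names are assumptions for illustration; real vendors store and index spans in their own formats.

```python
from collections import defaultdict


def build_tree(spans):
    """Separate root spans from child spans and index children by parent ID."""
    children = defaultdict(list)
    roots = []
    for span in spans:
        if span["parent_id"] is None:
            roots.append(span)
        else:
            children[span["parent_id"]].append(span)
    return roots, children


def render(span, children, depth=0):
    """Return the span and its descendants as indented lines."""
    lines = ["  " * depth + span["name"]]
    for child in children[span["span_id"]]:
        lines.extend(render(child, children, depth + 1))
    return lines


# Hypothetical spans as they might arrive, in no particular hierarchy.
spans = [
    {"span_id": "a", "parent_id": None, "name": "GET /checkout"},
    {"span_id": "b", "parent_id": "a", "name": "auth"},
    {"span_id": "c", "parent_id": "a", "name": "charge card"},
    {"span_id": "d", "parent_id": "c", "name": "POST payment-gateway"},
]

roots, children = build_tree(spans)
outline = render(roots[0], children)
```

Joining `outline` with newlines gives an indented view of the request, with the root span at the top and each child nested under its parent.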
Benefits of OpenTelemetry
Collecting application data is nothing new. However, the collection mechanism and format are rarely consistent from one application to another. This inconsistency can be a nightmare for developers and SREs who are just trying to understand the health of an application.
OTel provides a de facto standard for adding observable instrumentation to cloud-native applications. This means companies don’t need to spend valuable time developing a mechanism for collecting critical application data and can spend more time delivering new features instead.
It’s akin to how Kubernetes became the standard for container orchestration. This broad adoption has made it easier for organizations to implement container deployments since they don’t need to build their own enterprise-grade orchestration platform. Using Kubernetes as the analog for what OTel can become, it’s easy to see the benefits OTel can provide to the entire industry.
What happened to OpenTracing and OpenCensus?
OpenTracing became a CNCF project back in 2016, with the goal of providing a vendor-agnostic specification for distributed tracing, offering developers the ability to trace a request from start to finish by instrumenting their code. Then, Google made the OpenCensus project open source in 2018. This was based on Google’s Census library that was used internally for gathering traces and metrics from their distributed systems. Like the OpenTracing project, the goal of OpenCensus was to give developers a vendor-agnostic library for collecting traces and metrics.
The result was two competing tracing frameworks, a situation informally known as “the Tracing Wars.” Usually, competition is a good thing for end-users since it breeds innovation. However, in the open-source specification world, competition can lead to poor adoption, contribution, and support.
Going back to the Kubernetes example, imagine how much more disjointed and slow-moving container adoption would be if everybody were using a different orchestration solution. To avoid this, it was announced at KubeCon 2019 in Barcelona that the OpenTracing and OpenCensus projects would converge into one project called OpenTelemetry and join the CNCF.
The first beta version was then released in March 2020, and OpenTelemetry continues to be the second most active CNCF project after Kubernetes.
OTel consists of a few different components as depicted in the following figure. Let’s take a high-level look at each one from left to right:
- APIs. These are core components and language-specific (such as Java, Python, .NET, and so on). APIs provide the basic “plumbing” for your application.
- SDK. This is also a language-specific component and is the middleman that provides the bridge between the APIs and the exporter. The SDK allows for additional configuration, such as request filtering and transaction sampling.
- Exporter. This allows you to configure which backend(s) you want the data sent to. The exporter decouples the instrumentation from the backend configuration. This makes it easy to switch backends without the pain of re-instrumenting your code.
- Collector. The collector receives, processes, and exports telemetry data. While not technically required, it is an extremely useful component of the OpenTelemetry architecture because it allows greater flexibility for receiving and sending the application telemetry to the backend(s).
The collector has two deployment models:
- An agent that resides on the same host as the application (for example, binary, DaemonSet, sidecar, and so on)
- A standalone process completely separate from the application
Because the collector only receives, processes, and forwards telemetry, it still requires a backend to receive and store the data.
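The point about the exporter decoupling instrumentation from the backend can be sketched in a few lines. This is a simplified model, not the OTel API: the `Exporter` protocol, the two exporter classes, the `Tracer` stand-in for the SDK, and its `sample_every` option are all hypothetical names chosen for illustration.

```python
from typing import Protocol


class Exporter(Protocol):
    """Anything that can accept a finished span."""

    def export(self, span: dict) -> None: ...


class TextExporter:
    """Formats spans as readable lines (standing in for one backend)."""

    def __init__(self):
        self.lines = []

    def export(self, span):
        self.lines.append(f"{span['name']} took {span['duration_ms']} ms")


class InMemoryExporter:
    """Keeps raw spans in a list (standing in for a different backend)."""

    def __init__(self):
        self.spans = []

    def export(self, span):
        self.spans.append(span)


class Tracer:
    """Stands in for the SDK: applies sampling, then forwards to the configured exporter."""

    def __init__(self, exporter: Exporter, sample_every: int = 1):
        self.exporter = exporter
        self.sample_every = sample_every
        self._seen = 0

    def record(self, name, duration_ms):
        self._seen += 1
        # Forward only every Nth span; with the default of 1, forward all of them.
        if self._seen % self.sample_every == 0:
            self.exporter.export({"name": name, "duration_ms": duration_ms})


def handle_request(tracer):
    """Instrumented application code; it never mentions a backend."""
    tracer.record("GET /cart", 12)
    tracer.record("POST /order", 48)


text = TextExporter()
handle_request(Tracer(text))

memory = InMemoryExporter()
handle_request(Tracer(memory, sample_every=2))
```

Note that `handle_request` is identical in both runs. Only the exporter passed to the tracer changes, which is the property that lets you switch backends without re-instrumenting your code.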
Landscape of OpenTelemetry contributors
While OTel has several smaller users and individual contributors, large companies are really moving the development needle by investing time, reviews, comments, and commits. In just one month, there were over 11,000 total contributions to the project.
Dynatrace, Splunk, and Microsoft are all top-10 contributors. Overall, more than 100 companies and vendors contribute regularly — or have contributed — to the CNCF’s brainchild.
The community behind it is both diverse and strong. Platforms, such as GitHub, Slack, and Twitter, have dedicated communities or workspaces. Stack Overflow also remains a great place for answers on the project. And those seeking firsthand data can even consult the CNCF DevStats dashboard for more information.
What are the future plans for OpenTelemetry?
The project released v1.0 in February 2021 and currently only supports traces and metrics, with logs still in the initial planning stages. The plan for the immediate future is to continue to extend coverage in addition to ensuring a smooth transition from OpenTracing and OpenCensus.
Dynatrace and OpenTelemetry together can deliver more value
As a key contributor to the OpenTelemetry project, Dynatrace is committed to making observability seamless for technical teams.
Data plus context are key to supercharging observability. Dynatrace is the only observability solution that combines high-fidelity distributed tracing, code-level visibility, and advanced diagnostics across cloud-native architectures. By integrating OTel data seamlessly into PurePath—Dynatrace’s distributed tracing technology—the Dynatrace OneAgent automatically picks up OTel data, and provides the instrumentation for all the important frameworks beyond the scope of OTel.
Start a Dynatrace free trial!