Advanced observability

Get actionable answers powered by explainable AI and automation

What's inside


Observability, when combined with AI and automation, holds the promise to deliver the actionable answers needed to ensure cloud-native applications work perfectly and deliver the best experience and value possible to their users.

Chapter 1

From observability to getting answers


The concept of observability is gaining rapid momentum as companies accelerate their digital transformation by building out massive cloud environments that are inherently hard to observe and operate, due to their dynamism and complexity.

At Dynatrace, we went through our own digital transformation in 2015 and reinvented ourselves as an agile, cloud-native company. We rebuilt our product from the ground up to meet the observability, automation, and intelligence requirements of some of the most advanced enterprise cloud environments we saw on the horizon.

We recognized that while observability is important, it’s not enough to just “observe” data; what matters is using that data to deliver better business outcomes. As microservice environments become highly dynamic and grow to thousands of hosts, the real challenge becomes making sense of data and deriving answers to performance problems in real time. This can be a daunting task that quickly surpasses the capacity of human operators.

That’s why Dynatrace developed a radically different Software Intelligence Platform, expanding traditional observability with automated, AI-powered answers that scale across hundreds of thousands of hosts. This platform is used today by many of the world’s largest enterprises.

*In software, observability refers to the extent to which the internal status and performance of a system can be inferred from its externally available data.

Chapter 2

Modern cloud environments need a different approach to observability


Conventional application performance monitoring (APM) emerged when software was mostly monolithic and update cycles were measured in years, not days. Manual instrumentation and performance baselining, though cumbersome, were once adequate—particularly since fault patterns were generally known and well understood.

As monoliths are replaced by cloud-native applications that are rapidly growing in size, traditional monitoring approaches are no longer enough. Rather than instrumenting for a predefined set of problems, enterprises need complete visibility into every single component of these dynamically scaling microservice environments. This includes multi-cloud infrastructures, container orchestration systems like Kubernetes, service meshes, functions-as-a-service, and polyglot container payloads.

Such applications are more complex and unpredictable than ever. System health problems are rarely well understood from the outset and IT teams spend a significant amount of time manually solving problems and putting out fires after the fact. The challenge with modern cloud environments is to address the unknown unknowns—the kind of unique glitches that have never occurred in the past. These are the growing pains that the concept of observability attempts to tackle.

Chapter 3

Extend traditional observability with actionable answers


Observability addresses the challenges of cloud-native applications by proposing a better way of collecting data from all system components to gain complete and seamless visibility. Most conventional tools focus on collecting three principal data types—metrics, logs, and traces—the so-called three pillars of observability.

Dynatrace has pioneered and expanded the collection of observability data in highly dynamic cloud environments with the OneAgent. In addition to metrics, logs, and traces, we also collect user experience data for full, end-to-end visibility.

Most importantly, Dynatrace delivers answers, not just more data, through three distinct capabilities:

  • Automatic discovery and instrumentation. Ensure scalability and complete coverage in highly dynamic environments without manual configuration.

  • Topology information. Understand the interdependencies between different entities and the data being observed, in context and across the full stack.

  • Causation-based AI engine. Provide actionable answers to performance problems through a precise root-cause analysis.

Chapter 4

Automation provides scalability and completeness


Most observability approaches require developers to manually instrument their code. In environments with tens of thousands of hosts and microservices that dynamically scale across global, multi-cloud infrastructure, this becomes a futile effort.

The Dynatrace platform automates data collection and analysis for enterprise-grade scalability and complete observability.

  • Auto-discovery. Upon installation, the Dynatrace OneAgent automatically and instantly detects all applications, containers, services, processes, and infrastructure at start-up time.

  • Auto-instrumentation. System components are instrumented automatically with zero configuration or code change. Collection of high-fidelity data such as metrics, logs, traces, and user experience, in addition to topology data, starts as soon as a system component becomes available.

  • Auto-baselining. Dynatrace’s smart baselining automatically learns “normal” performance and adapts dynamically as the environment changes.

  • Auto-updates. For enterprise-grade maintainability, the OneAgent automatically and securely updates throughout the entire environment.
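To make the idea of a baseline that learns “normal” and adapts over time more concrete, here is a minimal sketch using an exponentially weighted mean and variance. It is a generic textbook technique, not a description of how Dynatrace implements baselining; the class name `AdaptiveBaseline` and the parameters `alpha`, `tolerance`, and `warmup` are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveBaseline:
    """Illustrative adaptive baseline for a single metric (hypothetical, not a vendor API)."""

    alpha: float = 0.1      # how quickly the baseline follows new data
    tolerance: float = 3.0  # allowed deviation, in standard deviations
    warmup: int = 5         # samples to observe before anything is flagged
    mean: float = 0.0
    variance: float = 0.0
    count: int = 0

    def observe(self, value: float) -> bool:
        """Return True if `value` deviates from the learned baseline, then learn from it."""
        self.count += 1
        if self.count == 1:
            # The first sample simply seeds the baseline.
            self.mean = value
            return False

        deviation = value - self.mean
        std = self.variance ** 0.5
        anomalous = (
            self.count > self.warmup
            and abs(deviation) > self.tolerance * max(std, 1e-9)
        )

        # Exponentially weighted updates: recent samples count more, so a
        # sustained shift in the metric gradually becomes the new "normal".
        self.mean += self.alpha * deviation
        self.variance = (1 - self.alpha) * (self.variance + self.alpha * deviation ** 2)
        return anomalous


baseline = AdaptiveBaseline()
for sample in [100.0] * 10:
    baseline.observe(sample)          # steady traffic: nothing is flagged
spike_flagged = baseline.observe(250.0)  # a sudden jump is flagged
```

Because the baseline keeps updating, a metric that settles at a new level stops being flagged after a while, which is the “adapts dynamically” behavior in its simplest form. A production system would also need to account for seasonality and noisy metrics, which this sketch ignores.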

Chapter 5

Real-time topology mapping provides context across the full stack


Metrics, logs, and traces are frequently stored without meaningful context that ties them together. With such data silos, a holistic system health assessment is impossible. For example, you might get an alert for an increased failure rate of service A and another alert because process B has an increase in CPU usage. But it’s impossible to tell if these two alerts are related and how end users are impacted by them.

To avoid such data silos, Dynatrace automatically detects and collects a rich set of context metadata to create a real-time topology map called Smartscape. It captures the relationships and dependencies for all system components, both vertically up and down the stack and horizontally between services, processes, and hosts. Within large enterprise systems, there are billions of ever-changing dependencies, and Smartscape keeps track of them all.

The topology map enables Dynatrace to understand the actual connection between all captured metrics, traces, logs, and user experience data. Unlike mere time-based correlation, topology mapping reveals the actual causal dependencies between captured data. This is the basis for Dynatrace’s radically different AI engine, Davis.
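The service A and process B example can be illustrated with a small dependency graph. The sketch below is only a toy model under stated assumptions: the entity names, the `DEPENDS_ON` mapping, and the helper functions are hypothetical and do not reflect Smartscape’s actual data model or any Dynatrace API. It shows how knowing who depends on whom lets two alerts be checked for a structural relationship instead of relying on timing alone.

```python
from collections import deque

# Hypothetical topology: each entity maps to the entities it depends on.
DEPENDS_ON = {
    "service-A": ["process-B"],
    "process-B": ["host-1"],
    "service-C": ["process-D"],
    "process-D": ["host-2"],
}


def reachable(topology, start):
    """Return every entity that `start` depends on, directly or transitively."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dependency in topology.get(node, []):
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    return seen


def alerts_related(topology, entity_a, entity_b):
    """Two alerts may share a cause if one entity depends on the other."""
    return (
        entity_b in reachable(topology, entity_a)
        or entity_a in reachable(topology, entity_b)
    )


# service-A runs on process-B, so their alerts are worth analyzing together.
related = alerts_related(DEPENDS_ON, "service-A", "process-B")
# service-A and process-D share no dependency path in this toy topology.
unrelated = alerts_related(DEPENDS_ON, "service-A", "process-D")
```

Without the graph, the two alerts could only be compared by timestamp; with it, the failure-rate alert on service A and the CPU alert on process B are linked by an explicit dependency.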

Chapter 6

Causation-based AI delivers precise answers


Traditional observability solutions offer little information beyond dashboard visualizations. In the end, it is left to human experts to analyze the data in time-consuming war rooms. Despite all efforts, too many user complaints stay unresolved. Dynatrace is the only software intelligence platform that reliably takes that burden off human operators. Davis, the Dynatrace causation-based AI engine, automates anomaly root-cause analysis and is custom built for highly dynamic microservice environments.

  • Built at the core of the Dynatrace platform. Davis processes all observability data across the full technology stack, independent of origin.
  • Precise technical root-cause analysis. Davis pinpoints malfunctioning components by probing billions of dependencies in milliseconds.
  • Identification of bad deployments. Davis knows exactly which deployment or config change introduced the anomaly in the first place.
  • Discovery of unknown unknowns. Davis does not rely on predefined anomaly thresholds but automatically detects any unusual “change points” in the data.
  • Automatic hypothesis testing. Davis systematically works through the complete fault tree.
  • No repetitive model learning or guessing. Unlike machine learning approaches, Davis’ causation-based AI relies on a topology map, which is updated in real time.
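As a rough illustration of what detecting a “change point” without a predefined threshold can mean, the sketch below finds the position where a series is best explained as two different levels. It is a simple least-squares split from basic statistics, offered only as an assumption-laden example; it says nothing about how Davis works internally, and the function name `best_change_point` and the parameter `min_segment` are hypothetical.

```python
from statistics import fmean


def best_change_point(values, min_segment=3):
    """Return (index, gain) for the split that best separates the series into two levels.

    `gain` is the reduction in squared error compared with treating the whole
    series as a single level. Returns (None, 0.0) if no split improves the fit.
    """

    def squared_error(segment):
        level = fmean(segment)
        return sum((x - level) ** 2 for x in segment)

    total = squared_error(values)
    best_index, best_gain = None, 0.0
    # Try every split that leaves at least `min_segment` samples on each side.
    for i in range(min_segment, len(values) - min_segment + 1):
        gain = total - (squared_error(values[:i]) + squared_error(values[i:]))
        if gain > best_gain:
            best_index, best_gain = i, gain
    return best_index, best_gain


# Response times (ms) that jump to a new level partway through the series.
response_times = [10, 11, 10, 9, 10, 11, 30, 31, 29, 30, 31, 30]
index, gain = best_change_point(response_times)  # index 6 is where the level shifts
```

No absolute limit such as “alert above 25 ms” is configured here; the shift is found from the shape of the data itself. Deciding whether a detected shift is significant, and tying it to a deployment or a dependency, is a separate step that this sketch does not attempt.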

Chapter 7

Looking ahead: OpenTelemetry for better coverage


The OpenTelemetry open-source project is spearheaded by the Cloud Native Computing Foundation (CNCF), with the aim of making software more observable and establishing telemetry as a built-in feature of cloud-native software. OpenTelemetry focuses on improving the collection of observability data, specifically metrics and distributed traces, for some of the emerging and increasingly adopted cloud frameworks.

This initiative is broadly supported by the open-source community and by leading contributors including Dynatrace, Google, and Microsoft. Dynatrace actively contributes and shares its expertise in auto-instrumentation, interoperability, and enterprise-grade solutions. Once OpenTelemetry is more widely adopted as a standard, it will serve as an additional data source that further extends the breadth of Dynatrace’s technology coverage.

The Dynatrace platform will help enterprises leverage OpenTelemetry by providing the highest possible scalability through automation, full-stack topology mapping, and most importantly, causation-based analytics through our AI engine, Davis, to deliver answers, not just more data.
