One key feature of Dynatrace is its ability to continuously monitor every aspect of your applications, services, and infrastructure and to automatically learn all the baseline metrics related to the performance of these components. Dynatrace automatically learns the baseline response times of your applications and services, factoring in variables such as geo-location, browser type, operating system, connection bandwidth, and user actions.
Such multidimensional reference values (for example, all users from New York viewing your application with a Chrome browser on Windows) are collected for all statistically meaningful combinations of these factors. This intelligent, automatic baselining allows Dynatrace to detect anomalies at a highly granular level and to notify you of problems in real time. Typical application and service-level anomalies reported by Dynatrace include failure rate increases, response time degradations, and spikes or drops in application traffic.
On top of this automatic learning of reference values, Dynatrace allows you to define thresholds that specify how far performance must deviate above the baseline before a problem alert is generated. Keep in mind that these threshold settings only adjust the levels at which Dynatrace alerts you to detected anomalies. These settings don’t affect automatic performance baselining.
There are some use cases for which parameterization of automatic baselining algorithms may be beneficial:
- Set higher thresholds for applications and services that are still in development or are in the testing stage.
- Set lower thresholds for mission-critical services within your infrastructure (where default thresholds may be too tolerant).
Dynatrace distinguishes between an absolute threshold and a relative threshold for the median and the slowest 10% of each given metric. As you can see in the example below, the median thresholds for response time degradation are set to 100 ms (absolute) and 50% (relative) above the auto-learned baseline. The thresholds for the slowest 10% of requests are set to 1,000 ms (absolute) and 10% (relative) above the auto-learned baseline.
Dynatrace anomaly detection threshold settings also allow you to specify how many actions per minute must be observed before Dynatrace sends out problem alerts related to anomalies. This setting allows you to disable alerting for low-traffic applications and services, where baselining and alerting often lead to unnecessary alerts.
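The interplay of absolute thresholds, relative thresholds, and the minimum-traffic setting can be sketched in a few lines of Python. This is an illustrative model only, not Dynatrace's actual algorithm: the names `DegradationThreshold`, `exceeds`, and `should_alert` are hypothetical, the minimum of 10 actions per minute is an arbitrary example value, and the sketch assumes that a degradation counts only when both the absolute and the relative margin are exceeded, and that a violation of either the median or the slowest-10% threshold is enough to alert.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DegradationThreshold:
    absolute_ms: float   # allowed slowdown above baseline, in milliseconds
    relative_pct: float  # allowed slowdown above baseline, in percent


def exceeds(observed_ms: float, baseline_ms: float, t: DegradationThreshold) -> bool:
    """Return True when the observed value is beyond both margins.

    Assumption: both the absolute and the relative margin must be exceeded.
    """
    absolute_limit = baseline_ms + t.absolute_ms
    relative_limit = baseline_ms * (1 + t.relative_pct / 100)
    return observed_ms > absolute_limit and observed_ms > relative_limit


# Threshold values taken from the example in the text.
MEDIAN = DegradationThreshold(absolute_ms=100, relative_pct=50)
SLOWEST_10 = DegradationThreshold(absolute_ms=1000, relative_pct=10)


def should_alert(median_ms: float, median_baseline_ms: float,
                 slowest_ms: float, slowest_baseline_ms: float,
                 actions_per_min: float,
                 min_actions_per_min: float = 10) -> bool:
    # Low-traffic gate: below the configured minimum, never alert.
    if actions_per_min < min_actions_per_min:
        return False
    # Assumption: a violation of either threshold is enough to alert.
    return (exceeds(median_ms, median_baseline_ms, MEDIAN)
            or exceeds(slowest_ms, slowest_baseline_ms, SLOWEST_10))
```

Under these assumptions, a median of 180 ms against a 100 ms baseline exceeds the relative margin (150 ms) but not the absolute one (200 ms), so `exceeds(180, 100, MEDIAN)` is false, while the same degradation on an application with only 3 actions per minute would be suppressed regardless of its size.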
Dynatrace offers anomaly detection thresholds for three types of anomalies: action duration degradation, traffic spikes/drops, and failure rate increases, as shown below:
As an alternative to defining thresholds globally across your entire environment, you can disable global settings and instead fine-tune threshold settings for individual applications and services using the application- and service-specific settings pages (see the Application setup settings below). To do this, set the Use global anomaly detection settings switch to the Off position and configure your custom settings. You can reverse this action at any time to return to the globally defined thresholds.
Custom service thresholds work the same way, except that Dynatrace doesn’t alert you to traffic spikes/drops for services.
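The relationship between global and entity-specific settings can be modeled as a simple lookup. The sketch below is a conceptual illustration, not a Dynatrace API: `AnomalySettings` and `effective_settings` are hypothetical names, and the three boolean fields merely stand in for the three anomaly types described above.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class AnomalySettings:
    # Hypothetical stand-ins for the three anomaly types.
    duration_degradation: bool = True
    failure_rate_increase: bool = True
    traffic_spikes_drops: bool = True


def effective_settings(global_settings: AnomalySettings,
                       custom: Optional[AnomalySettings],
                       use_global: bool,
                       entity_kind: str) -> AnomalySettings:
    """Pick the settings that apply to one application or service."""
    # With the switch on (or no custom settings defined), global settings apply.
    chosen = global_settings if use_global or custom is None else custom
    # Services never receive traffic spike/drop alerts.
    if entity_kind == "service":
        chosen = replace(chosen, traffic_spikes_drops=False)
    return chosen
```

Switching `use_global` back to `True` returns the entity to the global thresholds without discarding the custom values, which mirrors the reversible switch described above.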