Metric events for alerting

Dynatrace Davis automatically analyzes abnormal situations within your IT infrastructure and attempts to identify all relevant impacts and root causes. Davis relies on a wide spectrum of information sources, such as a transactional view of your services and applications, as well as all events that are raised on individual nodes within your Smartscape topology.

There are two main sources for single events in Dynatrace: events that are triggered by a series of measurements (known as metric-based events) and events that are independent of any metric (for example, process crash, deployment changes, and VM motion events).

Custom metric events are configured at the global environment level. This means that an event, once defined and raised, is shown to all Dynatrace users within your environment.

Scope of the event

The essential aspect of a metric-based event is selecting the metric that should be monitored. Many of the metrics provided by Dynatrace comprise multiple dimensions. You can choose which dimensions and which of their values should be considered for the event. For example, for a custom alert based on the Action count metric, you can choose to consider only user actions coming from iOS devices.

You can further fine-tune your event by selecting the monitored entities to which it applies. By default, the event applies to all entities that supply the metric, some of which might not be relevant to the event. You can exclude those with rule-based filters on host group, management zone, name, and tag, depending on how you organize your entities. For example, for host-based metrics you can include only hosts that have a certain tag assigned.

The alerting scope preview displays up to 100 entities that deliver the selected metric and match all specified filters. You can still set a threshold on a larger group of entities, but the preview won't be available.

We don't recommend defining a threshold for a large, heterogeneous collection of entities, as this would result in a large number of alerts.

Topology awareness

The topological relationship between entities and events is vital information that the Dynatrace Davis AI causation engine uses to identify the root causes of detected problems. All built-in Dynatrace metrics are topology-aware by default. This means that each measurement has a logical link to a specific Smartscape entity (for example, a host, process, service, or application). For example, a metric that is related to the CPU usage of a host will automatically raise an event on the affected host if any violations are detected. Any events that result from the raising of the initial event will be raised on the entities where the resulting metric measurements are made. That is, process network metrics raise events on processes, while service response time metrics raise events on services.

The custom metric ingest channel allows for the ingestion of all types of metric measurements, regardless of the number of entities they relate to. In short, the following metric ingest possibilities exist:

  • No entity: Measurements not related to any entity (for example, revenue numbers measured for all retail shops per geographic region). If you define a metric event on a non-topological metric, the resulting event will be raised on the monitoring environment itself rather than on any specific Smartscape entity. Here are two example measurements that report revenue for two shops located in different cities:
business.revenue,shop=shop111,city=NewYork 234
business.revenue,shop=shop999,city=Atlanta 499
  • One entity: Measurements related to a single entity (for example, batch job executions measured on a monitored host). In this example, the measurement is related to one entity (i.e., the host). If you define a metric event on a measurement that is related to a single entity, the resulting event will be opened on that entity. Here are two example measurements that report the number of batch job runs for two hosts:
batchjob.executions,dt.entity.host=HOST-1111111,hostname=hostA,ip=53.43.23.12 23  
batchjob.executions,dt.entity.host=HOST-2222222,hostname=hostB,ip=53.43.23.12 23
  • Multiple entities: Measurements related to multiple entities (for example, the number of batch job runs measured for a process on a monitored host). In this example, the measurement is related to multiple entities, the process and the host.
    When multiple entities are specified for each measurement, Dynatrace selects the best entity on which it should raise the event. In the case of a host and a process, the measurement presumably relates to the process rather than the host, so the event is raised on the process.
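
The example measurements above all follow the same line layout: a metric key, comma-separated dimension pairs, a space, and the value. The following Python snippet is a minimal sketch of how such lines could be assembled. The helper name format_metric_line is hypothetical, and the sketch assumes simple keys and values that need no escaping.

```python
def format_metric_line(key, dimensions, value):
    """Build one line in the form: <key>,<dim>=<val>,... <value>.

    Hypothetical helper; assumes names and values contain no spaces,
    commas, or other characters that would need escaping.
    """
    parts = [key] + [f"{name}={val}" for name, val in dimensions.items()]
    return ",".join(parts) + f" {value}"


# No entity: plain business dimensions only.
no_entity = format_metric_line(
    "business.revenue", {"shop": "shop111", "city": "NewYork"}, 234
)

# One entity: the dt.entity.host dimension links the measurement to a host.
one_entity = format_metric_line(
    "batchjob.executions",
    {"dt.entity.host": "HOST-1111111", "hostname": "hostA"},
    23,
)

# Several measurements are sent as one line each.
payload = "\n".join([no_entity, one_entity])
print(payload)
```

Keeping entity dimensions such as dt.entity.host in the same dictionary as the other dimensions makes it easy to see which measurements are topology-aware and which are not.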

Within the advanced configuration section (go to Settings > Anomaly detection > Custom events for alerting and select Edit), you can override the default and choose the entity on which an event is raised. Here are two examples for two processes on two hosts reporting their batch job runs:

batchjob.executions,dt.entity.host=HOST-1,dt.entity.process=PROCESS-GROUP-INSTANCE-1,hostname=hostA,ip=53.43.23.12 23
batchjob.executions,dt.entity.host=HOST-2222222,dt.entity.process=PROCESS-GROUP-INSTANCE-2,hostname=hostB,ip=53.43.23.12 23
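
To illustrate the default selection and the override, here is a small Python sketch. It is not how Dynatrace implements entity selection: the preference order, the function names, and the override parameter are assumptions made only to mirror the behavior described above (process preferred over host, with an optional explicit choice).

```python
# Assumed preference order, most specific entity type first (illustrative only).
ENTITY_PREFERENCE = ["dt.entity.process", "dt.entity.host"]


def parse_dimensions(line):
    """Split '<key>,<dim>=<val>,... <value>' into a dict of dimensions."""
    head, _value = line.rsplit(" ", 1)
    _key, *pairs = head.split(",")
    return dict(pair.split("=", 1) for pair in pairs)


def pick_event_entity(line, override=None):
    """Return the entity ID the event would be raised on, or None."""
    dims = parse_dimensions(line)
    if override is not None:
        return dims.get(override)  # explicit choice replaces the default
    for dim in ENTITY_PREFERENCE:
        if dim in dims:
            return dims[dim]
    return None  # no entity dimension: event belongs to the environment


line = (
    "batchjob.executions,dt.entity.host=HOST-2222222,"
    "dt.entity.process=PROCESS-GROUP-INSTANCE-2,hostname=hostB,ip=53.43.23.12 23"
)
default_entity = pick_event_entity(line)
overridden_entity = pick_event_entity(line, override="dt.entity.host")
```

With the default, the event lands on the process; with the override set to the host dimension, the same measurement raises the event on the host instead.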

Event severity

The severity of an event determines whether a problem is raised and whether Davis AI tries to find the root cause of the event. The following table summarizes the semantics of all available event severities, which severities trigger a problem, and which severities are analyzed by Davis:

Severity     | Problem raised? | Davis analyzed? | Semantic
Availability | Yes             | Yes             | Reports any kind of severe component outage
Error        | Yes             | Yes             | Reports any kind of degradation of operational health due to errors
Slowdown     | Yes             | Yes             | Reports a slowdown of an IT component
Resource     | Yes             | Yes             | Reports a lack of resources or a resource-conflict situation
Info         | No              | Yes             | Reports any kind of interesting situation on a component, such as a deployment change
Custom alert | Yes             | No              | Triggers an alert without causation analysis or Davis AI involvement

For more information about built-in events and their severity levels, see Event types.
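
If you manage event definitions in scripts, the severity table can be kept as a small lookup. The following Python sketch only restates the table above; the type and function names are hypothetical and the severity labels are the display names from the table, not API identifiers.

```python
from typing import NamedTuple


class SeverityBehavior(NamedTuple):
    raises_problem: bool
    davis_analyzed: bool


# Values copied from the severity table; keys are display names.
SEVERITIES = {
    "Availability": SeverityBehavior(True, True),
    "Error": SeverityBehavior(True, True),
    "Slowdown": SeverityBehavior(True, True),
    "Resource": SeverityBehavior(True, True),
    "Info": SeverityBehavior(False, True),
    "Custom alert": SeverityBehavior(True, False),
}


def severities_where(raises_problem=None, davis_analyzed=None):
    """List severities matching the given flags (None means 'any')."""
    return [
        name
        for name, behavior in SEVERITIES.items()
        if (raises_problem is None or behavior.raises_problem == raises_problem)
        and (davis_analyzed is None or behavior.davis_analyzed == davis_analyzed)
    ]


print(severities_where(raises_problem=False))
print(severities_where(davis_analyzed=False))
```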

Create a metric event

To create a metric event configuration

Go to Settings > Anomaly detection > Custom events for alerting and click Create custom event for alerting.

Configure the scope of your event.

  1. Choose the metric for your metric event. You can select a metric category or start typing the metric name to search across all categories.
  2. Select the aggregation of the metric that will be compared for event evaluation.
  3. Optional: Select the dimensions to be considered by the event.
  4. Optional: Set rule-based entity filters.

Define the monitoring strategy. You have two options: an auto-adaptive baseline or a static threshold.

For either option, you must define the sliding window for comparison. It defines how often the threshold (automatically calculated or manually specified) must be violated within a sliding window of time to raise an event; violations don't have to be successive. This helps you avoid overly aggressive alerting on single violations. The sliding window can be up to 60 minutes long. Additionally, you must specify:

  • Auto-adaptive baseline: how much signal fluctuation must be added to the baseline.
  • Static threshold: the threshold value. Dynatrace suggests a threshold based on previous data.
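
The sliding-window rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Dynatrace evaluation logic: it assumes one sample per minute, treats a value above the threshold as a violation, and models the adaptive threshold simply as the baseline plus a multiple of the signal fluctuation. All function and parameter names are hypothetical.

```python
def adaptive_threshold(baseline, fluctuation, multiplier):
    """Assumed model: threshold = baseline + multiplier * fluctuation."""
    return baseline + multiplier * fluctuation


def first_alert_index(samples, threshold, violating_samples, window_size):
    """Return the index of the sample at which an event would be raised.

    An event is raised once at least `violating_samples` values inside the
    last `window_size` samples exceed the threshold. Violations do not have
    to be successive. Returns None if that never happens.
    """
    for end in range(len(samples)):
        window = samples[max(0, end - window_size + 1): end + 1]
        violations = sum(1 for value in window if value > threshold)
        if violations >= violating_samples:
            return end
    return None


cpu = [10, 95, 20, 97, 30, 99, 15]  # one hypothetical sample per minute

# 3 violations within any 5 samples: raised at the third spike.
wide = first_alert_index(cpu, threshold=90, violating_samples=3, window_size=5)

# 3 violations within any 3 samples: the spikes are too far apart.
narrow = first_alert_index(cpu, threshold=90, violating_samples=3, window_size=3)
```

Comparing the two calls shows why the window length matters: the same three non-successive spikes raise an event with a five-sample window but not with a three-sample window.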

There's a limit of 100 metric event configurations based on an adaptive baseline per environment.

There's no limit for static threshold configurations.

The preview shows the potential alerts for your input over 12-hour, 1-day, and 7-day timeframes. Here you can evaluate how effective your configuration is. The preview is not available if the scope of your event includes more than 100 entities.

Give your event a title. Make it a short, easily readable string that describes the situation, for example High network activity or CPU saturation. Event titles are used throughout the Dynatrace UI as well as within problem alerting.

Create an event message.

The event message helps you understand what the event is about. Provide a meaningful message so that the event can be grasped at first sight. You can use the following placeholders: {metricname}, {severity}, {alert_condition}, and {baseline} or {threshold}.
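
To preview how a message template might read once the placeholders are filled in, you can substitute sample values locally. The Python sketch below is only a local preview aid: Dynatrace performs the real substitution, and the sample values (including the wording used for {alert_condition}) are hypothetical.

```python
PLACEHOLDERS = ("metricname", "severity", "alert_condition", "baseline", "threshold")


def render_message(template, values):
    """Replace known placeholders that have a sample value; leave the rest."""
    for name in PLACEHOLDERS:
        if name in values:
            template = template.replace("{" + name + "}", str(values[name]))
    return template


template = "The {metricname} value was {alert_condition} the threshold of {threshold} ({severity})"

# Hypothetical sample values, for preview purposes only.
sample = {
    "metricname": "CPU usage",
    "severity": "Resource",
    "alert_condition": "above",
    "threshold": "90 %",
}

message = render_message(template, sample)
print(message)
```

Placeholders without a sample value, such as {baseline} in a static-threshold event, are left untouched so that gaps in the preview are easy to spot.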

Click Create custom event for alerting to save your new event.

Metric events API

All the functionality that was shown within the above sections is also fully available through the Anomaly detection API - Metric events.

As usual, the Configuration API offers methods for listing all the defined configurations, updating individual configurations, creating new configurations, and deleting existing configurations.
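
The following Python sketch shows how requests for listing and deleting configurations could be prepared with the standard library. Treat it as a starting point under assumptions: the endpoint path is assumed from the Configuration API naming and should be verified against the API reference, and the environment URL, token, and configuration ID are placeholders. The requests are only built here, not sent.

```python
from urllib.request import Request

# Assumed endpoint path; verify against the Anomaly detection API reference.
METRIC_EVENTS_PATH = "/api/config/v1/anomalyDetection/metricEvents"


def build_request(base_url, token, method="GET", config_id=None):
    """Prepare (but do not send) a request for metric event configurations."""
    url = base_url.rstrip("/") + METRIC_EVENTS_PATH
    if config_id is not None:
        url += "/" + config_id  # address one specific configuration
    headers = {"Authorization": f"Api-Token {token}"}
    return Request(url, headers=headers, method=method)


# Placeholder environment URL, token, and configuration ID.
BASE_URL = "https://example.live.dynatrace.com/"
list_request = build_request(BASE_URL, "<your-api-token>")
delete_request = build_request(
    BASE_URL, "<your-api-token>", method="DELETE", config_id="<config-id>"
)

print(list_request.get_method(), list_request.full_url)
print(delete_request.get_method(), delete_request.full_url)
```

To execute a prepared request, pass it to urllib.request.urlopen. Creating and updating configurations additionally require a JSON body whose schema is defined in the API reference and is not reproduced here.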