Metric events for alerting

Dynatrace Davis automatically analyzes abnormal situations within your IT infrastructure and attempts to identify any relevant impact and root cause. Davis relies on a wide spectrum of information sources, such as a transactional view of your services and applications, as well as on all events raised on individual nodes within your Smartscape topology.

There are two main sources for single events in Dynatrace:

  • Metric-based events (events that are triggered by a series of measurements)
  • Events that are independent of any metric (for example, process crashes, deployment changes, and VM motion events)

Custom metric events are configured in the global settings of your environment and are visible to all Dynatrace users in your environment.

Scope of the event

Customize the metric

Many of the metrics provided by Dynatrace are composed of multiple dimensions. You can select which dimensions and dimension values should be considered for the event. For example, based on the Action count metric, you can restrict your custom alert to user actions coming from iOS devices.

Select monitored entities

You can further fine-tune an event by selecting monitored entities to which it applies. By default, an event applies to all entities providing the respective metric. Using a rule-based filter, you can organize the entities by host group, management zone, name, and tag. For example, for host-based metrics you can include only those hosts that have a certain tag assigned. The actual set of available criteria depends on the metric.

Metric event scope

Alerting scope preview can display up to 100 entities that deliver the selected metric and correspond to all specified filters.

Note: If you set a threshold on more than 100 entities, the preview isn't available, and the configuration is likely to result in a considerable number of alerts.
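
The scoping rules above can be illustrated with a minimal Python sketch. It is not Dynatrace code: the `Entity` class, the `matches` and `preview_scope` functions, and the sample hosts are hypothetical names introduced only for this example. It assumes that all specified filters must match and that the preview is unavailable above 100 entities, as described above.

```python
from dataclasses import dataclass

PREVIEW_LIMIT = 100  # the preview displays at most 100 entities


@dataclass
class Entity:
    # Hypothetical, simplified stand-in for a monitored entity.
    name: str
    host_group: str = ""
    management_zone: str = ""
    tags: frozenset = frozenset()


def matches(entity, *, name_contains=None, host_group=None,
            management_zone=None, tag=None):
    """Return True if the entity satisfies every filter that is set."""
    if name_contains is not None and name_contains not in entity.name:
        return False
    if host_group is not None and entity.host_group != host_group:
        return False
    if management_zone is not None and entity.management_zone != management_zone:
        return False
    if tag is not None and tag not in entity.tags:
        return False
    return True


def preview_scope(entities, **filters):
    """Return the entities in scope, or None if the scope is too large to preview."""
    in_scope = [e for e in entities if matches(e, **filters)]
    if len(in_scope) > PREVIEW_LIMIT:
        return None
    return in_scope


hosts = [
    Entity("hostA", host_group="prod", tags=frozenset({"payments"})),
    Entity("hostB", host_group="prod"),
    Entity("hostC", host_group="dev", tags=frozenset({"payments"})),
]
scope = preview_scope(hosts, host_group="prod", tag="payments")
print([e.name for e in scope])  # ['hostA']
```

Only hostA is in scope here because it is the only host that matches both the host group and the tag filter.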

Topology awareness

Topology awareness and context is one of the key themes of the Dynatrace observability platform. Data such as metrics, traces, events, and logs is not simply reported and stored within the platform; it also includes references to the topology where the data originated. For example, with process metrics, each measurement comes with a reference to the associated hosts and processes. Davis AI uses this topological information to automatically perform root-cause detection and impact analysis for detected anomalies. The same applies to all custom metric events that are configured in a monitoring environment.

When an anomaly detection configuration raises an event, Dynatrace automatically identifies the most relevant entity to map the event to. If multiple entity references are detected, the most relevant one is automatically selected. For example, if a metric with a reference to both a host and a process leads to an event, the event is raised on the process.

Topology awareness for ingested metrics

Metric ingestion enables you to submit all types of metric measurements, regardless of the number of entities they relate to. The following scenarios exist:

If you define a metric event on a non-topological metric, the resulting event will be raised on the monitoring environment itself, and not on a specific Smartscape entity.

Example: revenue numbers measured for all retail shops per geographic region

business.revenue,shop=shop111,city=NewYork 234
business.revenue,shop=shop999,city=Atlanta 499

If you define a metric event on a measurement that is related to a single entity, the resulting event will be opened on that entity.

Example: batch job executions measured on a monitored host, where the measurement is related to the host.

batchjob.executions,,hostname=hostA,ip= 23  
batchjob.executions,,hostname=hostB,ip= 23

When multiple entities are specified for each measurement, Dynatrace selects the most appropriate entity on which it should raise the event. In the case of a host and a process, the measurement presumably relates to the process rather than the host, so the event is raised on the process.

Example: the number of batch job runs measured for a process on a monitored host, where the measurement is related to both the process and the host.

batchjob.executions,,dt.entity.process=PROCESS-GROUP-INSTANCE-1,hostname=hostA,ip= 23
batchjob.executions,,dt.entity.process=PROCESS-GROUP-INSTANCE-2,hostname=hostB,ip= 23
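
The three scenarios above can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the selection logic Dynatrace actually uses: `parse_line`, `event_target`, and `ENTITY_PRIORITY` are hypothetical names, the `dt.entity.host` dimension and the `HOST-1` identifier are assumed for the example, and the only ordering taken from the text is that a process is preferred over a host.

```python
def parse_line(line):
    """Parse 'metric.key,dim=value,... number' into (key, dimensions, value)."""
    head, _, raw_value = line.strip().rpartition(" ")
    key, *pairs = head.split(",")
    dimensions = {}
    for pair in pairs:
        name, sep, value = pair.partition("=")
        if sep and name:  # skip empty or malformed segments
            dimensions[name] = value
    return key, dimensions, float(raw_value)


# Assumed priority order: the most specific entity reference comes first.
ENTITY_PRIORITY = ("dt.entity.process", "dt.entity.host")


def event_target(dimensions):
    """Return the entity the event is raised on, or 'ENVIRONMENT' if none is referenced."""
    for dimension in ENTITY_PRIORITY:
        if dimension in dimensions:
            return dimensions[dimension]
    return "ENVIRONMENT"


lines = [
    # No entity reference: the event is raised on the environment.
    "business.revenue,shop=shop111,city=NewYork 234",
    # Single entity reference (assumed host dimension): raised on the host.
    "batchjob.executions,dt.entity.host=HOST-1,hostname=hostA 23",
    # Process and host references: raised on the process.
    "batchjob.executions,dt.entity.process=PROCESS-GROUP-INSTANCE-1,dt.entity.host=HOST-1 23",
]
for line in lines:
    key, dimensions, value = parse_line(line)
    print(key, "->", event_target(dimensions))
```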

Event severity

The severity of an event determines whether a problem is raised and whether Davis AI attempts to determine the root cause of the event.

Types of severities

The following types of severities are available.

Severity     | Problem raised | Davis analysis | Semantic
Availability | Yes            | Yes            | Reports any kind of severe component outage
Error        | Yes            | Yes            | Reports any kind of degradation of operational health due to errors
Slowdown     | Yes            | Yes            | Reports a slowdown of an IT component
Resource     | Yes            | Yes            | Reports a lack of resources or a resource-conflict situation
Info         | No             | Yes            | Reports any kind of interesting situation on a component, such as a deployment change
Custom alert | Yes            | No             | Triggers an alert without causation analysis or Davis AI involvement
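
For scripting purposes, the severity behavior listed above can be captured as a simple lookup. The following Python sketch only restates the table; `SeverityBehavior`, `SEVERITIES`, and `describe` are hypothetical names that are not part of any Dynatrace API.

```python
from typing import NamedTuple


class SeverityBehavior(NamedTuple):
    raises_problem: bool
    davis_analysis: bool


# Values copied from the severity table above.
SEVERITIES = {
    "Availability": SeverityBehavior(True, True),
    "Error": SeverityBehavior(True, True),
    "Slowdown": SeverityBehavior(True, True),
    "Resource": SeverityBehavior(True, True),
    "Info": SeverityBehavior(False, True),
    "Custom alert": SeverityBehavior(True, False),
}


def describe(severity):
    """Summarize what happens when an event of the given severity is raised."""
    behavior = SEVERITIES[severity]
    problem = "raises a problem" if behavior.raises_problem else "raises no problem"
    davis = "with Davis analysis" if behavior.davis_analysis else "without Davis analysis"
    return f"{severity}: {problem}, {davis}"


print(describe("Info"))          # Info: raises no problem, with Davis analysis
print(describe("Custom alert"))  # Custom alert: raises a problem, without Davis analysis
```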

For more information about built-in events and their severity levels, see Event types.

Create a metric event

To create a metric event configuration

  1. In the Dynatrace menu, go to Settings > Anomaly Detection > Custom events for alerting and select Create custom event for alerting.
  2. Switch to the Build tab.
  3. Select the metric for your metric event. You can select the metric by the category it belongs to or by the exact metric name.
  4. Select a type of aggregation for the metric (where applicable).
  5. Optional: Select the dimensions to be considered by the event.
  6. Optional: Add rule-based entity filters.
  7. Define the monitoring strategy.
    1. Choose the strategy:
      • Auto-adaptive baseline: Dynatrace calculates the threshold automatically and adapts it dynamically to your metric's behavior.
      • Static threshold: a fixed threshold that doesn't change over time.
    2. Specify a sliding window for comparison. The sliding window defines how often the threshold (whether automatically calculated or manually specified) must be violated within a sliding window of time to raise an event (violations don't have to be successive). It helps you to avoid overly aggressive alerting on single violations. You can set a sliding window of up to 60 minutes.
    3. Depending on the selected strategy, specify:
      • Auto-adaptive baseline: how many times the signal fluctuation is added to the baseline.
      • Static threshold: the threshold value. Dynatrace suggests a value based on previous data.
    4. Choose the missing data alert behavior. If the missing data alert is enabled, it is combined with the baseline/threshold condition using OR logic.
  8. Select the timeframe of the preview. You can preview the alerts raised by your configuration over a timeframe of 12 hours, one day, or seven days to evaluate how effective the configuration is.
  9. Select a title for your event. The title should be a short, easy-to-read string describing the situation, such as High network activity or CPU saturation.
  10. In the Event description section, create a meaningful event message. Event messages help you understand the nature of the event. You can use the following placeholders: {metricname}, {severity}, {alert_condition}, {missing_data_samples}, and {baseline} or {threshold}.
  11. Select Create custom event for alerting to save your new event.
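
The monitoring strategy defined in step 7 can be sketched in Python as follows. This is an illustration of the described behavior, not the Dynatrace implementation: `threshold_for` and `SlidingWindowEvaluator` are hypothetical names, and the sketch assumes one sample per minute, alerting when a sample is above the threshold, and a separate `missing_needed` count for the missing data alert.

```python
from collections import deque

MAX_WINDOW = 60  # sliding window of up to 60 minutes, assuming one sample per minute


def threshold_for(strategy, *, static_value=None, baseline=None,
                  fluctuation=None, factor=1):
    """Return the alerting threshold for the chosen strategy."""
    if strategy == "static":
        return static_value
    if strategy == "auto-adaptive":
        # The signal fluctuation is added to the baseline `factor` times.
        return baseline + factor * fluctuation
    raise ValueError(f"unknown strategy: {strategy}")


class SlidingWindowEvaluator:
    def __init__(self, threshold, window_size, violations_needed,
                 alert_on_missing=False, missing_needed=3):
        if not 1 <= window_size <= MAX_WINDOW:
            raise ValueError("window_size must be between 1 and 60")
        self.threshold = threshold
        self.violations_needed = violations_needed
        self.alert_on_missing = alert_on_missing
        self.missing_needed = missing_needed  # assumed parameter, see lead-in
        self.window = deque(maxlen=window_size)  # keeps only the latest samples

    def add(self, sample):
        """Add one sample (None means missing); return True if an event is raised."""
        self.window.append(sample)
        violations = sum(1 for s in self.window
                         if s is not None and s > self.threshold)
        missing = sum(1 for s in self.window if s is None)
        threshold_hit = violations >= self.violations_needed
        missing_hit = self.alert_on_missing and missing >= self.missing_needed
        # The missing data condition is combined with the threshold condition by OR.
        return threshold_hit or missing_hit


# Static threshold of 90: raise an event after 3 violations within any 5 samples.
evaluator = SlidingWindowEvaluator(threshold=90, window_size=5, violations_needed=3)
results = [evaluator.add(sample) for sample in [95, 80, 96, 85, 97]]
print(results)  # [False, False, False, False, True]

# Auto-adaptive: baseline 50 with the fluctuation 10 added twice.
print(threshold_for("auto-adaptive", baseline=50, fluctuation=10, factor=2))  # 70
```

In the first example the three violations are not successive, yet the event is still raised because all of them fall within the same sliding window.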

Metric events API

The same metric events functionality is available through the Anomaly detection—metric events API. Using the API, you can list, update, create, and delete configurations.