Metric events for alerting

Dynatrace Davis automatically analyzes abnormal situations within your IT infrastructure and attempts to identify any relevant impact and root cause. Davis relies on a wide spectrum of information sources, such as a transactional view of your services and applications, as well as on all events raised on individual nodes within your Smartscape topology.

There are two main sources for single events in Dynatrace:

  • Metric-based events (events that are triggered by a series of measurements)
  • Events that are independent of any metric (for example, process crashes, deployment changes, and VM motion events)

Custom metric events are configured in the global settings of your environment and are visible to all Dynatrace users in your environment.

Scope of the event

Customize the metric

Many of the available metrics provided by Dynatrace are composed of multiple dimensions. You can select what dimensions and values should be considered for the event. For example, you can select only user actions coming from iOS devices for your custom alert, based on the Action count metric.

Select monitored entities

You can further fine-tune an event by selecting the monitored entities to which it applies. By default, an event applies to all entities providing the respective metric. Using a rule-based filter, you can narrow the entities by host group, management zone, name, and tag. For example, for host-based metrics you can include only those hosts that have a certain tag assigned.

Metric event scope

Alerting scope preview can display up to 100 entities that deliver the selected metric and correspond to all specified filters.

Note: If you set a threshold on more than 100 entities, the preview isn't available, and the configuration is likely to result in a considerable number of alerts.
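
The interaction between rule-based filters and the 100-entity preview limit can be sketched as follows. This is a minimal illustration, assuming a hypothetical in-memory Entity record; the field names and the preview_scope helper are invented for this example and are not part of any Dynatrace API.

```python
from dataclasses import dataclass, field

PREVIEW_LIMIT = 100  # preview size stated in the text above


@dataclass
class Entity:
    # Hypothetical record; the field names are illustrative only.
    name: str
    host_group: str = ""
    management_zone: str = ""
    tags: frozenset = field(default_factory=frozenset)


def matches(entity, host_group=None, management_zone=None, name_contains=None, tag=None):
    """Return True when the entity satisfies every filter that was specified."""
    if host_group is not None and entity.host_group != host_group:
        return False
    if management_zone is not None and entity.management_zone != management_zone:
        return False
    if name_contains is not None and name_contains not in entity.name:
        return False
    if tag is not None and tag not in entity.tags:
        return False
    return True


def preview_scope(entities, **filters):
    """Return the matching entities, or None when the scope exceeds the preview limit."""
    in_scope = [e for e in entities if matches(e, **filters)]
    if len(in_scope) > PREVIEW_LIMIT:
        return None  # too many entities: no preview, and many alerts are likely
    return in_scope


hosts = [
    Entity("web-01", host_group="frontend", tags=frozenset({"prod"})),
    Entity("web-02", host_group="frontend", tags=frozenset({"staging"})),
    Entity("db-01", host_group="backend", tags=frozenset({"prod"})),
]
prod_frontend = preview_scope(hosts, host_group="frontend", tag="prod")
print([e.name for e in prod_frontend])
```

All specified filters must match for an entity to be in scope, which mirrors the statement that previewed entities "correspond to all specified filters".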

Topology awareness

The topological relationship between entities and events is vital information that the Dynatrace Davis AI causation engines use to identify the root causes of detected problems. All built-in Dynatrace metrics are topology-aware by default. This means that each measurement has a logical link to a specific Smartscape entity (host, process, service, or application). For example, a metric that is related to the CPU usage of a host automatically raises an event on the affected host if any violations are detected. Process network metrics raise events on processes, while service response time metrics raise events on services.

The custom metric ingest channel allows for the ingestion of all types of metric measurements, regardless of the number of entities they relate to.

Metric ingest possibilities

  • Measurements aren't related to any entity

Example: revenue numbers measured for all retail shops per geographic region.

business.revenue,shop=shop111,city=NewYork 234
business.revenue,shop=shop999,city=Atlanta 499

Explanation: if you define a metric event on a non-topological metric, the resulting event will be raised on the monitoring environment itself, and not on a specific Smartscape entity.

  • Measurements are related to a single entity

Example: batch job executions measured on a monitored host, where the measurement is related to the host.

batchjob.executions,dt.entity.host=HOST-1111111,hostname=hostA,ip=53.43.23.12 23
batchjob.executions,dt.entity.host=HOST-2222222,hostname=hostB,ip=53.43.23.12 23

Explanation: if you define a metric event on a measurement that is related to a single entity, the resulting event will be opened on that entity.

  • Measurements are related to multiple entities

Example: the number of batch job runs measured for a process on a monitored host, where the measurement is related to both the process and the host.

batchjob.executions,dt.entity.host=HOST-1,dt.entity.process=PROCESS-GROUP-INSTANCE-1,hostname=hostA,ip=53.43.23.12 23
batchjob.executions,dt.entity.host=HOST-2222222,dt.entity.process=PROCESS-GROUP-INSTANCE-2,hostname=hostB,ip=53.43.23.12 23

Explanation: when multiple entities are specified for each measurement, Dynatrace selects the most appropriate entity on which it should raise the event. In the case of a host and a process, the measurement presumably relates to the process rather than the host, so the event is raised on the process.
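
The three cases above can be sketched in a few lines of Python. This is a simplified illustration, not Dynatrace's actual selection logic: the parser handles only the "key,dimension=value,... payload" shape shown in the examples (no quoted values or timestamps), the dimension names are taken from the sample lines, and the preference order covers only the host and process case described in the text.

```python
# Entity dimensions in the order this sketch prefers them. Process before host
# follows the example above; the list is an assumption, not the full rule set.
ENTITY_PREFERENCE = ("dt.entity.process", "dt.entity.host")


def parse_metric_line(line):
    """Split 'key,dim=value,... payload' into (key, dimensions, value)."""
    head, _, payload = line.strip().rpartition(" ")
    key, *dimension_parts = head.split(",")
    dimensions = dict(part.split("=", 1) for part in dimension_parts)
    return key, dimensions, float(payload)


def event_target(dimensions):
    """Return the entity ID the event would be raised on, or 'ENVIRONMENT'."""
    for name in ENTITY_PREFERENCE:
        if name in dimensions:
            return dimensions[name]
    return "ENVIRONMENT"  # non-topological metric: event is raised on the environment


lines = [
    "business.revenue,shop=shop111,city=NewYork 234",
    "batchjob.executions,dt.entity.host=HOST-1111111,hostname=hostA,ip=53.43.23.12 23",
    "batchjob.executions,dt.entity.host=HOST-2222222,dt.entity.process=PROCESS-GROUP-INSTANCE-2,hostname=hostB,ip=53.43.23.12 23",
]
for line in lines:
    key, dimensions, value = parse_metric_line(line)
    print(key, "->", event_target(dimensions))
```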

Configuration

To gain further flexibility on which entity an event is to be raised, you can override the default configuration. To edit the configuration, go to Settings > Anomaly detection > Custom events for alerting, select the event you want to modify, and then select Edit.

Event severity

The severity of an event determines whether a problem is raised and whether Davis AI tries to determine the root cause of the event.

Types of severities

The following types of severities are available.

Severity     | Problem raised | Davis analysis | Semantic
Availability | Yes            | Yes            | Reports any kind of severe component outage
Error        | Yes            | Yes            | Reports any kind of degradation of operational health due to errors
Slowdown     | Yes            | Yes            | Reports a slowdown of an IT component
Resource     | Yes            | Yes            | Reports a lack of resources or a resource-conflict situation
Info         | No             | Yes            | Reports any kind of interesting situation on a component, such as a deployment change
Custom alert | Yes            | No             | Triggers an alert without causation analysis or Davis AI involvement

For more information about built-in events and their severity levels, see Event types.
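
As a quick reference, the severity table can be expressed as a lookup. This is only a sketch of the table's content; the Handling type and describe helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Handling(NamedTuple):
    problem_raised: bool
    davis_analysis: bool


# Values copied from the severity table above.
SEVERITY_HANDLING = {
    "Availability": Handling(True, True),
    "Error": Handling(True, True),
    "Slowdown": Handling(True, True),
    "Resource": Handling(True, True),
    "Info": Handling(False, True),
    "Custom alert": Handling(True, False),
}


def describe(severity):
    """Summarize how an event of the given severity is handled."""
    handling = SEVERITY_HANDLING[severity]
    problem = "raises a problem" if handling.problem_raised else "raises no problem"
    analysis = "is analyzed by Davis" if handling.davis_analysis else "is not analyzed by Davis"
    return f"{severity}: {problem} and {analysis}"


print(describe("Info"))
print(describe("Custom alert"))
```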

Create a metric event

To create a metric event configuration

  1. Go to Settings > Anomaly detection > Custom events for alerting and select Create custom event for alerting.
  2. Select the metric for your metric event. You can select the metric by the category it belongs to or by the exact metric name.
  3. Select a type of aggregation for the metric (where applicable).
  4. Optional: Select the dimensions to be considered by the event.
  5. Optional: Add rule-based entity filters.
  6. Define the monitoring strategy. You have two options: an auto-adaptive baseline or static thresholds.

For either option, you must specify a sliding window for comparison. This defines how often the threshold (whether automatically calculated or manually specified) must be violated within a sliding window of time to raise an event (violations don't have to be successive). It helps you avoid overly aggressive alerting on single violations. You can set a sliding window of up to 60 minutes. Additionally, you must specify:

  • Auto-adaptive baseline: how many times the signal fluctuation is added to the baseline.
  • Static thresholds: the threshold. Dynatrace suggests a threshold based on previous data.

  7. Select the timeframe of the preview. You can preview alerts for 12 hours, one day, or seven days to evaluate how effective your configuration is.
  8. Select a title for your event. The title should be a short, easy-to-read string describing the situation, such as "High network activity" or "CPU saturation".
  9. Create a meaningful event message. Event messages help you understand the nature of the event. You can use the following placeholders: {metricname}, {severity}, {alert_condition}, and {baseline} or {threshold}.
  10. Select Create custom event for alerting to save your new event.
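
The sliding-window condition from the monitoring strategy step can be sketched as follows. This is a minimal illustration under stated assumptions: one sample per minute, a violation means the value is above the threshold, and the auto-adaptive threshold is modeled simply as the baseline plus a multiple of the signal fluctuation. The function names are hypothetical and the real evaluation inside Dynatrace may differ.

```python
from collections import deque


def adaptive_threshold(baseline, fluctuation, factor):
    """Model of an auto-adaptive threshold: the fluctuation added `factor` times."""
    return baseline + factor * fluctuation


def first_event_index(samples, threshold, window, required):
    """Return the index of the first sample at which an event would be raised.

    An event is raised once at least `required` of the last `window` samples
    violate the threshold; the violations do not have to be successive.
    Returns None when the condition is never met.
    """
    recent = deque(maxlen=window)  # violation flags for the current window
    for index, value in enumerate(samples):
        recent.append(value > threshold)
        if sum(recent) >= required:
            return index
    return None


cpu = [50, 95, 60, 96, 55, 97, 40]  # hypothetical one-minute CPU samples
print(first_event_index(cpu, threshold=90, window=5, required=3))  # 5
print(first_event_index(cpu, threshold=90, window=3, required=3))  # None
```

With a window of 5, the three non-successive spikes are enough to raise an event at the sixth sample; with a window of 3, no window ever contains three violations, so the isolated spikes are ignored.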

Metric events API

The same metric events functionality is available through the Anomaly detection—metric events API. Using the API, you can list, update, create, and delete configurations.
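
A sketch of how such API requests could be prepared with the Python standard library. The endpoint path and the Api-Token authorization scheme are assumptions here and should be confirmed against the API reference for your environment; the base URL, token, and configuration ID below are placeholders. The helper only builds the request object and does not send it.

```python
import json
import urllib.request

# Assumed path of the metric events configuration endpoint; verify before use.
METRIC_EVENTS_PATH = "/api/config/v1/anomalyDetection/metricEvents"


def build_request(base_url, api_token, method="GET", config_id=None, body=None):
    """Build (but do not send) a request against the metric events endpoint."""
    url = base_url.rstrip("/") + METRIC_EVENTS_PATH
    if config_id is not None:
        url += "/" + config_id
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Authorization", "Api-Token " + api_token)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


# Placeholder environment URL and token, for illustration only.
list_request = build_request("https://example.invalid/", "TOKEN")
delete_request = build_request("https://example.invalid", "TOKEN", method="DELETE", config_id="my-event-id")
print(list_request.get_method(), list_request.full_url)
print(delete_request.get_method(), delete_request.full_url)
```

To execute a request, pass it to urllib.request.urlopen and handle HTTP errors as appropriate for your tooling.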