Metric events for alerting

Dynatrace Davis® automatically analyzes abnormal situations within your IT infrastructure and attempts to identify any relevant impact and root cause. Davis relies on a wide spectrum of information sources, such as a transactional view of your services and applications, as well as on all events raised on individual nodes within your Smartscape® topology.

There are two main sources for single events in Dynatrace:

  • Metric-based events (events that are triggered by a series of measurements)
  • Events that are independent of any metric (for example, process crashes, deployment changes, and VM motion events)

Custom metric events are configured in the global settings of your environment and are visible to all Dynatrace users in your environment.

Scope of the event

The most important part of a custom metric event is a correctly configured metric to monitor. Many Dynatrace metrics are composed of multiple dimensions, and you can choose which dimensions the event considers. For example, based on the Action count metric, you can select only user actions coming from iOS devices for your custom alert.

You can further fine-tune an event by selecting the monitored entities to which it applies. By default, an event applies to all entities that provide the respective metric. Using a rule-based filter, you can narrow down the entities by host group, management zone, name, and tag. For example, for host-based metrics you can include only those hosts that have a certain tag assigned. The actual set of available criteria depends on the metric.

Metric event scope

The alerting scope preview can display up to 100 entities that deliver the selected metric and match all specified filters.

If you set a threshold on more than 100 entities, the preview isn't available, and the configuration is likely to result in a considerable number of alerts.

Topology awareness

Topology awareness and context are key themes of the Dynatrace observability platform. Data such as metrics, traces, events, and logs is not simply reported and stored within the platform; it includes references to the topology where the data originated. For example, with process metrics, each measurement comes with a reference to the associated hosts and processes. Davis AI uses this topological information to automatically perform root-cause detection and impact analysis for detected anomalies. The same applies to all custom metric events that are configured in a monitoring environment.

When an anomaly detection configuration raises an event, Dynatrace automatically identifies the most relevant entity to map the event to. If multiple entity references are detected, the most relevant one is automatically selected. For example, if a metric with a reference to both a host and a process leads to an event, the event is raised on the process.

Metric ingestion enables you to submit all types of metric measurements, regardless of the number of entities they relate to. The following scenarios exist:

Measurements aren't related to any entity

If you define a metric event on a non-topological metric, the resulting event will be raised on the monitoring environment itself, and not on a specific Smartscape entity.

Example: revenue numbers measured for all retail shops per geographic region
business.revenue,shop=shop111,city=NewYork 234
business.revenue,shop=shop999,city=Atlanta 499
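
As a rough illustration of how measurements like the ones above are put together, the following Python sketch formats a metric key, its dimensions, and a value into a single line. The function name `format_metric_line` is hypothetical, and the sketch assumes dimension values without spaces, commas, or other characters that would need escaping.

```python
def format_metric_line(key: str, dimensions: dict, value) -> str:
    """Build one measurement line: key, optional dimensions, then the value.

    Hypothetical helper for illustration only; it does not escape
    special characters in dimension names or values.
    """
    # Each dimension is appended as ",name=value" in insertion order.
    dims = "".join(f",{name}={val}" for name, val in dimensions.items())
    return f"{key}{dims} {value}"


line = format_metric_line(
    "business.revenue", {"shop": "shop111", "city": "NewYork"}, 234
)
print(line)
```

Because none of the dimensions references a monitored entity, an event defined on such a metric would be raised on the environment, as described above.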

Measurements are related to a single entity

If you define a metric event on a measurement that is related to a single entity, the resulting event will be raised on that entity.

Example: batch job executions measured on a monitored host, where the measurement is related to the host
batchjob.executions,dt.entity.host=HOST-1111111,hostname=hostA,ip=53.43.23.12 23
batchjob.executions,dt.entity.host=HOST-2222222,hostname=hostB,ip=53.43.23.12 23

Measurements are related to multiple entities

When multiple entities are specified for each measurement, Dynatrace selects the most appropriate entity on which it should raise the event. In the case of a host and a process, the measurement presumably relates to the process rather than the host, so the event is raised on the process.

Example: the number of batch job runs measured for a process on a monitored host, where the measurement is related to both the process and the host
batchjob.executions,dt.entity.host=HOST-1,dt.entity.process=PROCESS-GROUP-INSTANCE-1,hostname=hostA,ip=53.43.23.12 23
batchjob.executions,dt.entity.host=HOST-2222222,dt.entity.process=PROCESS-GROUP-INSTANCE-2,hostname=hostB,ip=53.43.23.12 23
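
The three scenarios can be summarized in a small Python sketch that reads the entity references from one measurement line and decides where an event would land. This is only a model of the behavior described above, not Dynatrace's actual selection logic: the function names and the `ENVIRONMENT` marker are hypothetical, the parser assumes values without spaces or commas, and only the host-and-process combination named in the text is handled.

```python
def parse_dimensions(line: str) -> dict:
    """Split 'key,dim=value,... number' into a dict of dimensions."""
    head, _value = line.rsplit(" ", 1)   # drop the trailing measurement value
    _key, *pairs = head.split(",")       # first item is the metric key
    return dict(pair.split("=", 1) for pair in pairs)


def event_target(line: str) -> str:
    """Return the entity an event would be raised on (illustrative model)."""
    entities = {
        name: value
        for name, value in parse_dimensions(line).items()
        if name.startswith("dt.entity.")
    }
    if not entities:
        return "ENVIRONMENT"             # no topology reference at all
    if len(entities) == 1:
        return next(iter(entities.values()))
    if "dt.entity.process" in entities:
        return entities["dt.entity.process"]  # process preferred over host
    # Other combinations are not described in the text, so no guess is made.
    raise ValueError("entity combination not covered by this sketch")


print(event_target(
    "batchjob.executions,dt.entity.host=HOST-1,"
    "dt.entity.process=PROCESS-GROUP-INSTANCE-1,hostname=hostA 23"
))
```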

Event severity

The severity of an event determines if a problem should be raised or not, and if Davis AI should determine the root cause of the given event.

Severity     | Problem raised | Davis analysis | Semantic
-------------|----------------|----------------|---------
Availability | Yes            | Yes            | Reports any kind of severe component outage.
Error        | Yes            | Yes            | Reports any kind of degradation of operational health due to errors.
Slowdown     | Yes            | Yes            | Reports a slowdown of an IT component.
Resource     | Yes            | Yes            | Reports a lack of resources or a resource-conflict situation.
Info         | No             | Yes            | Reports any kind of interesting situation on a component, such as a deployment change.
Custom alert | Yes            | No             | Triggers an alert without causation analysis or Davis AI involvement.

For more information about built-in events and their severity levels, see Event types.

Event duration

In the configuration of a metric event, you specify how many one-minute slots must violate the threshold or baseline during a certain time period (the evaluation window). When this happens, Dynatrace raises an event.

The event remains open until the metric stays within the threshold or baseline for a certain number of one-minute slots within the same evaluation window, at which point Dynatrace closes the event. By default, the number of such de-alerting slots equals the size of the evaluation window. For example, if the size of the evaluation window is set to 5, the metric has to stay within the threshold or baseline for 5 consecutive one-minute time slots to close the event. You can modify the number of de-alerting slots via the Metric events API.
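
The opening and closing rules can be made concrete with a small simulation. The Python sketch below is a simplified model under stated assumptions, not Dynatrace's implementation: it takes one boolean per one-minute slot (True means the threshold or baseline was violated), and the function and parameter names (`simulate_event`, `window`, `violating_slots`, `dealerting_slots`) are hypothetical.

```python
from collections import deque


def simulate_event(violations, window=5, violating_slots=3, dealerting_slots=None):
    """Return (minute, "open" | "close") transitions for a series of slots."""
    if dealerting_slots is None:
        dealerting_slots = window        # default described in the text
    recent = deque(maxlen=window)        # sliding evaluation window
    is_open = False
    transitions = []
    for minute, violated in enumerate(violations):
        recent.append(bool(violated))
        bad = sum(recent)                # violating slots in the window
        good = len(recent) - bad         # slots within threshold or baseline
        if not is_open and bad >= violating_slots:
            is_open = True
            transitions.append((minute, "open"))
        elif is_open and good >= dealerting_slots:
            is_open = False
            transitions.append((minute, "close"))
    return transitions


slots = [False, True, True, True, False, False, False, False, False, False]
print(simulate_event(slots))                      # default de-alerting slots
print(simulate_event(slots, dealerting_slots=3))  # closes earlier
```

With the default, all five slots in the window must be within the threshold before the event closes, which matches the "5 consecutive one-minute time slots" example. Lowering the number of de-alerting slots closes the event sooner.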

Create a metric event

To create a metric event configuration

  1. In the Dynatrace menu, go to Settings > Anomaly Detection > Custom events for alerting and select Create custom event for alerting.
  2. Switch to the Build tab.
  3. Select the metric for your metric event. You can select the metric by the category it belongs to or by the exact metric name.
  4. Select a type of aggregation for the metric (where applicable).
  5. (Optional) Select the dimensions to be considered by the event.
  6. (Optional) Add rule-based entity filters.
  7. Define the monitoring strategy.
    1. Choose the strategy:
      • Auto-adaptive baseline: Dynatrace calculates the threshold automatically and adapts it dynamically to your metric's behavior.
      • Static threshold: a fixed threshold that doesn't change over time.
    2. Specify a sliding window for comparison. The sliding window defines how often the threshold (whether automatically calculated or manually specified) must be violated within a sliding window of time to raise an event (violations don't have to be successive). It helps you to avoid overly aggressive alerting on single violations. You can set a sliding window of up to 60 minutes.
    3. Depending on the selected strategy, specify:
      • Auto-adaptive baseline—how many times the signal fluctuation is added to the baseline.
      • Static threshold—the threshold value. Dynatrace suggests a value based on the previous data.
    4. Choose the missing data alert behavior. If missing data alert is enabled, it is combined with the baseline/threshold condition by the OR logic.
  8. Select the timeframe of the preview. You can preview alerts for a timeframe of 12 hours, one day, or seven days to evaluate how effective your configuration is.
  9. Select a title for your event. The title should be a short, easy-to-read string describing the situation, such as High network activity or CPU saturation.
  10. In the Event description section, create a meaningful event message. Event messages help you understand the nature of the event. You can use the following placeholders:
    • {alert_condition}—the condition of the alert (above/below the threshold).
    • {baseline}—the violated value of the baseline.
    • {dims}—a list of all dimensions (and their values) of the metric that violated the threshold. You can also specify a particular dimension: {dims:dt.entity.<entity>}. To fetch the list of available dimensions for your metric, query it via GET metric descriptor.
    • {entityname}—the name of the affected entity.
    • {metricname}—the name of the metric that violated the threshold.
    • {missing_data_samples}—the number of samples with missing data. Only available if missing data alert is enabled.
    • {severity}—the severity of the event.
    • {threshold}—the violated value of the threshold.
  11. Select Create custom event for alerting to save your new event.
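
To make the monitoring strategy settings in the steps above more tangible, here is a minimal Python sketch of the two calculations they imply. It rests on a literal reading of the text: the auto-adaptive threshold is the baseline plus a multiple of the signal fluctuation, and the missing data alert is OR-ed with the threshold condition. How Dynatrace derives the baseline and the fluctuation is not covered here. All names, including `required_missing`, are hypothetical.

```python
def adaptive_threshold(baseline: float, fluctuation: float, multiplier: float) -> float:
    """Threshold as the baseline plus N times the signal fluctuation."""
    return baseline + multiplier * fluctuation


def should_raise(violating_slots: int, required_violations: int,
                 missing_slots: int = 0, required_missing=None) -> bool:
    """Combine the threshold condition and the missing data alert with OR.

    required_missing=None models a disabled missing data alert
    (an assumption made for this sketch).
    """
    threshold_condition = violating_slots >= required_violations
    missing_condition = (
        required_missing is not None and missing_slots >= required_missing
    )
    return threshold_condition or missing_condition


print(adaptive_threshold(baseline=100.0, fluctuation=10.0, multiplier=2.0))
print(should_raise(violating_slots=1, required_violations=3,
                   missing_slots=4, required_missing=3))
```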

To create a metric event configuration using a metric selector (Code tab)

  1. In the Dynatrace menu, go to Settings > Anomaly Detection > Custom events for alerting and select Create custom event for alerting.
  2. Switch to the Code tab.
  3. Type in the required metric selector. For reference, see Metric selector transformation and Metric expressions.
  4. (Optional) Under Advanced entity settings, define the entity type to which the raised events should be mapped.
  5. Define the monitoring strategy.
    1. Choose the strategy:
      • Auto-adaptive baseline: Dynatrace calculates the threshold automatically and adapts it dynamically to your metric's behavior.
      • Static threshold: a fixed threshold that doesn't change over time.
    2. Specify a sliding window for comparison. The sliding window defines how often the threshold (whether automatically calculated or manually specified) must be violated within a sliding window of time to raise an event (violations don't have to be successive). This helps you to avoid overly aggressive alerting on single violations. You can set a sliding window of up to 60 minutes.
    3. Depending on the selected strategy, specify:
      • Auto-adaptive baseline—how many times the signal fluctuation is added to the baseline.
      • Static threshold—the threshold value. Dynatrace suggests a value based on the previous data.
    4. Choose the missing data alert behavior. If missing data alert is enabled, it is combined with the baseline/threshold condition by the OR logic.
  6. Select the timeframe of the preview. You can preview alerts for a timeframe of 12 hours, one day, or seven days to evaluate how effective your configuration is.
  7. Select a title for your event. The title should be a short, easy-to-read string describing the situation, such as High network activity or CPU saturation.
  8. In the Event description section, create a meaningful event message. Event messages help you understand the nature of the event. You can use the following placeholders:
    • {alert_condition}—the condition of the alert (above/below the threshold).
    • {baseline}—the violated value of the baseline.
    • {dims}—a list of all dimensions (and their values) of the metric that violated the threshold. You can also specify a particular dimension: {dims:dt.entity.<entity>}. To fetch the list of available dimensions for your metric, query it via GET metric descriptor.
    • {entityname}—the name of the affected entity.
    • {metricname}—the name of the metric that violated the threshold.
    • {missing_data_samples}—the number of samples with missing data. Only available if missing data alert is enabled.
    • {severity}—the severity of the event.
    • {threshold}—the violated value of the threshold.
  9. Select Create custom event for alerting to save your new event.
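
The placeholders listed for the event description can be pictured as simple template substitution. The Python sketch below only illustrates that idea and is not how Dynatrace renders messages: the function name `render_description`, the sample values, and the `name=value` formatting chosen for `{dims}` are all assumptions.

```python
import re


def render_description(template: str, values: dict, dims: dict) -> str:
    """Replace {placeholder} tokens; unknown placeholders are left untouched."""
    def resolve(match):
        name = match.group(1)
        if name == "dims":
            # Assumed output format: comma-separated name=value pairs.
            return ", ".join(f"{k}={v}" for k, v in dims.items())
        if name.startswith("dims:"):
            # A single dimension, for example {dims:dt.entity.host}.
            return str(dims.get(name[len("dims:"):], match.group(0)))
        return str(values.get(name, match.group(0)))

    return re.sub(r"\{([^{}]+)\}", resolve, template)


message = render_description(
    "The {metricname} value was {alert_condition} the threshold of "
    "{threshold} on {dims:dt.entity.host}",
    values={"metricname": "batchjob.executions",
            "alert_condition": "above", "threshold": 20},
    dims={"dt.entity.host": "HOST-1111111", "hostname": "hostA"},
)
print(message)
```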

Metric events API

The same metric event functionality is available through the Anomaly detection API - Metric events. Using the API, you can list, create, update, and delete configurations.
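
As a hedged starting point, the following Python sketch prepares (but does not send) a request that lists metric event configurations. The endpoint path `/api/config/v1/anomalyDetection/metricEvents` and the `Api-Token` authorization scheme are not stated on this page; they are recalled from the Dynatrace configuration API and should be checked against the API reference for your environment. The environment URL and token are placeholders.

```python
import urllib.request


def build_list_request(environment_url: str, api_token: str) -> urllib.request.Request:
    """Prepare a GET request for the metric events configuration list.

    The path and header scheme are assumptions to verify against the
    Anomaly detection API - Metric events reference.
    """
    url = environment_url.rstrip("/") + "/api/config/v1/anomalyDetection/metricEvents"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Api-Token {api_token}",
            "Accept": "application/json",
        },
        method="GET",
    )


# Placeholder values; nothing is sent until urllib.request.urlopen(request) is called.
request = build_list_request("https://example.invalid/", "dt0c01.sample")
print(request.get_method(), request.full_url)
```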

Related topics
  • Anomaly detection API - Metric events

    Learn what the Dynatrace Anomaly detection API for metric events offers.

  • Metrics

    Learn about the various metrics that Dynatrace offers.