Metric events for alerting
Dynatrace Davis® automatically analyzes abnormal situations within your IT infrastructure and attempts to identify any relevant impact and root cause. Davis relies on a wide spectrum of information sources, such as a transactional view of your services and applications, as well as on all events raised on individual nodes within your Smartscape® topology.
There are two main sources for single events in Dynatrace:
- Metric-based events (events that are triggered by a series of measurements)
- Events that are independent of any metric (for example, process crashes, deployment changes, and VM motion events)
Custom metric events are configured in the global settings of your environment and are visible to all Dynatrace users in your environment.
Scope of the event
The essential aspect of a custom metric event is the correctly configured metric to be monitored. Many Dynatrace metrics are composed of multiple dimensions. You can choose which dimensions to consider for the event. For example, you can select only user actions coming from iOS devices for your custom alert, based on the Action count metric.
You can further fine-tune an event by selecting monitored entities to which it applies. By default, an event applies to all entities providing the respective metric. Using a rule-based filter, you can organize the entities by host group, management zone, name, and tag. For example, for host-based metrics you can include only those hosts that have a certain tag assigned. The actual set of available criteria depends on the metric.
Alerting scope preview can display up to 100 entities that deliver the selected metric and correspond to all specified filters.
If a threshold applies to more than 100 entities, the preview isn't available, and the configuration is likely to produce a considerable number of alerts.
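The scoping rules above can be sketched in a few lines of Python. This is only an illustration of how filters narrow the set of entities and how the 100-entity preview limit behaves; the `Entity` class, the filter names, and `preview_scope` are hypothetical and not part of any Dynatrace API.

```python
from dataclasses import dataclass, field

PREVIEW_LIMIT = 100  # the preview shows at most 100 entities, per the text above


@dataclass
class Entity:
    """Hypothetical stand-in for a monitored entity that delivers the metric."""
    name: str
    host_group: str = ""
    management_zone: str = ""
    tags: frozenset = field(default_factory=frozenset)


def matches(entity, host_group=None, management_zone=None, name_contains=None, tag=None):
    """Return True if the entity satisfies every filter that is set (AND logic)."""
    if host_group is not None and entity.host_group != host_group:
        return False
    if management_zone is not None and entity.management_zone != management_zone:
        return False
    if name_contains is not None and name_contains not in entity.name:
        return False
    if tag is not None and tag not in entity.tags:
        return False
    return True


def preview_scope(entities, **filters):
    """Return the matching entities, or None when the scope is too large to preview."""
    selected = [e for e in entities if matches(e, **filters)]
    if len(selected) > PREVIEW_LIMIT:
        return None
    return selected
```

With no filters set, every entity that provides the metric is in scope, which mirrors the default behavior described above.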
Topology awareness
Topology awareness and context are key themes of the Dynatrace observability platform. Data such as metrics, traces, events, and logs is not simply reported and stored within the platform; it also includes references to the topology where the data originated. For example, with process metrics, each measurement comes with a reference to the associated hosts and processes. The Davis AI uses this topological information to automatically perform root-cause detection and impact analysis for detected anomalies. The same applies to all custom metric events that are configured in a monitoring environment.
When an anomaly detection configuration raises an event, Dynatrace automatically identifies the most relevant entity to map the event to. If multiple entity references are detected, the most relevant one is automatically selected. For example, if a metric with a reference to both a host and a process leads to an event, the event is raised on the process.
Metric ingestion enables you to submit all types of metric measurements, regardless of the number of entities they relate to. The following scenarios exist:
Measurements aren't related to any entity
If you define a metric event on a non-topological metric, the resulting event will be raised on the monitoring environment itself, and not on a specific Smartscape entity.
Measurements are related to a single entity
If you define a metric event on a measurement that is related to a single entity, the resulting event will be raised on that entity.
Measurements are related to multiple entities
When multiple entities are specified for each measurement, Dynatrace selects the most appropriate entity on which it should raise the event. In the case of a host and a process, the measurement presumably relates to the process rather than the host, so the event is raised on the process.
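The three scenarios above can be summarized as a small selection function. This is a sketch under assumptions: the relevance ranking only encodes the host/process example from the text, the dimension keys are illustrative, and the fallback for other combinations is a placeholder. How Dynatrace actually ranks entities is not described here.

```python
# Illustrative ranking, most specific first. Only "process before host" comes
# from the text; the key names themselves are assumptions for this sketch.
RELEVANCE_ORDER = ["dt.entity.process_group_instance", "dt.entity.host"]
ENVIRONMENT = "environment"


def select_event_target(entity_refs):
    """Pick where an event would be raised.

    entity_refs maps an entity dimension key to an entity ID. Returns a
    (key, entity_id) tuple, or (ENVIRONMENT, None) when the measurement
    carries no entity reference at all.
    """
    if not entity_refs:
        # Non-topological metric: the event goes to the environment itself.
        return (ENVIRONMENT, None)
    for key in RELEVANCE_ORDER:
        if key in entity_refs:
            return (key, entity_refs[key])
    # Entity types outside the illustrative ranking: fall back to the first one.
    return next(iter(entity_refs.items()))
```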
Event severity
The severity of an event determines if a problem should be raised or not, and if Davis AI should determine the root cause of the given event.
Severity | Problem raised | Davis analysis | Semantic |
---|---|---|---|
Availability | Yes | Yes | Reports any kind of severe component outage. |
Error | Yes | Yes | Reports any kind of degradation of operational health due to errors. |
Slowdown | Yes | Yes | Reports a slowdown of an IT component. |
Resource | Yes | Yes | Reports a lack of resources or a resource-conflict situation. |
Info | No | Yes | Reports any kind of interesting situation on a component, such as a deployment change. |
Custom alert | Yes | No | Triggers an alert without causation analysis or Davis AI involvement. |
For more information about built-in events and their severity levels, see Event types.
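For scripting or reviewing configurations, the severity table can be expressed as a lookup. The following sketch simply restates the table; the `SeverityBehavior` type and the helper name are hypothetical.

```python
from typing import NamedTuple


class SeverityBehavior(NamedTuple):
    raises_problem: bool
    davis_analysis: bool


# Values copied from the severity table above.
SEVERITY_BEHAVIOR = {
    "Availability": SeverityBehavior(raises_problem=True, davis_analysis=True),
    "Error": SeverityBehavior(raises_problem=True, davis_analysis=True),
    "Slowdown": SeverityBehavior(raises_problem=True, davis_analysis=True),
    "Resource": SeverityBehavior(raises_problem=True, davis_analysis=True),
    "Info": SeverityBehavior(raises_problem=False, davis_analysis=True),
    "Custom alert": SeverityBehavior(raises_problem=True, davis_analysis=False),
}


def severities_that_raise_problems():
    """Return the severities that result in a problem, in table order."""
    return [name for name, b in SEVERITY_BEHAVIOR.items() if b.raises_problem]
```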
Event duration
In the configuration of a metric event, you specify how many one-minute slots must violate the threshold or baseline during a certain time period (the evaluation window). When this happens, Dynatrace raises an event.
The event remains open until the metric stays within the threshold or baseline for a certain number of one-minute slots within the same evaluation window, at which point Dynatrace closes the event. By default, the number of such de-alerting slots equals the size of the evaluation window. For example, if the size of the evaluation window is set to 5, the metric has to stay within the threshold or baseline for 5 consecutive one-minute time slots to close the event. You can modify the number of de-alerting slots via the Metric events API.
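The opening and closing rules can be simulated with a sliding window over one-minute slots. This is a simplified model of the behavior described above, not Dynatrace's implementation; the function and parameter names (`window`, `violating_samples`, `dealerting_samples`) are assumptions chosen for this sketch.

```python
from collections import deque


def track_event(violations, window=5, violating_samples=3, dealerting_samples=None):
    """Return the event state (True = open) after each one-minute slot.

    violations is a sequence of booleans, one per slot, where True means the
    slot violated the threshold or baseline.
    """
    if dealerting_samples is None:
        # Default from the text: de-alerting slots equal the window size.
        dealerting_samples = window
    recent = deque(maxlen=window)  # the evaluation window
    is_open = False
    states = []
    for violated in violations:
        recent.append(bool(violated))
        bad = sum(recent)
        good = len(recent) - bad
        if not is_open and bad >= violating_samples:
            is_open = True   # enough violating slots: raise the event
        elif is_open and good >= dealerting_samples:
            is_open = False  # enough healthy slots: close the event
        states.append(is_open)
    return states
```

For example, with a window of 5 and 3 violating slots required, `track_event([1, 1, 1, 0, 0, 0, 0, 0])` opens the event at the third slot and closes it at the eighth, once all 5 slots in the window are healthy.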
Create a metric event
To create a metric event configuration using the Build tab
- In the Dynatrace menu, go to Settings > Anomaly Detection > Custom events for alerting and select Create custom event for alerting.
- Switch to the Build tab.
- Select the metric for your metric event. You can select the metric by the category it belongs to or by the exact metric name.
- Select a type of aggregation for the metric (where applicable).
- optional Select the dimensions to be considered by the event.
- optional Add rule-based entity filters.
- Define the monitoring strategy.
  - Choose the strategy:
    - Auto-adaptive baseline: Dynatrace calculates the threshold automatically and adapts it dynamically to your metric's behavior.
    - Static threshold: a threshold that doesn't change over time.
  - Specify a sliding window for comparison. The sliding window defines how often the threshold (whether automatically calculated or manually specified) must be violated within a sliding window of time to raise an event (violations don't have to be successive). It helps you to avoid overly aggressive alerting on single violations. You can set a sliding window of up to 60 minutes.
  - Depending on the selected strategy, specify:
    - Auto-adaptive baseline: how many times the signal fluctuation is added to the baseline.
    - Static threshold: the threshold value. Dynatrace suggests a value based on the previous data.
  - Choose the missing data alert behavior. If the missing data alert is enabled, it is combined with the baseline/threshold condition using OR logic.
- Select the timeframe of the preview. You can preview the alerts for 12 hours, one day, or seven days, and evaluate how effective your configuration is.
- Select a title for your event. The title should be a short, easy-to-read string describing the situation, such as High network activity or CPU saturation.
- In the Event description section, create a meaningful event message. Event messages help you understand the nature of the event. You can use the following placeholders:
  - {alert_condition}: the condition of the alert (above/below the threshold).
  - {baseline}: the violated value of the baseline.
  - {dims}: a list of all dimensions (and their values) of the metric that violated the threshold. You can also specify a particular dimension: {dims:dt.entity.<entity>}. To fetch the list of available dimensions for your metric, query it via GET metric descriptor.
  - {entityname}: the name of the affected entity.
  - {metricname}: the name of the metric that violated the threshold.
  - {missing_data_samples}: the number of samples with missing data. Only available if missing data alert is enabled.
  - {severity}: the severity of the event.
  - {threshold}: the violated value of the threshold.
- Select Create custom event for alerting to save your new event.
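The monitoring strategy settings in the steps above can be read as a simple condition. The sketch below is one possible interpretation under assumptions: it only covers an "above" condition, it treats the baseline and the signal fluctuation as given inputs (how Dynatrace computes them is not covered here), and all function and parameter names are hypothetical.

```python
def adaptive_threshold(baseline, fluctuation, multiplier):
    """Auto-adaptive strategy: add the signal fluctuation to the baseline
    `multiplier` times, as described in the configuration step."""
    return baseline + multiplier * fluctuation


def window_raises_event(values, threshold, violating_samples,
                        missing_data_alert=False, missing_data_samples=3):
    """Evaluate one sliding window of one-minute values.

    A value of None stands for a slot without data. Violations don't have to
    be successive, so only their count within the window matters.
    """
    violations = sum(1 for v in values if v is not None and v > threshold)
    missing = sum(1 for v in values if v is None)
    threshold_condition = violations >= violating_samples
    missing_condition = missing_data_alert and missing >= missing_data_samples
    # The missing data alert is combined with the threshold condition by OR.
    return threshold_condition or missing_condition
```

A static threshold uses the same window check with a fixed `threshold` value instead of the result of `adaptive_threshold`.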
To create a metric event configuration using the Code tab
- In the Dynatrace menu, go to Settings > Anomaly Detection > Custom events for alerting and select Create custom event for alerting.
- Switch to the Code tab.
- Type in the required metric selector. For reference, see Metric selector transformation and Metric expressions.
- optional Under Advanced entity settings, define the entity type to which the raised events should be mapped.
- Define the monitoring strategy.
  - Choose the strategy:
    - Auto-adaptive baseline: Dynatrace calculates the threshold automatically and adapts it dynamically to your metric's behavior.
    - Static threshold: a threshold that doesn't change over time.
  - Specify a sliding window for comparison. The sliding window defines how often the threshold (whether automatically calculated or manually specified) must be violated within a sliding window of time to raise an event (violations don't have to be successive). This helps you to avoid overly aggressive alerting on single violations. You can set a sliding window of up to 60 minutes.
  - Depending on the selected strategy, specify:
    - Auto-adaptive baseline: how many times the signal fluctuation is added to the baseline.
    - Static threshold: the threshold value. Dynatrace suggests a value based on the previous data.
  - Choose the missing data alert behavior. If the missing data alert is enabled, it is combined with the baseline/threshold condition using OR logic.
- Select the timeframe of the preview. You can preview the alerts for 12 hours, one day, or seven days, and evaluate how effective your configuration is.
- Select a title for your event. The title should be a short, easy-to-read string describing the situation, such as High network activity or CPU saturation.
- In the Event description section, create a meaningful event message. Event messages help you understand the nature of the event. You can use the following placeholders:
  - {alert_condition}: the condition of the alert (above/below the threshold).
  - {baseline}: the violated value of the baseline.
  - {dims}: a list of all dimensions (and their values) of the metric that violated the threshold. You can also specify a particular dimension: {dims:dt.entity.<entity>}. To fetch the list of available dimensions for your metric, query it via GET metric descriptor.
  - {entityname}: the name of the affected entity.
  - {metricname}: the name of the metric that violated the threshold.
  - {missing_data_samples}: the number of samples with missing data. Only available if missing data alert is enabled.
  - {severity}: the severity of the event.
  - {threshold}: the violated value of the threshold.
- Select Create custom event for alerting to save your new event.
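To check how an event description will read before saving it, the placeholder substitution can be imitated locally. Dynatrace performs the real substitution itself; the sketch below only mimics it for drafting purposes, and the `render_description` helper and the sample values are hypothetical.

```python
import re

# Matches placeholders such as {threshold} or {dims:dt.entity.host}.
PLACEHOLDER = re.compile(r"\{([a-z_]+(?::[\w.]+)?)\}")


def render_description(template, values):
    """Replace known placeholders with sample values; leave unknown ones as-is."""
    def substitute(match):
        key = match.group(1)
        if key in values:
            return str(values[key])
        return match.group(0)
    return PLACEHOLDER.sub(substitute, template)


template = "The {metricname} value was {alert_condition} the threshold of {threshold} on {entityname}."
sample = {
    "metricname": "CPU usage",
    "alert_condition": "above",
    "threshold": 90,
    "entityname": "host-1",
}
message = render_description(template, sample)
```

Here `message` reads "The CPU usage value was above the threshold of 90 on host-1."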
Metric events API
The same metric events functionality is available through the Anomaly detection—metric events API. Using the API, you can list, update, create, and delete configurations.
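A request to list the existing configurations might be prepared as in the following sketch. It rests on assumptions that should be verified against the API documentation for your environment: the endpoint path, the `Api-Token` authorization scheme, and the example environment URL are not taken from this page. The code only builds the request and does not send it.

```python
import urllib.request

# Assumption: path of the metric events endpoint in the configuration API.
METRIC_EVENTS_PATH = "/api/config/v1/anomalyDetection/metricEvents"


def build_list_request(environment_url, api_token):
    """Build (but do not send) a GET request that lists metric event configurations."""
    url = environment_url.rstrip("/") + METRIC_EVENTS_PATH
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Api-Token {api_token}",  # assumed auth scheme
            "Accept": "application/json",
        },
        method="GET",
    )


# Placeholder URL and token; replace with your own values before sending.
request = build_list_request("https://example.invalid/", "YOUR_TOKEN")
```

Sending the request with `urllib.request.urlopen(request)` would return the response body, which you would then parse as JSON.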