In the Incidents pane of the System Profile, click Create Incident Rule to define an incident rule's properties, conditions, and actions, or select an existing rule and click Edit to update its properties and actions. You can also link a dashboard to the incident rule for email reporting. The procedure for editing an existing rule is generally the same as for creating a new rule.
Configuring incident rules
Use the Create Incident Rule dialog box to define rule properties.
Configure general rule settings
- Name: Give the rule a unique name that describes its purpose.
- Description: Optionally, provide a description of the rule. The description can include the measures associated with it and the action that occurs if the rule is violated.
- Evaluation Timeframe: Select the duration used to evaluate if the defined conditions are met.
For example, if you select a timeframe of one minute and use a PurePath duration measure with an average aggregation as input for the incident rule, the average PurePath duration of the last minute is checked for violation every 10 seconds. Measures remain in memory for one hour.
The following graphic illustrates how timeframe is used to check for incidents.
Incidents close only when their condition hasn’t been met for a minute. This more effectively deals with measurements oscillating around their thresholds. Once this timespan has passed without further violation, incidents are closed with the closing time set to fit actual measurements.
- Incident Severity: Choose the severity of incidents to show in incident-related dashlets. Incident severities are Informational, Warning, or Severe. Select the level that requires a response to notifications.
- Incident Suppression: Set the incident delay in seconds. Incidents are suppressed during the configured number of seconds after an occurrence, to avoid sending redundant notifications.
- Store incidents in Performance Warehouse: Enabled by default, this setting guarantees a complete incident history for the available charting data time frame. Disable this option if an incident should trigger actions without a historical record. When the option is disabled, all configured actions are still performed, but closed incidents are deleted no later than the next AppMon Server restart.
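The Evaluation Timeframe behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not AppMon code: the function names, the tuple-based sample format, and the choice to report the last violating evaluation as the closing time are assumptions. The 10-second evaluation interval and the one-minute quiet period before closing come from the description above.

```python
EVALUATION_INTERVAL_S = 10  # from the text: the rule is checked every 10 seconds
CLOSE_AFTER_S = 60          # from the text: close after a minute without violation


def average_in_window(samples, now_s, timeframe_s):
    """Average of all sample values with a timestamp in (now - timeframe, now]."""
    window = [v for (t, v) in samples if now_s - timeframe_s < t <= now_s]
    return sum(window) / len(window) if window else None


def simulate_rule(samples, timeframe_s, threshold, end_s):
    """Return a list of ("opened" | "closed", time_s) events (hypothetical model)."""
    events = []
    is_open = False
    last_violation_s = None
    for now_s in range(0, end_s + 1, EVALUATION_INTERVAL_S):
        avg = average_in_window(samples, now_s, timeframe_s)
        violated = avg is not None and avg > threshold
        if violated:
            last_violation_s = now_s
            if not is_open:
                is_open = True
                events.append(("opened", now_s))
        elif is_open and now_s - last_violation_s >= CLOSE_AFTER_S:
            # Assumption: the closing time is set to the last violating evaluation.
            events.append(("closed", last_violation_s))
            is_open = False
    return events
```

Because an incident closes only after a full quiet minute, a measure that briefly dips below its threshold and then rises again keeps a single incident open instead of raising a new one each time.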
Use the Conditions tab to define measure-based conditions. An incident rule has one or more conditions based on Warning and Severe thresholds and one or more measure aggregations. Logical operators are used for multiple conditions. The incident rule's actions execute whenever the specified measures exceed the threshold.
Each measure used for a condition displays by name in the conditions list and includes the following details:
- Agent Group or Monitor: The agent group or monitor for which the measure was configured.
- Threshold: The threshold type for a condition that triggers an incident. Types include no threshold, warning or severe, and severe.
- no threshold: Condition is ignored when the incident rule is evaluated.
- warning or severe: An incident triggers when the Warning threshold is exceeded. No additional incident occurs if the Severe threshold is exceeded afterwards. If you want a separate incident to be raised when a Severe condition occurs after a Warning condition, define a separate incident rule with the severe threshold.
- severe: An incident triggers when the severe threshold is exceeded.
- Aggregation: The aggregate value that is used for the evaluation timeframe: avg (average), count, last, max (maximum), min (minimum), sum, or first.
A measure can also occur multiple times per PurePath.
- Logic: The logical operator to combine multiple conditions: and, or. You can select an operator only if at least two measures are listed.
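As a rough illustration of how the condition details above fit together, the following Python sketch aggregates a measure's values, compares the result with the selected threshold type, and combines several conditions with and or or. All function and parameter names are hypothetical, and the sketch assumes upper-bound thresholds, so "exceeded" means "greater than".

```python
# Aggregations named in the list above, applied to the values collected
# during the evaluation timeframe.
AGGREGATIONS = {
    "avg": lambda xs: sum(xs) / len(xs),
    "count": len,
    "first": lambda xs: xs[0],
    "last": lambda xs: xs[-1],
    "max": max,
    "min": min,
    "sum": sum,
}


def condition_violated(values, aggregation, threshold_type, warning=None, severe=None):
    """Return True/False, or None if the condition is ignored (hypothetical helper)."""
    if threshold_type == "no threshold" or not values:
        return None  # ignored when the rule is evaluated
    aggregated = AGGREGATIONS[aggregation](values)
    # "warning or severe" already triggers at the warning threshold.
    limit = warning if threshold_type == "warning or severe" else severe
    return limit is not None and aggregated > limit


def rule_violated(condition_results, logic="or"):
    """Combine condition results with 'and' or 'or', skipping ignored conditions."""
    active = [r for r in condition_results if r is not None]
    if not active:
        return False
    return all(active) if logic == "and" else any(active)
```

For example, a CPU condition with an average above its warning threshold combined by or with a memory condition below its severe threshold would violate the rule, while the same pair combined by and would not.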
Adding a measure to an incident rule's conditions initializes its threshold state with the most severe level defined for the measure. If a value is only defined for the warning threshold, the incident rule threshold initializes to warning or severe. If the severe threshold or both warning and severe thresholds are set, the incident rule threshold initializes to severe.
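The initialization rule above amounts to a small decision function. The sketch below is a hypothetical helper, not part of AppMon. The fallback to no threshold when neither value is defined is an assumption, since the text does not cover that case.

```python
def initial_threshold_state(warning=None, severe=None):
    """Pick the initial threshold type for a newly added measure (illustrative)."""
    if severe is not None:
        # Severe only, or both warning and severe, are defined.
        return "severe"
    if warning is not None:
        # Only the warning threshold is defined.
        return "warning or severe"
    # Assumption: with no thresholds defined, the condition starts as ignored.
    return "no threshold"
```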
Changed threshold settings apply to all incident rules using this measure. If different threshold settings are required, create an appropriately named copy of the measure with the desired threshold values.
1. Right-click your System Profile and click Edit System Profile > Incidents.
2. Click Create Incident Rule.
3. Specify the name, evaluation timeframe, and severity of the incident.
4. Under the Conditions table, click Add to add a measure to evaluate.
5. In the Add Measure dialog box, select the required measure and click Add. This example uses the CPU Total Time measure.
6. Double-click the newly added measure to configure its thresholds in the Measure Properties dialog box.
7. Define the threshold usage in the Thresholds column. In this example it is warning or severe.
8. Repeat steps 4 to 7 to add another measure. In this example it is Memory Used.
9. In the Logic column, select the logic for incident evaluation. In this example it is or.
10. On the Actions tab, define the action to take on the incident. In this example it is notification by email.
11. Click Create to create the incident rule.
Click the Actions tab to configure and customize incident email notification.
A message displays if the general AppMon email configuration is missing. If this happens, you can configure email by clicking Yes in the message box. To configure email for general use in AppMon, use the Email tab under Services in the Dynatrace Server Settings.
Configuring email notification for incidents includes basic and advanced configurations. The following figures show the screens you use to define basic and advanced actions.
For basic configuration, you can only configure email notification for incidents. Basic configuration for incident rule actions includes the following:
- Send notification upon Incident Rule violation: Select this check box to allow email to be sent to the specified recipients when the configured incident rule is violated. Selecting this check box enables all other basic configuration settings.
- Email: Specify the email address from which notifications are sent.
- Email Recipients: Manage the email addresses that receive incident notifications. Click Add to add an address, or select an address and click Delete to remove it.
- Smart Alerting: Sends only one email upon Incident Rule violation and then disables the action until the raised incident is confirmed.
To refine the incident response, click Advanced Configuration in the top left corner of the Actions tab. Click Add and use the Rule Action Editor to add a rule action to the advanced configuration. You can also select an existing action and click Edit to edit an existing rule action.
After you add an action to the advanced Actions table, or edit an existing one, you can continue to configure it in the Actions tab with the following settings:
- Smart Alerting: Select the check box in this column to trigger the defined action just once if multiple incidents occur, until the raised incident is confirmed by a user. See Incidents Overview Dashlet for more information.
- Action Severity: A severity level (Informational, Warning, or Severe) can be set for each action. This severity level is forwarded to the plugins that execute the action.
- Execution: Specify when the action should be executed in relation to the incident. The action can be triggered when the incident is raised, when the incident is ended, or every time an incident is raised or ended.
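The interplay of the Execution and Smart Alerting settings above can be modeled roughly as follows. This is a minimal sketch under stated assumptions: the class, its field names, and the execution labels are hypothetical, and it assumes Smart Alerting blocks every further execution of the action (for both raised and ended events) until a user confirms the incident.

```python
from dataclasses import dataclass


@dataclass
class RuleAction:
    """Illustrative model of one row in the advanced Actions table."""
    name: str
    execution: str = "on_begin"  # hypothetical labels: on_begin, on_end, on_begin_and_end
    smart_alerting: bool = False
    awaiting_confirmation: bool = False

    def should_run(self, event):
        """Decide whether to execute for an incident 'begin' or 'end' event."""
        wanted = {
            "on_begin": {"begin"},
            "on_end": {"end"},
            "on_begin_and_end": {"begin", "end"},
        }[self.execution]
        if event not in wanted:
            return False
        if self.smart_alerting:
            if self.awaiting_confirmation:
                return False  # already fired once; wait for confirmation
            self.awaiting_confirmation = True
        return True

    def confirm(self):
        """A user confirmed the raised incident; re-enable the action."""
        self.awaiting_confirmation = False
```

With Smart Alerting enabled, a burst of incidents produces a single notification. Once the incident is confirmed, the next violation triggers the action again.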
Incident rules include the option to link a dashboard to the incident rule. This dashboard is used for reporting the incident by email. It can be opened through the incidents drilldown menu. You can only select dashboards stored on the server where the System Profile is stored. The default for new incident rules is the Incident Dashboard, which is deployed with every AppMon Server.
For baseline incident rules, the dashboard configuration includes an option to open the affected splitting in the Application Overview. For host incidents, the configuration includes an option to open the Infrastructure Monitoring Dashboards displaying the affected host.
The table lists the host and baseline incident rules.
|Host Incident Rules||Baseline Incident Rules|
|Application Process Unhealthy||Failure Rate Too High|
|Host CPU Unhealthy||Response Time Degraded for Slow Requests|
|Host Disk Unhealthy||Response Time Degraded|
|Host Memory Unhealthy|| |
|Host Network Unhealthy|| |
Host Network Unhealthy indicates that host monitoring has detected a potential problem in the network (more than 90% of the bandwidth of a network interface is in use).