Create and configure incident rules

An incident rule in AppMon is a mapping between measured thresholds and the actions to be taken when such thresholds are violated. If all the thresholds defined in the incident rule are violated, this is an incident, even if no actions have been configured for the incident rule.

You need to configure the following mandatory settings for an incident rule to work:

Additionally, you can configure:

  • Actions: Automatic actions to be taken when the rule triggers.
  • Linked dashboard: A dashboard you can quickly access when the rule triggers.

To create a new incident rule, click Create Incident Rule in the Incidents item of the System Profile Preferences dialog box.

To edit an existing rule, double-click it in the same item. You can also right-click the required rule in the Incidents Overview dashlet and select Edit incident rule from the context menu.

The procedure for editing an existing rule is generally the same as creating a new rule. However, you can't edit conditions for some of the built-in incidents.

Incident rule configuration

General settings

  • Name: Give the rule a unique name that describes its purpose.
  • Description: Optionally, provide a description of the rule. The description can include the measures associated with it and the action that occurs if the rule is violated.
  • Evaluation Timeframe: Select the duration used to evaluate if the defined conditions are met.

    For example, if you select a timeframe of one minute and use a PurePath duration measure with an average aggregation as input for the incident rule, the average PurePath duration of the last minute is checked for violation every 10 seconds. Measures remain in memory for one hour.

    The following graphic illustrates how timeframe is used to check for incidents.

    The longer the evaluation timeframe is, the smoother the averaged values will be. Use longer timeframes to reduce the impact of short periods of unusual values that would cause violations with shorter timeframes. The following graphic shows this smoothing effect for different evaluation timeframes. Note how the 10-second values fluctuate wildly while the 60-second average is much smoother. The 3-minute and 10-minute averages are even smoother and, in this example, show little detail at all.

    Smoothing effect of different incident evaluation timeframes

    The evaluation timeframe has no direct impact on incident duration, however. An incident closes only after its condition has not been met for one minute, which deals more effectively with measurements that oscillate around their thresholds. Once this timespan has passed without further violation, the incident is closed with its closing time set to fit the actual measurements.

  • Incident Severity: Choose the severity of incidents to show in incident-related dashlets. Incident severities are Informational, Warning, or Severe. Select the level that requires a response to notifications.
  • Incident Suppression: Set the incident delay in seconds. Incidents are suppressed during the configured number of seconds after an occurrence, to avoid sending redundant notifications.
  • Store incidents in Performance Warehouse: Set by default, this setting guarantees a complete incident history for the available charting data time frame. Disable this option if an incident should trigger actions without a historical record. All configured actions are still performed. Closed incidents are deleted no later than the next AppMon Server restart.
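
The smoothing effect of the evaluation timeframe can be illustrated with a minimal Python sketch. It is not AppMon code: it assumes one value arrives every 10 seconds (the check interval mentioned in the example above) and that the rule uses the avg aggregation. The function name `windowed_averages`, the sample durations, and the threshold of 300 are hypothetical.

```python
from collections import deque


def windowed_averages(samples, timeframe_s, interval_s=10):
    """Return the average over the trailing timeframe at each evaluation step.

    samples: one value per evaluation interval (assumed to be 10 s apart).
    timeframe_s: evaluation timeframe in seconds.
    """
    window = deque(maxlen=max(1, timeframe_s // interval_s))
    averages = []
    for value in samples:
        window.append(value)  # the oldest value drops out automatically
        averages.append(sum(window) / len(window))
    return averages


# Hypothetical PurePath durations in ms, one per 10 s, with a single spike.
durations = [100, 100, 100, 700, 100, 100]
threshold = 300

short = windowed_averages(durations, timeframe_s=10)
long = windowed_averages(durations, timeframe_s=60)

print([v > threshold for v in short])  # the spike violates the threshold
print([v > threshold for v in long])   # the 60 s average stays below it
```

With the 10-second timeframe the single spike counts as a violation, while the 60-second average peaks at 250 and never crosses the threshold.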

Conditions

A condition for an incident rule is a measure with defined Warning and/or Severe thresholds. Each rule must have at least one condition. If the incident rule has several conditions, you must define logic to concatenate them.

Whenever the specified measures exceed the threshold, the incident triggers, and its actions, if any, execute.

Conditions table

Click Add to select a measure for the condition.
Click Edit to change the configurable properties of a selected measure.
Click Remove to remove a selected measure for a condition from the list.

Each measure used for a condition displays by name in the conditions list and includes the following details:

  • Agent Group or Monitor: The Agent Group or the monitor for which the measure was configured.
  • Threshold: The threshold type for a condition that triggers an incident. Types include no threshold, warning or severe, and severe.
    • no threshold: Condition is ignored when the incident rule is evaluated.
    • warning or severe: A warning triggers when the warning threshold is exceeded. No additional incident occurs if the severe threshold is exceeded.
      If you want an incident to be thrown if a severe condition occurs after a warning condition, you need to define a separate incident rule with the severe threshold.
    • severe: An incident triggers when the severe threshold is exceeded.
      Note that for dynamic measures, the threshold is compared not only against the value of each individual measure but also against the aggregate (sum) of all individual measures. This can produce surprising results in some cases. Incidents created only because the aggregate violates the threshold have no agent information set.
  • Aggregation: The aggregation method for measure values for the evaluation timeframe: avg (average), count, last, max (maximum), min (minimum), sum, or first.
    A measure can also occur multiple times per PurePath.
  • Logic: The logical operator to combine multiple conditions: and, or. You can select an operator only if two or more measures are listed.

    Multiple conditions can be concatenated logically. When the condition is evaluated, no operator precedence (AND stronger than OR) is applied:

    • If the first FALSE condition is followed by an AND concatenation, then the complete expression evaluates to FALSE.
    • If the first FALSE condition occurs after an AND concatenation, then the complete expression evaluates to FALSE.
    • If the first TRUE condition is followed by an OR concatenation, then the complete expression evaluates to TRUE.

    For example:

    • true AND false OR true — evaluates to false
    • true OR false AND true — evaluates to true
    • true OR false AND false — evaluates to true
    • false AND true — evaluates to false

    If measures are combined with AND, their splittings must match. If the first condition is violated by a measurement from one Agent and the second condition by a measurement from a different Agent, the combination is not considered a violation.
    The matching takes into account the splittings available for each measure. For example, if an agent-based measure is combined with a monitor-based measure by the AND operator, only the hosts need to match. Similarly, if one of the measurements is agent-based and the other is host-based (the example screenshot above shows such a scenario, as Current CPU load is agent-based and CPU Total Time is host-based), only the host has to match.
    Some measures contain additional splitting information beyond Agent/host/application, for example Garbage Collector type. If both relevant measures have these splittings, the splitting values must match; if only one of the involved measures does, the values are ignored.
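
The rules for concatenating conditions without operator precedence can be sketched in Python as a single left-to-right scan. This is one interpretation that reproduces the three rules and four examples above, not AppMon's actual implementation. The function name `evaluate_chain` is hypothetical, and the behavior for expressions the rules do not cover (for example, falling back to the last condition) is an assumption.

```python
def evaluate_chain(values, operators):
    """Evaluate conditions left to right without operator precedence.

    values: list of booleans, one per condition.
    operators: list of "and"/"or" strings joining neighboring conditions.
    """
    if len(operators) != len(values) - 1:
        raise ValueError("need exactly one operator between each pair of conditions")
    for i, value in enumerate(values):
        next_op = operators[i] if i < len(operators) else None
        prev_op = operators[i - 1] if i > 0 else None
        if value and next_op == "or":
            return True   # a TRUE followed by OR decides the expression
        if not value and (next_op == "and" or prev_op == "and"):
            return False  # a FALSE next to an AND decides the expression
    # Assumption: if no rule fired, the last condition decides.
    return values[-1]


print(evaluate_chain([True, False, True], ["and", "or"]))   # False
print(evaluate_chain([True, False, True], ["or", "and"]))   # True
print(evaluate_chain([True, False, False], ["or", "and"]))  # True
print(evaluate_chain([False, True], ["and"]))               # False
```

Note that the first result differs from ordinary Boolean precedence: in Python itself, `True and False or True` evaluates to `True`.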

If you change the thresholds of a measure, the change affects all incident rules that use this measure. If you need the same measure with different thresholds, create a copy of it and set the new thresholds on the copy.
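
The note on dynamic measures in the threshold description (individual values and their sum are both compared to the threshold) can be illustrated with a short Python sketch. It is a simplified model under assumptions, not AppMon's implementation: the function name `violation_sources`, the agent names, and the values are hypothetical, and `None` stands in for "no agent information set".

```python
def violation_sources(values_by_agent, threshold):
    """Return the agents whose value exceeds the threshold.

    If no single agent exceeds it but the sum of all values does, return
    [None] to mark a violation that carries no agent information.
    """
    agents = [agent for agent, value in values_by_agent.items() if value > threshold]
    if agents:
        return agents
    if sum(values_by_agent.values()) > threshold:
        return [None]
    return []


# Hypothetical per-agent values for one dynamic measure.
counts = {"agent-1": 40, "agent-2": 35, "agent-3": 30}

# No agent exceeds 100 on its own, but the sum (105) does.
print(violation_sources(counts, threshold=100))
```

This is the potentially surprising case: every individual value is well below the threshold, yet the rule is violated and the resulting incident cannot be attributed to an agent.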

Dynamic measures and combining conditions with AND

If you have multiple conditions combined with and, AppMon matches them by splitting. For a violation to occur, all ANDed conditions must be violated by the same split: if condition A is violated by a measurement from agent X and condition B by a measurement from agent Y, the overall incident rule is not violated. Some measures provide very specific splittings; for example, the Garbage Collection measures are split by individual garbage collector. If you combine two such measures, these split values must match for a violation to occur. If you combine such a measure with one that lacks this kind of splitting, that splitting value is ignored, and only splitting values that exist in both measures have to match.
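
The matching described above can be sketched in Python for two ANDed conditions. This is a minimal model under assumptions, not AppMon's implementation: each violation is represented as a dictionary of splitting values, the key names (`agent`, `host`, `gc`) and function names are hypothetical, and two violations that share no splitting key at all are assumed to match.

```python
def splittings_match(a, b):
    """Two violations match if every splitting key present in both has the same value."""
    shared = a.keys() & b.keys()
    return all(a[key] == b[key] for key in shared)


def and_violated(violations_a, violations_b):
    """True if some violation of condition A matches some violation of condition B."""
    return any(
        splittings_match(a, b) for a in violations_a for b in violations_b
    )


# Different agents on the same host: the agent splitting differs, so no violation.
print(and_violated(
    [{"agent": "X", "host": "h1"}],
    [{"agent": "Y", "host": "h1"}],
))  # False

# Agent-based measure combined with a host-based one: only the host must match.
print(and_violated(
    [{"agent": "X", "host": "h1"}],
    [{"host": "h1"}],
))  # True

# Garbage collector splitting present on one side only: it is ignored.
print(and_violated(
    [{"agent": "X", "gc": "G1"}],
    [{"agent": "X"}],
))  # True
```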

Actions

You can add automatic actions to be performed on the incident.

The Actions tab has two modes: basic, where you can only configure email notification, and advanced, where you can configure any action. Click Advanced Configuration or Basic Configuration to switch between them.

A message displays if the general AppMon email configuration is missing. If this happens, you can configure email by clicking Yes in the message box. The Email tab of the Services item of the Dynatrace Server Settings dialog box opens.

Basic configuration

Actions - Basic configuration

For basic configuration, you can only configure email notification for incidents. The Default From user of the Email settings is the sender. You can change it in the Advanced Settings mode.

  • Send notification upon Incident Rule violation: Select to activate email notifications.
  • Email: Type in email recipients. You can use actual email addresses, or AppMon user names and user groups. For user names and groups, AppMon sends the notification to the email address stored in the user account.
    The list of recipients appears in the Email Recipients field.
  • Smart Alerting: Sends email just once if multiple incidents occur, until the raised incident is confirmed by a user. Otherwise, a notification goes out for each incident.
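
The Smart Alerting behavior can be pictured with a small Python sketch. It is a simplified model under assumptions, not AppMon's implementation: the class name `SmartAlerting`, its methods, and the incident identifiers are hypothetical, and the sketch tracks a single incident rule.

```python
class SmartAlerting:
    """Send one notification, then stay silent until a user confirms."""

    def __init__(self):
        self.awaiting_confirmation = False
        self.sent = []  # incident ids for which an email was sent

    def on_incident(self, incident_id):
        """Return True if a notification is sent for this incident."""
        if self.awaiting_confirmation:
            return False  # an earlier incident is still unconfirmed
        self.sent.append(incident_id)
        self.awaiting_confirmation = True
        return True

    def confirm(self):
        """A user confirms the raised incident; notifications resume."""
        self.awaiting_confirmation = False


alerts = SmartAlerting()
alerts.on_incident("incident-1")  # sent
alerts.on_incident("incident-2")  # suppressed, incident-1 is unconfirmed
alerts.confirm()
alerts.on_incident("incident-3")  # sent again
print(alerts.sent)
```

Without Smart Alerting, all three incidents in this sequence would produce a notification.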

Linked dashboard

You can link a dashboard to the incident rule. This dashboard is used for reporting the incident by email. You can also quickly navigate to the dashboard from the Incidents dashlet, via the context menu of the incident. You can only select dashboards stored on the server where the System Profile is stored. The default for new incident rules is the Incident Dashboard, which is deployed with every AppMon Server.

Linked dashboard

For built-in baseline incident rules, the linked dashboard configuration includes an option to open the affected splitting in the Applications dashboard.

For built-in host incidents, the configuration includes an option to open the Infrastructure monitoring dashboard for the affected host.

Rule creation example