An incident rule in AppMon maps measured thresholds to the actions to take when those thresholds are violated. If all the thresholds defined in the incident rule are violated, an incident occurs, even if no actions are configured for the rule.
You need to configure the following mandatory settings for an incident rule to work:
- General settings: Basic settings, for example name or severity.
- Rule conditions: Conditions for the rule to trigger.
Additionally, you can configure:
- Actions: Automatic actions to be taken when the rule triggers.
- Linked dashboard: A dashboard you can open quickly when the rule triggers.
To create a new incident rule, click Create Incident Rule in the Incidents item of the System Profile Preferences dialog box.
To edit an existing rule, double-click it in the same item. You can also right-click the rule in the Incidents Overview dashlet and select Edit incident rule from the context menu.
Editing an existing rule works much like creating a new one. However, you can't edit the conditions of some built-in incident rules.
- Name: Give the rule a unique name that describes its purpose.
- Description: Optionally, provide a description of the rule. The description can include the measures associated with it and the action that occurs if the rule is violated.
- Evaluation Timeframe: Select the duration over which AppMon evaluates whether the defined conditions are met.
For example, if you select a timeframe of one minute and use a PurePath duration measure with an average aggregation as input for the incident rule, the average PurePath duration of the last minute is checked for violation every 10 seconds. Measures remain in memory for one hour.
The following graphic illustrates how the evaluation timeframe is used to check for incidents.
The longer the evaluation timeframe, the smoother the averaged values. Use longer timeframes to reduce the impact of short periods of unusual values that would cause violations with shorter timeframes. The following graphic shows this smoothing effect for different evaluation timeframes. Note how the 10s values fluctuate wildly, while the 60s average is much smoother. The 3m and 10m averages are smoother still and, in this example, show hardly any detail.
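As a rough illustration of the mechanics described above, the following Python sketch averages the samples inside the evaluation timeframe every 10 seconds and compares the average with a threshold. It is a simplified model, not AppMon's implementation: the `(timestamp, value)` sample format, the function name `evaluate_windows`, and the upper-bound threshold are assumptions.

```python
from statistics import mean
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, measured value): hypothetical format

def evaluate_windows(samples: List[Sample], timeframe_s: float, threshold: float,
                     start_s: float, end_s: float,
                     step_s: float = 10.0) -> List[Tuple[float, float, bool]]:
    """Every step_s seconds, average the samples from the last timeframe_s seconds.

    Returns (evaluation time, window average, violated) for each check that has data.
    Assumes an upper-bound threshold: violated means average > threshold.
    """
    results = []
    t = start_s
    while t <= end_s:
        # Only samples inside the sliding window (t - timeframe, t] count.
        window = [v for ts, v in samples if t - timeframe_s < ts <= t]
        if window:
            avg = mean(window)
            results.append((t, avg, avg > threshold))
        t += step_s
    return results
```

With one spike of 400 among samples of 100 and a threshold of 200, a 10-second timeframe reports a violation at the spike, while a 60-second timeframe never exceeds the threshold.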
The evaluation timeframe has no direct impact on incident duration, though. An incident closes only after its condition hasn't been met for one minute, which handles measurements that oscillate around their thresholds more effectively. Once this minute has passed without further violation, the incident is closed, with its closing time set to match the actual measurements.
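The closing behavior can be sketched the same way. The following Python function groups periodic violation checks into incidents that stay open until a full minute passes without a violation. The function name is hypothetical, and setting the end time to the last violating check is an assumption standing in for "closing time set to match the actual measurements".

```python
from typing import List, Optional, Tuple

def incidents_from_checks(checks: List[Tuple[float, bool]],
                          quiet_period_s: float = 60.0) -> List[Tuple[float, Optional[float]]]:
    """Group (time, violated) checks into (start, end) incidents.

    An incident stays open until no violation has been seen for quiet_period_s.
    The end is the last violating check (assumption); None means still open.
    """
    incidents: List[Tuple[float, Optional[float]]] = []
    start = last_violation = None
    for t, violated in checks:
        # Close the open incident once the quiet period has fully elapsed.
        if start is not None and t - last_violation >= quiet_period_s:
            incidents.append((start, last_violation))
            start = last_violation = None
        if violated:
            if start is None:
                start = t  # first violation opens a new incident
            last_violation = t
    if start is not None:
        incidents.append((start, None))  # quiet period not yet over
    return incidents
```

For checks every 10 seconds that are violated at 0, 20, 40, and 130 seconds, the oscillation between 0 and 40 seconds yields a single incident, (0, 40), followed by a second one, (130, 130).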
- Incident Severity: Choose the severity with which this rule's incidents are shown in incident-related dashlets: Informational, Warning, or Severe. Select the level that matches the response the notifications require.
- Incident Suppression: Set the incident delay in seconds. After an incident occurs, further incidents are suppressed for the configured number of seconds to avoid sending redundant notifications.
- Store incidents in Performance Warehouse: Enabled by default, this option guarantees a complete incident history for the time frame covered by the available charting data. Disable it if an incident should trigger actions without leaving a historical record; all configured actions are still performed. In that case, closed incidents are deleted no later than the next AppMon Server restart.
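The incident suppression delay can be pictured as a filter over incident occurrences. In this Python sketch, the function name is hypothetical, and it assumes the delay is measured from the last reported incident, so suppressed occurrences don't extend the window.

```python
from typing import List

def unsuppressed(occurrences_s: List[float], suppression_s: float) -> List[float]:
    """Return the occurrences that are reported rather than suppressed.

    Assumption: the delay counts from the last reported incident, so a
    suppressed occurrence does not restart the suppression window.
    """
    reported: List[float] = []
    for t in sorted(occurrences_s):
        if not reported or t - reported[-1] >= suppression_s:
            reported.append(t)
    return reported
```

With a 60-second delay, occurrences at 0, 30, 90, 100, and 200 seconds produce notifications at 0, 90, and 200 seconds.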
A condition for an incident rule is a measure with defined Warning and/or Severe thresholds. Each rule must have at least one condition. If the incident rule has several conditions, you must define the logic that combines them.
Whenever the specified measures exceed their thresholds, the incident triggers and its actions, if any, are executed.
Each measure used in a condition is displayed by name in the conditions list, with the following details:
- Agent Group or Monitor: The Agent Group or the monitor for which the measure was configured.
- Threshold: The threshold type for a condition that triggers an incident. Types include no threshold, warning or severe, and severe.
- no threshold: Condition is ignored when the incident rule is evaluated.
- warning or severe: A warning triggers when the warning threshold is exceeded. No additional incident occurs if the severe threshold is exceeded.
If you want an incident to be raised when a severe condition occurs after a warning condition, define a separate incident rule that uses the severe threshold.
- severe: An incident triggers when the severe threshold is exceeded.
For dynamic measures, AppMon compares not only the values of the individual measures with the threshold but also the aggregate (sum) of all individual measures. In some cases, this can produce surprising results. Incidents created only because the aggregate violates the threshold don't have any agent information set.
- Aggregation: The aggregation method for measure values for the evaluation timeframe: avg (average), count, last, max (maximum), min (minimum), sum, or first.
A measure can also occur multiple times per PurePath.
- Logic: The logical operator to combine multiple conditions: and, or. You can select an operator only if two or more measures are listed.
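The following Python sketch models the threshold types, the aggregation methods, and the dynamic-measure behavior described above. The function names, the split names, and the `"<aggregate>"` marker are hypothetical. It assumes upper-bound thresholds and treats the aggregate as the sum of the per-split aggregated values; AppMon's actual evaluation may differ in detail.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional

# Aggregation methods a condition can apply over the evaluation timeframe.
AGGREGATIONS: Dict[str, Callable[[List[float]], float]] = {
    "avg": mean,
    "count": len,
    "last": lambda v: v[-1],
    "max": max,
    "min": min,
    "sum": sum,
    "first": lambda v: v[0],
}

def exceeds(value: float, usage: str, warning: float, severe: float) -> Optional[bool]:
    """Apply the threshold usage; None means the condition is ignored.

    Assumes upper-bound thresholds: exceeded means value > threshold.
    """
    if usage == "no threshold":
        return None
    if usage == "warning or severe":
        return value > warning or value > severe
    if usage == "severe":
        return value > severe
    raise ValueError(f"unknown threshold usage: {usage!r}")

def dynamic_violations(samples_by_split: Dict[str, List[float]], aggregation: str,
                       usage: str, warning: float, severe: float) -> List[str]:
    """Check each split of a dynamic measure and the sum over all splits.

    Returns the violating split names; "<aggregate>" marks a violation by the
    sum, which carries no agent information.
    """
    agg = AGGREGATIONS[aggregation]
    per_split = {split: agg(samples) for split, samples in samples_by_split.items()}
    hits = [split for split, value in per_split.items()
            if exceeds(value, usage, warning, severe)]
    if exceeds(sum(per_split.values()), usage, warning, severe):
        hits.append("<aggregate>")
    return hits
```

In this model, two agents averaging 45 and 30 stay below a severe threshold of 60 individually, but their sum of 75 violates it, producing an incident without agent information.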
Multiple conditions can be combined logically. When the combined expression is evaluated, no operator precedence (such as AND before OR) is applied:
- If the first FALSE condition is followed by an AND concatenation, the complete expression evaluates to FALSE.
- If the first FALSE condition occurs after an AND concatenation, the complete expression evaluates to FALSE.
- If the first TRUE condition is followed by an OR concatenation, the complete expression evaluates to TRUE.
For example:
- true AND false OR true evaluates to false.
- true OR false AND true evaluates to true.
- true OR false AND false evaluates to true.
- false AND true evaluates to false.
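One way to read these rules as code is sketched below in Python. It is an interpretation, not AppMon's implementation: it reads "occurs after an AND" as "directly preceded by AND", lets the rule that applies at the earlier condition win when two rules conflict (which the example true OR false AND false requires), and returns None for mixed cases the documented rules don't decide. The function name is hypothetical.

```python
from typing import List, Optional, Sequence

def evaluate_without_precedence(values: Sequence[bool], ops: Sequence[str]) -> Optional[bool]:
    """Evaluate chained conditions with the documented rules (one interpretation).

    values: truth value of each condition, in list order.
    ops: "AND" or "OR" between consecutive conditions (len(values) - 1 entries).
    Returns None for mixed cases the documented rules do not decide.
    """
    if not values or len(ops) != len(values) - 1:
        raise ValueError("need at least one condition and one operator per adjacent pair")

    def op_before(i: int) -> Optional[str]:
        return ops[i - 1] if i > 0 else None

    def op_after(i: int) -> Optional[str]:
        return ops[i] if i < len(ops) else None

    first_false = next((i for i, v in enumerate(values) if not v), None)
    first_true = next((i for i, v in enumerate(values) if v), None)
    decisions: List[tuple] = []  # (position of the deciding condition, result)

    # Rules 1 and 2: the first FALSE is followed by, or directly preceded by, an AND.
    if first_false is not None and "AND" in (op_before(first_false), op_after(first_false)):
        decisions.append((first_false, False))
    # Rule 3: the first TRUE is followed by an OR.
    if first_true is not None and op_after(first_true) == "OR":
        decisions.append((first_true, True))

    if decisions:
        # When both apply, the rule at the earlier condition wins.
        return min(decisions)[1]
    if all(values) or not any(values):
        return values[0]  # every condition agrees, so any reading gives this result
    return None
```

All four documented examples evaluate as stated under this reading.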
When conditions are combined with AND, their splittings have to match. If the first condition is violated by a measurement from one Agent and the second condition by a measurement from a different Agent, the combination is not considered a violation.
The matching takes into account the available splittings of each measure. For example, if an agent-based measure is combined with a monitor-based measure by the AND operator, only the hosts need to match. Similarly, if one of the measures is agent-based and the other is host-based (as in the example screenshot above, where Current CPU load is agent-based and CPU Total Time is host-based), only the host has to match.
Some measures contain additional splitting information beyond Agent, host, and application, for example the Garbage Collector type. If both measures involved have these splittings, the splitting values must match; if only one of them does, the values are ignored.
Changing the thresholds of a measure affects all incident rules that use this measure. If you need the same measure with different thresholds, create a copy of the measure and set the new thresholds on the copy.
Dynamic measures and combining conditions with AND
If you have multiple conditions combined with AND, AppMon matches them by splitting. For a violation to occur, all ANDed conditions must be violated by the same split: if condition A is violated by a measurement from agent X and condition B by a measurement from agent Y, the incident rule as a whole is not violated. Some measures provide very specific splittings; for example, the Garbage Collection measures are split by individual garbage collector. If you combine two such measures, these split values must match for a violation to occur. If you combine such a measure with one that lacks this kind of splitting, that specific splitting value is ignored, and only the splitting values present in both measures have to match.
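The matching can be sketched in Python as follows. Each violation is modeled as a dictionary of splitting dimensions to values; the dimension names (`agent`, `host`, `gc`) and function names are hypothetical, and the sketch assumes an agent-based measurement also carries its host. Two violations match when they agree on every dimension both carry, so a dimension present in only one measure is ignored.

```python
from itertools import combinations, product
from typing import Dict, List

Split = Dict[str, str]  # splitting dimension -> value, e.g. {"agent": "A1", "host": "h1"}

def splits_match(a: Split, b: Split) -> bool:
    """Violations match when they agree on every dimension both of them carry."""
    return all(a[key] == b[key] for key in a.keys() & b.keys())

def and_rule_violated(violations_per_condition: List[List[Split]]) -> bool:
    """True if each ANDed condition has a violation and the picked violations all match."""
    if not violations_per_condition:
        return False
    # Brute force over one violation per condition; fine for a small sketch.
    for combo in product(*violations_per_condition):
        if all(splits_match(a, b) for a, b in combinations(combo, 2)):
            return True
    return False
```

An agent-based violation on host h1 combined with a host-based violation on h1 counts as a match; violations from agents X and Y do not, even on the same host.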
You can add automatic actions to be performed on the incident:
The Actions tab has two modes: basic, where you can only configure email notification, and advanced, where you can configure any action. Click Advanced Configuration or Basic Configuration to switch between them.
A message displays if the general AppMon email configuration is missing. In that case, you can configure email by clicking Yes in the message box, which opens the Email tab of the Services item in the Dynatrace Server Settings dialog box.
In basic configuration, you can only configure email notification for incidents. The Default From user of the Email settings is used as the sender; you can change it in advanced configuration mode.
- Send notification upon Incident Rule violation: Select to activate email notifications.
- Email: Type in the email recipients. You can use actual email addresses, or AppMon user names and user groups. For users and groups, AppMon sends the notification to the email address from the user account.
The list of recipients appears in the Email Recipients field.
- Smart Alerting: Sends only one email when multiple incidents occur, until a user confirms the raised incident. Without Smart Alerting, a notification goes out for each incident.
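The following Python sketch contrasts the two modes. The event format and the function name are hypothetical, and it assumes that "the raised incident" is the one that triggered the last email.

```python
from typing import Iterable, List, Optional, Tuple

def emails_sent(events: Iterable[Tuple[str, str]], smart_alerting: bool) -> List[str]:
    """Return the IDs of incidents that trigger an email.

    events: ("incident", id) when an incident is raised, or
            ("confirm", id) when a user confirms an incident.
    """
    sent: List[str] = []
    awaiting: Optional[str] = None  # incident whose email awaits confirmation
    for kind, incident_id in events:
        if kind == "confirm":
            if incident_id == awaiting:
                awaiting = None  # confirmed: the next incident notifies again
        elif kind == "incident":
            if smart_alerting and awaiting is not None:
                continue  # Smart Alerting: stay silent until confirmation
            sent.append(incident_id)
            if smart_alerting:
                awaiting = incident_id
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return sent
```

For incidents i1 and i2, a confirmation of i1, and then incident i3, Smart Alerting sends emails for i1 and i3 only; without it, all three incidents send an email.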
Click Add to add a rule action or select an existing one and click Edit. Find the properties of the action rules in the Configure incident rule actions section.
For each rule action, you can configure the following parameters:
- Smart Alerting: The action triggers only once when multiple incidents occur, until a user confirms the raised incident. Without Smart Alerting, the action is performed for each incident.
- Action Severity: A severity level (Informational, Warning, or Severe) can be set for each action. This severity level is forwarded to the Plugins that execute the action.
- Execution: Specify when the action should be executed in relation to the incident. The action can be triggered when the incident is raised, when the incident is ended, or every time an incident begins or ends.
If execution is set for both the beginning and the end of an incident, and the incident has a duration of 0 seconds (the start and end times are the same), the action is executed only once.
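A minimal Python sketch of the Execution setting, assuming the hypothetical labels "begin", "end", and "begin and end": it returns the times at which the action runs for one incident, and collecting them in a set makes a zero-duration incident run the action only once.

```python
from typing import List

def action_times(start_s: float, end_s: float, execution: str) -> List[float]:
    """Return the times at which an action runs for one incident."""
    if execution == "begin":
        return [start_s]
    if execution == "end":
        return [end_s]
    if execution == "begin and end":
        # A zero-duration incident has identical start and end, so the set
        # collapses them and the action runs once.
        return sorted({start_s, end_s})
    raise ValueError(f"unknown execution setting: {execution!r}")
```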
You can link a dashboard to the incident rule. This dashboard is used for reporting the incident by email. You can also quickly navigate to the dashboard from the Incidents dashlet, via the incident's context menu. You can only select dashboards stored on the server where the System Profile is stored. The default for new incident rules is the Incident Dashboard, which is deployed with every AppMon Server.
For built-in baseline incident rules, the linked dashboard configuration includes an option to open the affected splitting in the Applications dashboard.
For built-in host incidents, the configuration includes an option to open the Infrastructure monitoring dashboard for the affected host.
Rule creation example
1. Open the System Profile Preferences dialog box and click Incidents.
2. Click Create Incident Rule.
3. Specify the name, evaluation timeframe, and severity of the incident.
4. Under the Conditions table, click Add to add a measure to evaluate.
5. In the Add Measure dialog box, select the required measure and click Add. This example uses the CPU Total Time measure.
6. Double-click the newly added measure to configure its thresholds in the Measure Properties dialog box.
7. Define the threshold usage in the Thresholds column. In this example, it is warning or severe.
8. Repeat steps 4 to 7 to add another measure. In this example, it is Memory Used.
9. In the Logic column, select the logic for incident evaluation. In this example, it is or.
10. On the Actions tab, define the action to take on the incident. In this example, it is email notification.
11. Click Create to create the incident rule.
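To summarize the example, the following Python sketch models the resulting rule as plain data and checks a few constraints from this page: at least one condition, one logic operator between each pair of conditions, and a valid severity. The class and field names, the 60-second timeframe, the Warning severity, and the recipient are illustrative assumptions; this is not AppMon's configuration format or API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Condition:
    measure: str
    threshold_usage: str = "warning or severe"  # or "no threshold", "severe"
    aggregation: str = "avg"

@dataclass
class IncidentRule:
    name: str
    evaluation_timeframe_s: int
    severity: str  # "Informational", "Warning", or "Severe"
    conditions: List[Condition] = field(default_factory=list)
    logic: List[str] = field(default_factory=list)  # operators between conditions
    email_recipients: List[str] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return configuration problems; an empty list means the rule looks valid."""
        issues = []
        if not self.conditions:
            issues.append("at least one condition is required")
        if len(self.logic) != max(len(self.conditions) - 1, 0):
            issues.append("need one logic operator between each pair of conditions")
        if any(op not in ("and", "or") for op in self.logic):
            issues.append("logic operators must be 'and' or 'or'")
        if self.severity not in ("Informational", "Warning", "Severe"):
            issues.append(f"unknown severity: {self.severity}")
        return issues

# The example rule: CPU Total Time OR Memory Used, notified by email.
rule = IncidentRule(
    name="High CPU or memory usage",
    evaluation_timeframe_s=60,
    severity="Warning",
    conditions=[Condition("CPU Total Time"), Condition("Memory Used")],
    logic=["or"],
    email_recipients=["ops-team"],
)
```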