Error analysis

Overview

An error can be any unexpected event that occurs while a user interacts with a system. Examples include resources that cannot be found on a web server, HTTP 500 response codes returned by a web service, and exceptions thrown within an application.

The figure below illustrates some errors and the parts of the system architecture they may affect.

[Figure: Errors]

Detecting errors

AppMon detects errors using rules based on captured HTTP response codes, Java/.NET exceptions, Java/.NET log messages, or JavaScript errors (browser errors). Each error detection rule also defines the consequence of a detected error, which has one of the following impacts:

  • The error is simply detected and does not have further consequences.
  • The error indicates that the whole transaction (PurePath) where the error was detected has failed.
  • The error indicates that the whole PageAction (including the transaction) to which the PurePath belongs has failed.
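The three impacts form an escalation: each level includes the consequences of the one below it. The following minimal Python sketch illustrates that ordering. It is not AppMon's implementation or API; `Impact`, `PageAction`, `PurePath`, and `apply_impact` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class Impact(IntEnum):
    """Hypothetical encoding of the three impacts, ordered by severity."""
    DETECT_ONLY = 0       # error is recorded, no further consequence
    FAIL_TRANSACTION = 1  # the PurePath is marked as failed
    FAIL_PAGE_ACTION = 2  # the PageAction and its PurePath are marked as failed


@dataclass
class PageAction:
    name: str
    failed: bool = False


@dataclass
class PurePath:
    id: str
    page_action: PageAction
    failed: bool = False
    errors: List[str] = field(default_factory=list)


def apply_impact(path: PurePath, error: str, impact: Impact) -> None:
    """Record an error on a PurePath and escalate according to its impact."""
    path.errors.append(error)
    if impact >= Impact.FAIL_TRANSACTION:
        path.failed = True
    if impact >= Impact.FAIL_PAGE_ACTION:
        # A failed PageAction implies that its transaction has failed as well.
        path.page_action.failed = True


checkout = PageAction("click on 'Checkout'")
pp = PurePath("pp-1", checkout)
apply_impact(pp, "HTTP 503", Impact.FAIL_PAGE_ACTION)
print(pp.failed, checkout.failed)  # True True
```

Using an ordered enum lets one comparison (`>=`) express "this impact and everything below it", which matches the "(including the transaction)" wording of the third impact.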

The following are examples of how rules can be defined:

  • All HTTP 5xx response codes that are detected on a transaction entry point indicate errors that cause the whole PageAction to be marked as failed. PurePaths with HTTP 5xx response codes are also automatically grouped as an HTTP response sub-category in the Web UI as part of problem pattern detection. See Problem Pattern Detection and Memory Diagnostics for more information.
  • All exceptions thrown on database calls indicate errors.
  • When a log message with a certain text is captured, the transaction (PurePath) where this log message has occurred is marked as failed.
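The three example rules can be pictured as predicates over captured events, each paired with an impact. The Python sketch below is illustrative only: it does not reflect AppMon's rule format or configuration, and `CapturedEvent`, `ErrorRule`, `RULES`, `evaluate`, and the log text `"PAYMENT REJECTED"` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CapturedEvent:
    """Hypothetical view of one event captured on a PurePath."""
    kind: str                      # "http", "exception", or "log"
    detail: str                    # status code, exception class, or log text
    on_entry_point: bool = False
    on_database_call: bool = False


@dataclass
class ErrorRule:
    name: str
    matches: Callable[[CapturedEvent], bool]
    impact: str                    # "detect", "fail_transaction", or "fail_page_action"


# The three example rules from the text, expressed as predicates.
RULES: List[ErrorRule] = [
    ErrorRule("HTTP 5xx on entry point",
              lambda e: e.kind == "http" and e.on_entry_point and e.detail.startswith("5"),
              "fail_page_action"),
    ErrorRule("Exception on database call",
              lambda e: e.kind == "exception" and e.on_database_call,
              "detect"),
    ErrorRule("Log message with certain text",
              lambda e: e.kind == "log" and "PAYMENT REJECTED" in e.detail,  # hypothetical text
              "fail_transaction"),
]


def evaluate(events: List[CapturedEvent]) -> List[Tuple[str, str]]:
    """Return (rule name, impact) for every rule that matches a captured event."""
    return [(r.name, r.impact) for e in events for r in RULES if r.matches(e)]


events = [
    CapturedEvent("http", "503", on_entry_point=True),
    CapturedEvent("exception", "java.sql.SQLException", on_database_call=True),
    CapturedEvent("http", "200", on_entry_point=True),
]
print(evaluate(events))
```

Keeping rules as data (a predicate plus an impact) mirrors the idea that a predefined rule set can be extended with custom rules by appending to a list.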

The AppMon installation includes a predefined set of error detection rules that you can extend with custom rules. See System Profile - Error Detection for details on creating or revising error detection rules.

Monitoring functional health

AppMon evaluates the error detection rules on every PurePath that arrives at the AppMon Server and applies the defined consequences, marking transactions and user actions as failed according to the detected errors. AppMon presents detected errors either actively, by raising incidents based on failed transaction rates or failed user action rates, or passively, by showing error information in various dashlets and charts.

For alerting and charting based on failure rates, AppMon provides the following out-of-the-box measures, subscribed under Server Side Performance - Functional Health:

  • Failed Transaction Count
  • Failed Transaction Percentage
  • Failed User Action Count
  • Failed User Action Percentage

These measures reflect the overall count and percentage of failed transactions and user actions. In addition, each Business Transaction can track its own failure count and rate: two measures are automatically subscribed for every Business Transaction, one for the failed percentage and one for the failed count.
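The count and percentage measures are simple ratios over the transactions in a timeframe, computed once overall and once per Business Transaction. The Python sketch below shows that arithmetic under the assumption that each transaction carries a failed flag and the names of the Business Transactions it matched; `Transaction`, `failure_measures`, and `per_business_transaction` are hypothetical and are not AppMon measure names. The same functions apply unchanged to user actions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Transaction:
    failed: bool
    business_transactions: Tuple[str, ...] = ()  # hypothetical: BT names this PurePath matched


def failure_measures(txs: List[Transaction]) -> Dict[str, float]:
    """Failed count and failed percentage for a set of transactions."""
    failed = sum(1 for t in txs if t.failed)
    percentage = 100.0 * failed / len(txs) if txs else 0.0
    return {"failed_count": failed, "failed_percentage": percentage}


def per_business_transaction(txs: List[Transaction]) -> Dict[str, Dict[str, float]]:
    """The same two measures, split by Business Transaction."""
    groups: Dict[str, List[Transaction]] = defaultdict(list)
    for t in txs:
        for bt in t.business_transactions:
            groups[bt].append(t)
    return {bt: failure_measures(group) for bt, group in groups.items()}


txs = [
    Transaction(True, ("Purchase",)),
    Transaction(False, ("Purchase",)),
    Transaction(False, ("Search",)),
    Transaction(False),
]
print(failure_measures(txs))  # {'failed_count': 1, 'failed_percentage': 25.0}
print(per_business_transaction(txs))
```

Note that the per-Business-Transaction percentage uses only that Business Transaction's own transactions as the denominator, so it can differ sharply from the overall rate (here 50% for "Purchase" versus 25% overall).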

Error notifications

AppMon offers alerting on failed transactions and failed user actions, either overall or per Business Transaction.

Out-of-the-box alerting on the overall failure rates is enabled by default and triggers when the failed transaction rate or the failed user action rate exceeds 3% within 5 minutes. You can change these defaults in the Error Detection configuration of the System Profile.
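One way to picture "more than 3% within 5 minutes" is a sliding window over recent transactions. The Python sketch below assumes that interpretation; it is not how AppMon evaluates its incidents internally, and `FailureRateAlert` and `record` are hypothetical names. One instance would be used for transactions and another for user actions.

```python
from collections import deque
from typing import Deque, Tuple


class FailureRateAlert:
    """Hypothetical sliding-window check: alert when the failed rate exceeds a threshold.

    Defaults mirror the documented out-of-the-box values (more than 3% within 5 minutes).
    Timestamps are in seconds and must be passed in non-decreasing order.
    """

    def __init__(self, threshold_pct: float = 3.0, window_seconds: float = 300.0) -> None:
        self.threshold_pct = threshold_pct
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, bool]] = deque()
        self._failed = 0

    def record(self, timestamp: float, failed: bool) -> bool:
        """Add one transaction (or user action); return True if the alert should fire."""
        self._events.append((timestamp, failed))
        self._failed += int(failed)
        # Evict events outside the window (timestamp - window_seconds, timestamp].
        while self._events and self._events[0][0] <= timestamp - self.window_seconds:
            _, old_failed = self._events.popleft()
            self._failed -= int(old_failed)
        rate = 100.0 * self._failed / len(self._events)
        return rate > self.threshold_pct


alert = FailureRateAlert()
fired = [alert.record(t, t > 97) for t in range(1, 101)]  # 3 of 100 fail: exactly 3%
print(any(fired))  # False, because 3% is not *more* than 3%
```

The running failed counter keeps each update proportional to the number of evicted events instead of rescanning the whole window. With very little traffic, a single failure can push the rate over the threshold, so low-volume systems may need extra care.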

For more advanced alerting scenarios, you can create incident rules based on the failed transaction and failed user action measures that are available for each Business Transaction. Such a rule raises an incident when the failure rate of the Business Transaction exceeds a specified threshold; for example, when more than 1% of all purchase transactions fail within a time range of 15 minutes.
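The purchase example amounts to filtering by Business Transaction, restricting to a time range, and comparing the failed percentage with a threshold. The Python sketch below illustrates that check; `IncidentRule`, `should_raise`, and the `(timestamp, name, failed)` sample layout are hypothetical and do not correspond to AppMon's incident rule configuration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# (timestamp in seconds, Business Transaction name, failed?)
Sample = Tuple[float, str, bool]


@dataclass(frozen=True)
class IncidentRule:
    """Hypothetical incident rule on one Business Transaction's failed percentage."""
    business_transaction: str
    threshold_pct: float
    window_seconds: float

    def should_raise(self, samples: Iterable[Sample], now: float) -> bool:
        total = failed = 0
        for ts, bt, is_failed in samples:
            # Only count this Business Transaction, inside (now - window, now].
            if bt == self.business_transaction and now - self.window_seconds < ts <= now:
                total += 1
                failed += int(is_failed)
        return total > 0 and 100.0 * failed / total > self.threshold_pct


# The example from the text: more than 1% of purchase transactions failing within 15 minutes.
purchase_rule = IncidentRule("Purchase", threshold_pct=1.0, window_seconds=15 * 60)
```

Because the rule is scoped to one Business Transaction, failures in unrelated transactions (for example, a noisy search service) do not affect it, which is the main advantage over the overall failure rate.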

Functional Health dashlet

It is essential to monitor the current functional health of a system and compare it with its past health, particularly after deploying a new release or patch. The starting point for monitoring errors is the Functional Health dashlet, which provides:

  • An overview of the failing transactions within a certain timeframe compared with another timeframe.
  • An overview of the error hotspots that have been detected, which is the starting point for analyzing the root cause of the most common errors.
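The two views the dashlet provides can be approximated as a failure-rate comparison between two timeframes and a frequency ranking of errors. The Python sketch below illustrates both calculations. It is not the dashlet's actual logic, and `Tx`, `failed_pct`, `compare`, and `hotspots` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Tx:
    failed: bool
    errors: List[str] = field(default_factory=list)  # e.g. exception class names


def failed_pct(txs: List[Tx]) -> float:
    return 100.0 * sum(t.failed for t in txs) / len(txs) if txs else 0.0


def compare(current: List[Tx], reference: List[Tx]) -> Tuple[float, float, float]:
    """Return (current %, reference %, change in percentage points)."""
    cur, ref = failed_pct(current), failed_pct(reference)
    return cur, ref, cur - ref


def hotspots(txs: List[Tx], top: int = 3) -> List[Tuple[str, int]]:
    """Most frequent errors in a timeframe: where root cause analysis starts."""
    return Counter(e for t in txs for e in t.errors).most_common(top)


after_release = [Tx(True, ["SQLException"]), Tx(True, ["SQLException", "HTTP 500"]),
                 Tx(False), Tx(False)]
before_release = [Tx(False), Tx(False), Tx(True, ["HTTP 500"]), Tx(False)]
print(compare(after_release, before_release))  # (50.0, 25.0, 25.0)
print(hotspots(after_release))  # [('SQLException', 2), ('HTTP 500', 1)]
```

Comparing a post-release timeframe with a pre-release one, as above, is the scenario the dashlet is described for: the change in failure rate shows *whether* health degraded, and the hotspot list shows *where* to start looking.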

Custom error charts

All failed transaction count or rate measures, and all failed user action count or rate measures, can be added to charts. You can chart the overall failed count and rate measures as well as the failed count and rate measures for Business Transactions.

Error root cause analysis

After identifying errors, you need to find their cause.

The Errors dashlet gives an overview of all errors that have occurred. From this dashlet, you can drill down to the transactions in which the errors were detected, then select a transaction to step through the errors detected on its method calls.
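The drill-down path (all errors, then the affected transactions, then the errors on one transaction's method calls) can be modeled as two queries over detected errors. The Python sketch below is illustrative only. `DetectedError`, `errors_overview`, `step_through`, and the method names in the sample data are hypothetical and are not part of AppMon.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DetectedError:
    purepath_id: str
    method: str   # method call on which the error was detected
    message: str


def errors_overview(errors: List[DetectedError]) -> Dict[str, List[str]]:
    """Overview level: each error message mapped to the PurePaths where it was detected."""
    overview: Dict[str, List[str]] = defaultdict(list)
    for e in errors:
        if e.purepath_id not in overview[e.message]:
            overview[e.message].append(e.purepath_id)
    return dict(overview)


def step_through(errors: List[DetectedError], purepath_id: str) -> List[Tuple[str, str]]:
    """Drill-down level: errors on one PurePath's method calls, in recorded order."""
    return [(e.method, e.message) for e in errors if e.purepath_id == purepath_id]


errors = [
    DetectedError("pp-1", "OrderDao.save", "SQLException"),
    DetectedError("pp-2", "OrderDao.save", "SQLException"),
    DetectedError("pp-1", "CheckoutServlet.doPost", "HTTP 500"),
]
print(errors_overview(errors))  # {'SQLException': ['pp-1', 'pp-2'], 'HTTP 500': ['pp-1']}
print(step_through(errors, "pp-1"))
```

Stepping through pp-1 shows the database exception on `OrderDao.save` before the HTTP 500 on the servlet, the kind of ordering that suggests which error is the cause and which is the symptom.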

Another approach to root cause analysis is to use analysis dashlets, such as the Transaction Flow dashlet, to identify the components that cause errors.

Many other dashlets provide error information. For details about the information provided by these dashlets, see the following topics: