The first task in any diagnostic process is to isolate the problem, which you can do even in high-volume and production environments. Your goal is to quickly narrow down the area that contains the root cause.
For example, if a transaction has a slow response time, follow these steps:
- Isolate the tier that produces the problem. The quickest way to do this is to use the Transaction Flow and look at the biggest contributors.
- Dig deeper and look at the biggest contributors inside the tiers.
- Dig again and look at the transaction load trend and resource utilization.
This approach quickly traces the response time problem to a particular component or API inside a particular Agent group or specific transaction type Agent. You may also discover that the network latency is at fault.
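The drill-down described above can be sketched in a few lines of Python. This is only an illustration of the "follow the biggest contributor" idea: the tier names, API names, and timings are hypothetical sample values, and the record layout is not an AppMon export format.

```python
from collections import defaultdict

# Hypothetical timing records: (tier, api, milliseconds).
# These values are made up for illustration; they are not AppMon output.
samples = [
    ("Web Frontend", "Servlet", 120.0),
    ("Web Frontend", "JSF", 40.0),
    ("Business Backend", "Hibernate", 600.0),
    ("Business Backend", "Web Service", 140.0),
    ("Database", "JDBC", 100.0),
]

def biggest_contributor(records, key_index):
    """Sum time by the chosen column and return (name, share of the total)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[2]
    grand_total = sum(totals.values())
    name = max(totals, key=totals.get)
    return name, totals[name] / grand_total

# Step 1: isolate the tier that contributes most to the response time.
tier, tier_share = biggest_contributor(samples, 0)

# Step 2: dig deeper and find the biggest contributor inside that tier.
in_tier = [r for r in samples if r[0] == tier]
api, api_share = biggest_contributor(in_tier, 1)

print(f"{tier}: {tier_share:.0%} of response time; top API inside it: {api} ({api_share:.0%})")
```

The same function serves both levels because each step asks the same question of a narrower data set.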
Goal of this tutorial
To explain how to use AppMon to find out which applications and transactions have problems, and whether those problems are related to the application or the infrastructure.
The easyTravel application is running and has some problems. AppMon captures these problems and helps you identify the root cause.
- easyTravel demo application installation.
- AppMon with default easyTravel System Profile.
Run any scenario in easyTravel, then click the Problem Patterns tab and select several problem patterns. Let the scenario run for some time so that AppMon can gather data.
1 Identify the application problem
AppMon automatically detects applications based on the host or application ID. It also lets you manually configure an application, if automatic configuration doesn't work for you. See System Profile - Applications for more information.
Open the Applications built-in dashboard to see an overview of all applications in the selected System Profile. AppMon looks at throughput, response time, failure rate, infrastructure health, and user experience. Applications with problems, such as a slow response time that violates the automatically calculated baseline, appear at the top of the list.
The following shows that the easyTravel mobile application violates the Response Time baseline. That means that certain business transactions have a response time that violates the baseline. Notice that the underlying infrastructure also has health problems that impact the application: the associated host is unhealthy, which also affects user experience.
2 Identify the Business Transaction problem
Click the problematic application. The Application Details built-in dashboard opens and displays the problematic business transactions at the top:
By default, the dashboard shows problematic transactions as well as the automatically detected most important ones. You can also add Business Transactions to the dashboard manually. To do so, click the Configuration icon and select the Monitored Business Transaction item. Then select the Business Transactions you want to add.
See How to Use Business Transactions for more information.
3 Analyze the transaction problem details
Click the problematic Business Transaction. The Business Transaction details built-in dashboard tells you if the problem is related to performance, failure, or process health. It also tells you if you currently have more throughput than usual, which can also cause unusual behavior. The following shows that the response time surpasses the automatically calculated baseline. This is why AppMon raises an alert.
AppMon automatically calculates baselines for the median response time (50th percentile) and for the slowest 10% of response times (90th percentile). It also calculates a baseline for the failure rate and the expected throughput per transaction, based on historical throughput data. If any of these values violates its baseline, drill into more details such as the Response Time Hotspots dashlet, Transaction Flow dashlet, or Errors dashlet.
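To make the two response time baselines concrete, the following is a minimal sketch of a percentile-based check. It is not AppMon's baselining algorithm, which is not described here. It assumes a simple nearest-rank percentile and a fixed tolerance factor, and both the `tolerance` parameter and the sample timings are hypothetical.

```python
def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, which avoids floating-point rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]

def baseline_violations(history_ms, current_ms, tolerance=1.2):
    """Compare the current median and 90th percentile with their historical values.

    tolerance is a hypothetical fixed factor; a real baseline would be adaptive.
    Returns a list of (percentile, baseline, observed) for each violation.
    """
    violations = []
    for pct in (50, 90):
        baseline = percentile(history_ms, pct)
        observed = percentile(current_ms, pct)
        if observed > baseline * tolerance:
            violations.append((pct, baseline, observed))
    return violations

# Hypothetical response times in milliseconds.
history = [100, 110, 120, 130, 140, 150, 160, 170, 180, 400]
current = [105, 115, 125, 135, 145, 155, 165, 600, 700, 900]

violations = baseline_violations(history, current)
print(violations)
```

In this sample the median barely moves while the 90th percentile degrades sharply, which is the kind of case a median-only baseline would miss.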
4 Analyze response time hotspots
If the problem is related to degrading response time, the first step is to drill down into it. Click the Response Time Hotspots link in the chart.
The following image shows that one particular API accounts for almost 80% of the response time spent on an Agent:
Click the problematic API to see the Method Hotspots dashlet. This displays which methods are slow to execute and which callers invoke them.
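The idea behind a method hotspot view can be sketched as a simple aggregation: total the time per method and keep track of its callers. The snippet below is an illustration only. The method names, caller names, and timings are hypothetical, and the record layout is an assumption rather than anything AppMon exposes.

```python
from collections import defaultdict

# Hypothetical call records: (caller, method, exclusive time in ms).
calls = [
    ("BookingService.book", "JourneyDao.findById", 35.0),
    ("SearchService.search", "JourneyDao.findById", 220.0),
    ("SearchService.search", "PriceCalculator.compute", 90.0),
    ("BookingService.book", "PaymentClient.charge", 60.0),
    ("SearchService.search", "JourneyDao.findById", 180.0),
]

def method_hotspots(records):
    """Return (method, total ms, sorted callers) tuples, slowest method first."""
    totals = defaultdict(float)
    callers = defaultdict(set)
    for caller, method, ms in records:
        totals[method] += ms
        callers[method].add(caller)
    ranked = sorted(totals, key=totals.get, reverse=True)
    return [(m, totals[m], sorted(callers[m])) for m in ranked]

hotspots = method_hotspots(calls)
for method, total_ms, called_by in hotspots:
    print(f"{method}: {total_ms:.0f} ms, called by {', '.join(called_by)}")
```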
5 Analyze Transaction Flow and host health
Another way to analyze problems is the Transaction Flow. It provides an overview of the application tiers associated with this business transaction. Use it to determine the following:
- Do certain tiers have a problem?
- Is there too much time spent between tiers?
- Is there too much traffic on the database?
- Are there problems outside the data center, such as third party content?
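The four questions above can be read as a checklist applied to the numbers shown in a flow diagram. The following sketch assumes you have noted those numbers by hand; every key, limit, and value is hypothetical and chosen only for illustration.

```python
# Hypothetical per-transaction figures read off a flow diagram.
flow = {
    "tiers": {"Web Frontend": 180.0, "Business Backend": 420.0},  # ms spent inside each tier
    "between_tiers_ms": 350.0,   # time spent between tiers
    "db_calls": 240,             # database statements per transaction
    "third_party_ms": 90.0,      # time spent on external content
}

def flow_findings(flow, tier_limit_ms=400.0, transit_limit_ms=200.0,
                  db_call_limit=50, third_party_limit_ms=500.0):
    """Apply the four checklist questions; all limits are illustrative assumptions."""
    findings = []
    for tier, ms in flow["tiers"].items():
        if ms > tier_limit_ms:
            findings.append(f"tier {tier} is slow")
    if flow["between_tiers_ms"] > transit_limit_ms:
        findings.append("too much time between tiers")
    if flow["db_calls"] > db_call_limit:
        findings.append("too much database traffic")
    if flow["third_party_ms"] > third_party_limit_ms:
        findings.append("third-party content is slow")
    return findings

findings = flow_findings(flow)
print(findings)
```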
In addition to analyzing the Transaction Flow, you can get information about the supporting infrastructure. Every node shows host and process health, which is calculated across several dimensions.
To access the Transaction Flow, go back to the Application Details built-in dashboard by clicking it in the breadcrumb, and then click the Transaction Flow link in the problematic Business Transaction.
The following image shows that the host that runs the Customer Web Frontend tier has a CPU problem.
Click Show Host Health. The Host Health Overview shows that the host has a very high CPU utilization. You can also see which processes run on that host (top right). This lets you determine if the CPU overhead is the result of these components, or another process running on the machine.
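The reasoning in this step, deciding whether the monitored components or some other process is responsible for the CPU load, comes down to simple arithmetic. The sketch below assumes you have the host's total CPU utilization and a per-process breakdown; the process names and percentages are hypothetical.

```python
def cpu_attribution(host_cpu_pct, process_cpu_pct, monitored):
    """Split host CPU usage into monitored and other processes.

    host_cpu_pct: total CPU utilization of the host, in percent.
    process_cpu_pct: mapping of process name to its CPU share, in percent.
    monitored: set of process names that belong to the application.
    """
    monitored_pct = sum(pct for name, pct in process_cpu_pct.items() if name in monitored)
    other_pct = max(0.0, host_cpu_pct - monitored_pct)
    culprit = "monitored processes" if monitored_pct >= other_pct else "other processes"
    return monitored_pct, other_pct, culprit

# Hypothetical figures: the host is at 95% CPU, but the application JVM uses only 20%.
result = cpu_attribution(
    host_cpu_pct=95.0,
    process_cpu_pct={"frontend-jvm": 20.0, "backup-job": 70.0},
    monitored={"frontend-jvm"},
)
print(result)
```

Here most of the load comes from a process outside the application, so tuning the application itself would not relieve the host.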
See Follow Your Transactions with Transaction Flow for more information on how to analyze the Transaction Flow. See Host Health Monitoring and Process Health Monitoring for more information on host and process health.
The Applications built-in dashboard makes it easy to locate problems in individual applications, in a specific business transaction, or in your infrastructure. The following tasks help to identify problems in your application: