Whether it is JMeter, SoapUI, LoadRunner, SilkTest, Neotys or one of the cloud-based load testing solutions such as Dynatrace Load, BlazeMeter or others, breaking an application under heavy load is easy these days. Finding the problem based on automatically generated load testing reports is not. Can you tell me what is wrong based on the following reports?
My Key Metrics from Web Server to Database
Over the last 10 to 15 years I’ve helped engineering organizations either run or analyze load tests. In this blog post I share my best practices and the metrics I typically look at when analyzing load testing results. I don’t rely on the out-of-the-box load testing reports. Instead, I extend them based on what the load testing tool can capture, or I put an APM tool such as Dynatrace in place to capture this type of data while the load testing tool drives the load.
If a load testing task is coming up for you, I hope you find most of the steps I describe useful, as I believe they will make analyzing your results easier. Feel free to use Dynatrace (or any other APM tool you already have) to capture and analyze the following metrics from the different application tiers and components in your application.
Now let me go into the details of these metrics: where to capture them from and what they tell us. In this blog post I focus on the Web Server, Application Server, Hosts and the Application Layers. The next post will focus on the Database as well as Errors and Logging.
1. Top Web Server Metrics
On the Web Server (Apache, IIS, Nginx, …), the following key metrics have proven extremely valuable for identifying problems in your deployment:
- Busy and Idle Threads
  - Do you need more worker threads per web server?
  - Do you need more web servers?
  - Are threads busy for too long because of application performance hotspots?
- Throughput
  - How many transactions / minute can we handle?
  - When do we need to scale out and add more web servers?
- Bandwidth Requirements
  - Is the network the bottleneck?
  - Are our average pages too heavy?
  - Can we offload content to CDNs?
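To make the questions above concrete, here is a minimal sketch of how the raw web server numbers turn into answers. It assumes you can export, per monitoring interval, the busy and idle worker thread counts, completed requests and bytes sent; the `WebServerSample` structure, the function names and the 80% "scale out" threshold are hypothetical and only illustrate the arithmetic, not how Dynatrace or any web server module computes them.

```python
from dataclasses import dataclass


@dataclass
class WebServerSample:
    """One monitoring interval for one web server (hypothetical structure)."""
    interval_seconds: float
    busy_threads: int
    idle_threads: int
    requests: int     # transactions completed during the interval
    bytes_sent: int   # response bytes sent during the interval


def thread_utilization(s: WebServerSample) -> float:
    """Fraction of worker threads that were busy."""
    total = s.busy_threads + s.idle_threads
    return s.busy_threads / total if total else 0.0


def transactions_per_minute(s: WebServerSample) -> float:
    return s.requests / s.interval_seconds * 60


def avg_bytes_per_request(s: WebServerSample) -> float:
    """Average response size: a rough proxy for 'are our pages too heavy?'."""
    return s.bytes_sent / s.requests if s.requests else 0.0


def bandwidth_mbit_per_s(s: WebServerSample) -> float:
    return s.bytes_sent * 8 / s.interval_seconds / 1_000_000


def needs_scale_out(samples: list[WebServerSample], threshold: float = 0.8) -> bool:
    """True if thread utilization stayed at or above `threshold` in every sample.

    The 80% default is an illustrative assumption, not a recommended value.
    """
    return bool(samples) and all(thread_utilization(s) >= threshold for s in samples)


sample = WebServerSample(interval_seconds=60, busy_threads=90, idle_threads=10,
                         requests=1200, bytes_sent=600_000_000)
print(thread_utilization(sample), transactions_per_minute(sample),
      avg_bytes_per_request(sample), bandwidth_mbit_per_s(sample),
      needs_scale_out([sample]))
# prints: 0.9 1200.0 500000.0 80.0 True
```

Requiring the threshold in every sample, rather than on average, is one way to separate sustained saturation from short spikes.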
For example, below is a Web Server Process Health Dashboard showing all of the metrics that are key for me. They are captured through a module placed in the Web Server:
2. Top App Server Metrics
On the Application Server (Java, .NET, PHP) I focus on the following key metrics to identify any deployment or configuration problems on your application servers:
- Load Distribution
  - How many transactions are handled by each JVM/CLR/PHP engine?
  - Are they equally load balanced?
  - Do we need more Application Servers to handle the load?
- CPU Hotspots
  - How much CPU is needed for this tested load?
  - Is the high CPU usage caused by bad programming that can be fixed?
  - Or do we need more CPU power?
- Worker Threads
  - Is the number of worker threads correctly configured?
  - Are worker threads busy because the application servers are not ready?
  - Are there any web server modules that block these threads?
- Memory Issues
  - Do we see bad memory patterns? Do we have a memory leak?
  - What’s the impact of Garbage Collection on CPU and Transaction Throughput?
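A rough sketch of two of these checks follows: whether transactions are evenly distributed across application server instances, and what fraction of wall-clock time goes to garbage collection. It assumes you already have per-instance transaction counts and total GC pause time per interval. The function names and the 20% tolerance are hypothetical, and treating GC pause time over elapsed time as "GC impact" is a simple approximation that ignores concurrent collector work.

```python
def load_share(tx_per_node: dict[str, int]) -> dict[str, float]:
    """Share of all transactions handled by each JVM/CLR/PHP engine."""
    total = sum(tx_per_node.values())
    return {node: count / total for node, count in tx_per_node.items()} if total else {}


def is_balanced(tx_per_node: dict[str, int], tolerance: float = 0.2) -> bool:
    """True if every node is within +/- tolerance of the mean transaction count.

    The 20% tolerance is an illustrative assumption.
    """
    if not tx_per_node:
        return True
    mean = sum(tx_per_node.values()) / len(tx_per_node)
    return all(abs(count - mean) <= tolerance * mean for count in tx_per_node.values())


def gc_time_ratio(gc_pause_ms: float, interval_ms: float) -> float:
    """Fraction of wall-clock time spent in garbage collection pauses."""
    return gc_pause_ms / interval_ms


nodes = {"jvm-1": 1000, "jvm-2": 1050, "jvm-3": 400}
print(is_balanced(nodes))          # False: jvm-3 handles far fewer transactions
print(gc_time_ratio(6000, 60000))  # 0.1, i.e. 10% of the minute spent in GC
```

An unbalanced result points at the load balancer or a misconfigured instance before you conclude that more Application Servers are needed.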
The following screenshot shows my Process Health Dashboard. All data is captured automatically by an agent injected into your Java, .NET, PHP or Node.js engine:
3. Top Host Health Metrics
Web and Application Servers obviously run on physical or virtualized hosts. That’s why it is also important to do a sanity check on all hosts that are involved. My key metrics are:
- CPU, Memory, Disk, I/O
  - Are we exhausting the physical or virtual resources on that box?
  - Any disks that are flooded with useless log files?
  - Any problems on network interfaces?
- Key Processes
  - Which processes run on that box?
  - Who is taking resources away from other processes?
  - Should we move processes to other machines?
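The host sanity check boils down to comparing utilization against limits and ranking processes by resource usage. The sketch below assumes utilization values between 0 and 1 and per-process CPU percentages collected by whatever agent you use; the function names and thresholds are hypothetical. Only the disk check reads live data, via the standard library's `shutil.disk_usage`.

```python
import shutil


def disk_used_ratio(path: str = ".") -> float:
    """Fraction of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total if usage.total else 0.0


def over_threshold(metrics: dict[str, float], limits: dict[str, float]) -> list[str]:
    """Names of resources whose utilization (0..1) meets or exceeds its limit."""
    return sorted(name for name, value in metrics.items()
                  if value >= limits.get(name, float("inf")))


def top_processes(cpu_by_process: dict[str, float], n: int = 3) -> list[tuple[str, float]]:
    """The n processes using the most CPU, highest first."""
    return sorted(cpu_by_process.items(), key=lambda item: item[1], reverse=True)[:n]


limits = {"cpu": 0.85, "memory": 0.90, "disk": 0.90}  # illustrative thresholds
host = {"cpu": 0.95, "memory": 0.60, "disk": disk_used_ratio()}
print(over_threshold(host, limits))
print(top_processes({"java": 65.0, "httpd": 20.0, "logrotate": 10.0, "sshd": 1.0}, n=2))
```

Resources without a configured limit are never flagged, which keeps the check quiet for metrics you haven't decided on yet.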
The following screenshot shows a Host Health Dashboard. The data is captured automatically by any of the agents you install in your JVMs, CLRs, PHP engines or web servers:
4. Top Application Metrics
The executed application code also deserves our attention, as it gives us some great insights. The key metric for me is the time spent in each logical tier, as it tells me which tiers scale well under load and which ones don’t:
- Time Spent in Logical Tier / Layer
  - Which tier is proportionally spending more time with increasing load?
  - Which tiers scale with load, and which ones don’t?
- Number of Calls into a Logical Tier
  - How often do we call internal Web Services?
  - How often do we call into critical APIs, e.g., Hibernate, Spring or SharePoint?
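One way to answer "which tier isn’t scaling" is to compare the average time per transaction in each tier at a low and a high load level, and to normalize call counts per transaction. The sketch below assumes you can export those numbers from your APM tool. The tier names, the 10 and 100 virtual user levels, the made-up timings, and the 2x growth limit are all hypothetical.

```python
def tier_shares(times_ms: dict[str, float]) -> dict[str, float]:
    """Proportion of transaction time spent in each logical tier."""
    total = sum(times_ms.values())
    return {tier: t / total for tier, t in times_ms.items()} if total else {}


def non_scaling_tiers(by_load: dict[int, dict[str, float]],
                      growth_limit: float = 2.0) -> list[str]:
    """Tiers whose time per transaction grows by more than `growth_limit`x
    between the lowest and the highest load level (illustrative threshold)."""
    low, high = by_load[min(by_load)], by_load[max(by_load)]
    return sorted(tier for tier, t in low.items()
                  if t > 0 and tier in high and high[tier] / t > growth_limit)


def calls_per_transaction(call_counts: dict[str, int], transactions: int) -> dict[str, float]:
    """Average number of calls into each tier or API per transaction."""
    if not transactions:
        return {}
    return {target: n / transactions for target, n in call_counts.items()}


# Average ms per transaction in each tier at 10 and 100 virtual users (made-up numbers)
by_load = {
    10: {"Web": 20, "Business logic": 50, "Hibernate": 30},
    100: {"Web": 25, "Business logic": 75, "Hibernate": 150},
}
print(tier_shares(by_load[100]))   # Hibernate takes 60% of the time at high load
print(non_scaling_tiers(by_load))  # ['Hibernate']
```

If the call count per transaction into a tier also rises with load, the slowdown is likely caused by how often the code calls that tier, not only by how slow each call is.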
The following screenshot shows a Layer Breakdown Chart showing which methods/packages/components spend the time in the executed code:
Your Key Metrics?
I am sure most of you have been testing applications for years and probably have your own set of metrics you look at and tools to capture them. Fill me in and share them with other readers. I will be compiling more metrics for you to consider next week. Be on the lookout for part two!
I also recorded a "How to do Load Testing with Dynatrace" Performance Clinic and put it on my YouTube channel. Check it out to see how I analyze some of the screenshots shown here live.