Why we spend too much time with load testing

I have worked with many clients that perform load testing, ranging from small to large scale, using open source or commercial tools. Load testing itself has become easier and more affordable thanks to a growing list of available tools and the availability of online load testing services. Even though load tests can now be run faster, and more load can be generated at lower cost, all of my clients have one problem in common: the load testing reports do not show what caused a problem. They only show that there was one, e.g. that the search page is slow with 50 concurrent users. Captured performance metrics like CPU and memory counters only highlight potential problem areas; they don’t point to the actual root cause in the application.

Standard Load Testing Reports only provide a Black-Box View

If we look at the following load testing report, we see that the tested application has a problem once we reach a certain hits/second threshold.

Load Testing Results

We can also see that CPU usage on the server is probably related to the problem: it ramped up along with the generated load and dropped when we started to see errors. If you present these graphs to your engineers, they will probably be surprised that the application could not handle more hits/second, but they won’t be able to tell you whether there is a problem in the application (and where it is) or whether the application simply cannot handle more load in the current deployment.

Too many test iterations slow you down

So we’ve learned that the standard output that we get from a load testing tool is not helping us to analyze a problem down to its root cause. How is this problem usually solved? The following illustration shows a typical testing cycle that I’ve seen with our clients.

Multiple Test Iterations needed to resolve identified Performance Problems

After every test run, clients sit down with their engineers and discuss the results. Engineers try to reproduce the problems that are highlighted in the reports. In most cases these problems only happen under load and cannot be reproduced on a developer’s local machine with an attached debugger or profiling tool. The logical consequence is that metric capturing during the test has to be refined. This refinement can range from gathering additional available performance counters like CPU, memory, I/O, garbage collection activity, database stats, … to exporting application-specific counters like number of orders processed, processing queue lengths, active user sessions, … to extending application logging with performance trace information like method execution times, executed SQL statements, …
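To make the last two kinds of refinement more concrete, here is a minimal Python sketch of what "exporting application-specific counters" and "extending logging with method execution times" might look like in practice. It is only an illustration under assumptions: the logger name `perf_trace`, the `traced` decorator, the `app_counters` object and the `process_order` function are all hypothetical and not taken from any particular tool or client project.

```python
import functools
import logging
import time
from collections import Counter

# Hypothetical logger name used for performance trace output.
logger = logging.getLogger("perf_trace")

# Hypothetical application-specific counters, e.g. number of orders processed.
app_counters = Counter()


def traced(func):
    """Log the wall-clock execution time of every call to func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether the call returned normally or raised.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            app_counters[f"{func.__name__}.calls"] += 1
            logger.info("%s took %.2f ms", func.__name__, elapsed_ms)

    return wrapper


@traced
def process_order(order_id):
    """Stand-in for real application logic (hypothetical)."""
    app_counters["orders_processed"] += 1
    return f"order-{order_id}-ok"
```

Calling `process_order(1)` returns its normal result, increments both counters and writes one timing line to the `perf_trace` logger. Note that this sketch does not synchronize counter updates across threads, and that every method you want to see has to be instrumented by hand and redeployed before the next test run, which is exactly the kind of iteration overhead described in this section.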

After the refinement is finished, the test is re-run. If you are lucky, you get the results you need after the first refinement iteration. What I’ve seen, though, is that several iterations are usually necessary to get results that really allow you to analyze and fix the problem. These additional test iterations involve testers as well as developers, and they consume time. If your teams are geographically distributed, or if you have outsourced testing, you have additional overhead to deal with between test iterations.

Our Goal: Test more apps in less time

The goal is to get rid of all the overhead that is currently involved in refining and analyzing the data captured during load testing. Novell is a perfect example of how a distributed agile development team can improve its load testing process. You can read the details in the published Case Study.

Application Performance Management in Testing enables you to unleash the power of load testing. The following illustration shows what a real load testing process has to look like:

Eliminate Test Iterations with Application Performance Management

Yes we can! Unleash the power of load testing

This blog post only scratches the surface of a problem that we’ve seen at our clients. Check out the Case Study we did with Novell, which describes how Novell increased its testing throughput by 2-3x. Too many test iterations and no real visibility into the application had prevented them from unleashing the power of load testing and turning it into an efficient process that supports application performance management. Download the full White Paper on How to Transform the Load Testing Process, which discusses in detail all the problems we have seen and shows how to reach the goal of testing more applications in less time. The key elements are visibility into the application and automation of the capturing and analysis process.

Feedback from your side is always welcome – let me know what you think of our approach to efficient load testing.

Andreas Grabner has 20+ years of experience as a software developer, tester and architect and is an advocate for high-performing cloud scale applications. He is a regular contributor to the DevOps community, a frequent speaker at technology conferences and regularly publishes articles on blog.dynatrace.com. You can follow him on Twitter: @grabnerandi