Practical Performance Testing Tips

In recent testimony to the House Energy and Commerce Committee, Health and Human Services Secretary Kathleen Sebelius admitted that HHS failed to perform enough testing to ensure a working system and revealed that parts of the website did not get “loaded” until the third week of September, one week before its launch. “We did not adequately do end-to-end testing,” Sebelius said.

As a performance professional for more than 20 years, I’ve seen mission-critical projects succeed and fail. All the successful projects incorporate a performance and scalability testing plan that starts in development, flows into progressively heavier load testing, and rolls into production on day one and every day thereafter. What steps should have been followed to create a successful launch of the site? What are the steps and best practices that forward-thinking IT organizations execute to ensure success? And what do you do if you, like the folks responsible for the government healthcare site, are in firefighting mode, trying to rescue a failed launch?

There are a number of best practices that should be followed to implement a proper performance engineering process. I will focus on one of the most important: effective load testing.

Key activities for an effective load test are:

  • Generate load representative of real users
  • Follow user transactions and isolate bottlenecks
  • Communicate findings with development & operations
  • Compare test results

Let’s look at the four steps of this load testing triage in greater detail:

1. Generating Load

Automated load generation tools differ greatly in the methods used to generate load. The most popular tools generate load from inside the firewall, with test machines sending traffic over the local area network to the system under test (SUT). This approach is very beneficial in the early stages of component and integration testing, but falls short of providing a complete performance picture because it does not account for key variables outside the firewall, including network delays, DNS lookup, firewall rules, load balancing, CDNs, and other third-party components.

The most accurate method of generating end-user load is through a geographically dispersed load testing network that drives load from two sources:

  • Cloud Data Centers – Large volumes of load can be generated by harnessing the resources of commercially available cloud data centers.   Select locations that represent the geographies of your actual users.
  • Last Mile – Last-mile endpoints use machines connected to the internet through local ISPs at various bandwidths; during a load test they act as testing agents and provide the most accurate end-user measurement available. While datacenter load is needed for high volume and repeatability, last-mile load provides a much more realistic view of user experience.
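Commercial tools manage the split between datacenter and last-mile agents for you, but the underlying concurrent-virtual-user pattern can be sketched in a few lines. This is only a minimal illustration under assumptions, not a geographically dispersed test; the target URL and user counts are hypothetical:

```python
# Minimal load-generation sketch: several "virtual users" issue requests
# concurrently and record per-request timings. Real tools would run this
# from cloud regions and last-mile machines, not a single host.
import threading
import time
import urllib.request

TARGET = "https://example.com/"   # hypothetical system under test
USERS = 5                         # concurrent virtual users
ITERATIONS = 3                    # requests per virtual user

results = []                      # (user_id, status, seconds) samples
lock = threading.Lock()

def virtual_user(user_id):
    # Each virtual user issues its requests sequentially, like a real visitor.
    for _ in range(ITERATIONS):
        start = time.perf_counter()
        try:
            with urllib.request.urlopen(TARGET, timeout=5) as resp:
                resp.read()
                status = resp.status
        except Exception:
            status = None         # record failures instead of aborting the test
        elapsed = time.perf_counter() - start
        with lock:
            results.append((user_id, status, elapsed))

threads = [threading.Thread(target=virtual_user, args=(i,)) for i in range(USERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

avg = sum(r[2] for r in results) / len(results)
print(f"{len(results)} samples, average response {avg:.2f}s")
```

A real test would also ramp users up gradually and vary think time between requests; this sketch only shows the concurrency skeleton.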

Why is external load testing important?  We’ve conducted synthetic monitoring of the site from several datacenters and last-mile peers across the U.S. over the past several days to test for availability and performance.  These single-user synthetic transactions are a great way to baseline the performance of critical pages and transactions.
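A single-user synthetic transaction is conceptually simple: fetch each critical page once and record availability and response time. A minimal sketch, with hypothetical page URLs standing in for the real monitored transactions:

```python
# Single-user synthetic check: baseline availability and response time
# for a small set of critical pages. URLs below are hypothetical.
import time
import urllib.request

PAGES = {
    "home":  "https://example.com/",
    "login": "https://example.com/login",
}

def baseline(pages):
    report = {}
    for name, url in pages.items():
        start = time.perf_counter()
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                ok = 200 <= resp.status < 400     # treat redirects as available
        except Exception:
            ok = False                            # unreachable counts as down
        report[name] = (ok, time.perf_counter() - start)
    return report

for name, (ok, secs) in baseline(PAGES).items():
    print(f"{name:6s} available={ok} response={secs:.2f}s")
```

Run on a schedule from multiple locations, the same loop yields the kind of datacenter-versus-last-mile comparison shown in the figures below.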

Figure 1 shows that the average response time of the home page as measured from datacenters in 10 cities was 3.4 seconds.



Figure 1- Average Backbone Response Time

Similar monitoring from the last mile (Figure 2) shows an average response time of 11.4 seconds.



Figure 2- Average Lastmile Response Time

This monitoring data shows an eight-second difference between response times as measured from the datacenter versus the last mile.  If we were to measure this transaction from inside the firewall, we might see a response time of 1.5 seconds.  So what does this have to do with load testing?

Consider the impact on the infrastructure of a transaction that takes 1.5 seconds versus 3 seconds versus 11 seconds to execute.  The longer a session lasts, the longer queues, connections, processes, memory allocations, pools, and the like stay open on every tier of the infrastructure.  Now extrapolate this impact across the thousands of users that access the site concurrently and it’s clear how resource usage and user concurrency multiply under a realistic load compared with a load generated inside the firewall.  Internal load testing can give a false sense of security that an external load test with last-mile users quickly dispels.
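The relationship between response time and concurrency can be made concrete with Little’s Law (average concurrency = arrival rate × average response time). A quick sketch using the response times measured above and a hypothetical arrival rate of 100 transactions per second:

```python
# Little's Law: average concurrency = arrival rate x average response time.
# The arrival rate below is a hypothetical figure for illustration; the
# response times come from the measurements discussed in the text.
def concurrent_sessions(arrivals_per_sec, avg_response_sec):
    """Average number of in-flight sessions the infrastructure must hold."""
    return arrivals_per_sec * avg_response_sec

rate = 100.0  # hypothetical: 100 new transactions per second
for label, rt in [("inside firewall", 1.5), ("datacenter", 3.4), ("last mile", 11.4)]:
    sessions = concurrent_sessions(rate, rt)
    print(f"{label:15s} {rt:5.1f}s -> {sessions:5.0f} concurrent sessions")
# 1.5 s -> 150 sessions, 3.4 s -> 340, 11.4 s -> 1140:
# the last-mile reality demands over 7x the capacity of the internal figure.
```

The arithmetic is trivial, but it is exactly why a system sized against internal measurements can collapse under real-world load.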

External load testing is essential to pre-launch testing of customer-facing web applications because it:

  • Applies the most realistic load on the infrastructure
  • Shows user experience based on geography
  • Exercises any geo-based technologies (CDN, load balancing)
  • Details the impact of 3rd parties (often geo based) on performance
  • Tests Web 2.0 technologies (Flash, Ajax, JavaScript) as they function in the browser
  • Exercises the mobile infrastructure

2. Isolate Bottlenecks

The generation of realistic load can be quite painless with the advances in load testing technologies as described above.  The purpose of generating load, however, is to isolate the root cause of performance issues as reported by the load testing tool.  Isolation is only possible with 1) detailed monitoring of the system under test and 2) integration between the load testing tool and the monitoring solution.

Since we don’t have access to the systems hosting the site, we’ll use a travel booking application to illustrate.



Figure 3- Load Test Performance Report

Figure 3 shows a load test report that identifies a problem with the Destination Search transaction.  Clicking the hotspot on the graph drills down to the waterfall chart of that transaction.  The waterfall chart, shown in Figure 4, details each object as processed by the browser on the Destination Search page.  Each object is color coded to indicate the resource that contributed to its response time.  In this example, the worst-performing object is a JavaScript call.  Its 19-second execution time is made up of 17 seconds of first-byte time and two seconds of content-download time; the object spent most of its time waiting on a response from the server.


Figure 4- Destination Search Waterfall Chart
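The sorting-and-flagging logic behind a waterfall drill-down can be sketched over per-object timing data. The object names and timings below are hypothetical stand-ins echoing the example in the text, where 17 seconds of first-byte time against 2 seconds of download indicates a server-bound object:

```python
# Waterfall-style triage sketch over hypothetical per-object timings:
# (object URL, first-byte seconds, content-download seconds).
# Rank objects by total time and flag whether the worst one spent
# most of its time waiting on the server rather than downloading.
objects = [
    ("/js/social.js", 17.0, 2.0),   # hypothetical numbers echoing the example
    ("/css/site.css",  0.3, 0.1),
    ("/img/hero.png",  0.2, 1.1),
]

def triage(objs):
    ranked = sorted(objs, key=lambda o: o[1] + o[2], reverse=True)
    worst_url, first_byte, download = ranked[0]
    server_bound = first_byte > download   # waited mostly on the server
    return worst_url, first_byte + download, server_bound

url, total, server_bound = triage(objects)
print(f"worst object: {url} ({total:.1f}s, server-bound={server_bound})")
# -> worst object: /js/social.js (19.0s, server-bound=True)
```

Real waterfall tools derive these timings from instrumented browsers; the point here is only the triage rule, not the data collection.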

We can drill further into the object in question to obtain detailed information about its execution path and a host of other performance data, as shown in Figure 5 below.

The diagram shows the JavaScript call initiating in the browser, making calls to three third-party services (Google, Twitter, Facebook) that account for 18% of the transaction time, and spending the majority of its time (77%) waiting on the internal Java server, whose actual execution contributed only 4% of the total execution time.  Also note the red semi-circle at the Java server tier, indicating a memory bottleneck in the JVM.  If our issue were code related, we could drill down into the offending tier to see the exact line of code or SQL statement causing the issue.



Figure 5- Destination Search PurePath Diagram

A strong integration of performance monitoring with web load testing removes the guesswork from bottleneck analysis and immediately points to the root cause of performance problems. These are the types of processes and tools required to quickly understand the source of performance problems in complex web applications.

In my next post, I’ll discuss the final steps in the performance testing process – Communicating with Development/Operations and Comparing Test Results.  For more information, you can also download a detailed white paper entitled “Graduating from Load Testing to Performance Engineering.”

Duane Dorch is a senior performance engineer with Compuware Corp. He has more than 20 years of experience in the testing, tuning and deployment of mission critical applications.