As I was learning how to design and execute load tests, an experienced load tester told me the one rule you needed to be able to judge the effectiveness of a test execution: If something didn’t go wrong with some part of the application and/or infrastructure during the test, there was likely something wrong with the test you ran.

Your company has spent months designing and building a new application release and is ready to start testing it. You pull the teams together. You have enabled enhanced logging and monitoring for all parts of the application stack, the network, the outside perspective, and even your third-party providers. A time is scheduled and a conference bridge is opened. The final approvals are given and the load is generated.

Success! The test goes off without a hitch and everything is given the final approval for release. But you have this nagging concern: Was the test a success or did it miss the mark entirely? Was all of the third-party content tested in a way that truly verified that it won’t affect customer experience when your site is at peak load? Was the test configured to truly stress the application in a way that matches how customers use it?

Which test is successful: Left or Right? Only your test design can answer that for you

Being able to execute the load test is the easiest part of the process. The hardest work is in ensuring that the test you run answers the critical questions you have:

  • What and Why are we testing?
  • How do we determine the load to use?
  • How do we want to simulate customer behavior?
  • Who will be involved in the test executions?
  • What’s next?

Designing a load test that, regardless of the technology you use, can answer these questions has a higher probability of generating meaningful results than any other approach I’ve used.

What and why are we testing?

The goal of a load test is to learn something about the complex interactivity and behavior of the application environment under load. But what will be tested will vary greatly. The team could be

  • Verifying a new database layer that should reduce data insertion bottlenecks that are slowing customer transactions and having a direct effect on company revenue
  • Testing an update to the CMS that should make page rendering faster while reducing the load on the system and removing the immediate need for a capital spend on new hardware
  • Ensuring that a new datacenter for load balancing traffic geographically performs as well as the existing datacenter under a full system load in case it has to absorb all traffic in a failure situation
  • Validating that the entire end-to-end application can withstand projected Holiday Season traffic volumes based off of the previous year’s analytics
  • Gathering data on the integration of a new tag aggregation and management system to support the traffic volumes of the post-Holiday sale period.

Each scenario starts with a supposition, theory, or something that the team believes to be true. The goal of the load test process is to put the system into a simulated scenario that has the potential to prove one or more of the performance beliefs your team holds incorrect while providing the data necessary to resolve the issue.

Gathering data during a load test is critical, but can be a problem. Every testing team wants to gather as much detailed data as possible without the data collection itself affecting the results. Not having the right data after a test may require another round of testing; so does finding an issue and then discovering that the instrumentation put in place for the test was the culprit. And turning on too much logging – and accidentally leaving it on after testing is complete – might have a negative effect on application performance (see Andi Grabner’s post Top Performance Mistakes when moving from Test to Production: Excessive Logging).

Designing a test scenario to answer specific questions about the performance of the system, application, or end-to-end experience provides the What and Why of the load test.

How do we determine the load to use?

With the team knowing what and why they are performing the load test, the next question is how much load to use. This is perhaps the most complex component of the load testing process, as there is no standardized metric of load test volume.

When working with customers to plan load tests, some typical statements I have heard to describe the amount of load the organization needs to generate include

  • We need to support 55,000 users
  • Our average hourly usage is 200,000 transactions
  • Our system can currently support 2,000 distinct user sessions
  • Customers average 5 minutes per session on our site

Let’s look at the first two of these to demonstrate why these statements on their own may not be enough information to design the load test traffic profile.

Example 1: We need to support 55,000 users

While this statement makes it seem like the testing team will need to spin up 55,000 virtual users to successfully load test the system, it’s not quite enough information to make that decision yet. Load test teams should always ask a few more questions to make sure that everyone is on the same page before testing begins.

Where did this number come from? If the application already exists and all that the load test does is test some changes or revisions, then there must be analytics data to support the 55,000 user number. If the application is brand new but there are similar applications in existence, then the marketing team should be able to extrapolate potential figures from public information – and then share that data with you! If it is an entirely new creation, then some educated invention may be required based on the type of launch, marketing, and business expectations.

When does this system see 55,000 users? In an hour? A day? A minute? This question is a follow-up to the first, as it helps the load test team determine the time period the application should be tested for. If the application sees 55,000 users over an hour, then a much lower level of virtual users is required for testing than if the application supports 55,000 users every second of every peak hour.

Were these concurrent or simultaneous users? If the requirement is 55,000 concurrent users spread across the span of an hour, then you may need far fewer virtual users than if the requirements state that 55,000 users simultaneously perform actions on the system throughout that hour.
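One rough way to translate a total-users-per-hour figure into a simultaneous-user figure is Little’s Law: average concurrency equals arrival rate times average time in the system. This is a back-of-the-envelope sketch, not a substitute for real analytics; the 5-minute session length echoes the example statements earlier in this section:

```python
def concurrent_users(arrivals_per_hour: float, avg_session_minutes: float) -> float:
    """Little's Law: average concurrency = arrival rate x average time in system."""
    arrivals_per_minute = arrivals_per_hour / 60
    return arrivals_per_minute * avg_session_minutes

# 55,000 users arriving over an hour, each averaging a 5-minute session,
# works out to roughly 4,583 concurrent sessions at any instant -- far
# fewer than 55,000 simultaneous virtual users.
print(concurrent_users(55_000, 5))
```

A quick calculation like this is often enough to show everyone on the call that “55,000 users” and “55,000 virtual users” are not the same requirement.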

Is this a flash crowd or increasing volume? Some applications can encounter flash crowds where usage leaps from very low to well beyond maximum in an incredibly short period of time, usually due to some marketing event – Holiday sale, ticket on-sale, Super Bowl advertising, etc. A site must test the end-to-end system with a load that goes up to 55,000 users (and beyond) as quickly as possible because this is how customers will use the system. The event may only last 15 minutes, but a failure during this time will cost the company millions in lost revenue, wasted marketing spend, and public brand damage.

Ramped Load
“Flash” Crowd – How the load is delivered against a system must reflect the manner in which customers actually increase over time

Just these few questions will help narrow and specify the load volume and delivery for the planned test. Simply accepting the number as it is given to the load testing team may lead either to an ineffective, under-loaded test or to a test that cripples and breaks the systems because it overwhelms the expected volume.
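To make the ramped-versus-flash distinction concrete, here is a minimal sketch of the two delivery shapes as minute-by-minute virtual user counts. The function names and numbers are illustrative, not taken from any particular load testing tool:

```python
def ramped_profile(peak_users: int, duration_min: int, ramp_min: int) -> list[int]:
    """Linear ramp up to peak over ramp_min minutes, then hold at peak."""
    profile = []
    for t in range(duration_min):
        if t < ramp_min:
            profile.append(peak_users * (t + 1) // ramp_min)
        else:
            profile.append(peak_users)
    return profile

def flash_profile(peak_users: int, duration_min: int, baseline: int = 100) -> list[int]:
    """Near-instant jump from a small baseline to peak within the first minute."""
    return [baseline] + [peak_users] * (duration_min - 1)

# A ticket on-sale or one-day sale looks like flash_profile(55_000, 15):
# barely any traffic, then the full crowd arrives at once and stays.
```

If your customers arrive like the second profile, testing with the first one tells you very little about how launch day will go.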

Stefan Karytko adds some additional ramping examples at the Compuware APM Blog: Web Load Test Ramping Best Practices: Part 1 (http://goo.gl/1hsbZ) and Part 2 (http://goo.gl/pSPhJ).

Example 2: Our average hourly usage is 200,000 transactions

To the untrained ear, this will sound as though the current load test should aim to reach 200,000 transactions over an hour. However, the goal of a load test isn’t to test an average hour; load tests are designed to test peak hours. Averages can (sometimes) smooth out the bumps and valleys of your traffic and disguise the actual value you need to test to.

As an example, take an organization that states that its average number of transactions per hour is 200,000. After a little more investigation (and a lot of free beverages for the marketing team!), you discover that for 20 hours every day, the application actually sees no more than 12,000 transactions an hour. However, during the four busiest hours of the day, analytics shows that there is actually an average of 1,140,000 transactions per hour. Imagine what would happen when the load team announces that the application passed a 200,000 transactions/hour test and then saw the application fall over when a peak day hit 1,145,000 transactions in an hour!
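The arithmetic behind that example is worth spelling out, because it shows exactly how an average hides a peak:

```python
# Hourly traffic profile from the example above:
# 20 quiet hours and 4 peak hours per day.
hourly_transactions = [12_000] * 20 + [1_140_000] * 4

average = sum(hourly_transactions) / len(hourly_transactions)
peak = max(hourly_transactions)

print(average)  # 200000.0 -- the "average hour" the team quoted
print(peak)     # 1140000  -- the hour the test actually has to survive
```

The quoted average is perfectly accurate, and testing to it would still leave the application untested at less than a fifth of its real peak.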

Load Testing requires a deeper look into the analytics for your site. The goal is to test Peak load not Average load

One formula that can be used to set the peak target: find the volume of traffic for the busiest 6 hours on the busiest day since the last load test. Then, add in the projected traffic increases that the business wants to aim for. And finally, add a stretch goal that is 25-50% higher than the projected peak for the application.

Why go above the projected peak? The worst thing that can happen is to discover that the system breakpoint is just 3-5% above the peak you tested to when you have a new busiest day for your application (unusual stock market event, unprecedented news story, biggest online sale in the history of the business, etc.).
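Put together, the target calculation is simple enough to sketch. The growth and stretch figures here are placeholders; the real inputs come from your analytics and the business projection:

```python
def load_test_target(busiest_hour: int, projected_growth: float,
                     stretch: float = 0.25) -> int:
    """Peak target = busiest observed hour, grown by the business projection,
    plus a 25-50% stretch margin above that projected peak."""
    projected_peak = busiest_hour * (1 + projected_growth)
    return round(projected_peak * (1 + stretch))

# Busiest hour last season was 1,140,000 transactions; the business
# projects 15% growth for the coming season.
print(load_test_target(1_140_000, 0.15))        # with the baseline 25% stretch
print(load_test_target(1_140_000, 0.15, 0.50))  # with an aggressive 50% stretch
```

The stretch margin is what keeps a record-breaking day from being the first time you learn where the breakpoint is.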

No matter what the goal of the load test is, one of the critical items is the creation of an accurate peak load that is reached in a manner that reflects how customers actually use your application. If you get the volume wrong, the data you get back will either provide a false sense of security due to a load that is too low to reflect customer behavior or cause unneeded expense to be incurred by testing a load that is beyond what the application will see for a number of years.

How do we want to simulate customer behavior?

Getting the amount of load right is important. Creating a load that emulates what customers do when they’re on the site is just as important as determining how many customers need to be tested. Not all visitors perform the same actions, view the same search results, or buy the same product when they are interacting with the application.

Each application has a unique user profile. Taking the time to review what percentage of customers perform key business functions on the site will help the load test team create a load that not only matches the volume you are looking for, but also simulates the interactivity of distinct application components under load conditions.

The testing team may test every dark nook and cranny when performing internal testing prior to release. But as the application creeps closer to public release, the test plan needs to become more customer-focused, highlighting the performance of two types of key customer paths:

  1. High value paths in terms of volume. These are the paths that see the most customer traffic and have the highest exposure to the outside world.
  2. High value paths in terms of revenue. These are the customer paths that may not get a high percentage of the total traffic, but that are most important for the company’s bottom line.
Designing your business process percentages helps shape the load test and generate a more realistic user load

Using the filter of highest value transactions, whether in terms of volume or revenue, most application testing plans can be boiled down to a relatively small number of key business paths. For a retail application, some of the typical paths are:

  • Homepage – The entry point for many customers, so you may want to run a small test at a high frequency to see if there are any potential issues at this initial point before moving on to more complex paths.
  • Search Results to Product Page – Without search, many retail sites would be extremely difficult to navigate through. This must be tested with a randomized set of search terms, including terms that customers enter that are not in your database – a failed search still forces the system to perform logic, such as the “Did you mean..?” or “Were you looking for…?” responses to near matches. Also, the item selected from the search results should be randomized to ensure that pages don’t get stored in application cache and appear faster than they really are.
  • Browse – The goal of this script is to push load through the entire retail site in a relatively unpredictable way so that as many categories, sub-categories and product pages as possible are hit.
  • Add-to-Cart – This takes the Product page to the next logical step: putting the product into the shopping cart.
  • Checkout and Abandon – This can have two flavors: Guest (customer does not have a system account) and Registered (customer has a system account and logs in). Most customers only go this far, as they have no way to disconnect the credit card processing system for the test.
  • Checkout and Pay – A small number of customers have a method to check the payment processing system under load, so they can proceed to this step using a test credit card or gift card.

Additional paths may be required for each individual retailer, but for the majority of testing these 5-6 paths cover a substantial percentage of customer traffic and touch on key technical and revenue components that affect the entire business.
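The business-path percentages described above can be expressed as a weighted traffic mix that each virtual user samples from. The weights below are hypothetical; yours come from your own analytics:

```python
import random

# Hypothetical traffic shares across the key retail paths described above.
path_weights = {
    "homepage": 0.30,
    "search_to_product": 0.25,
    "browse": 0.20,
    "add_to_cart": 0.15,
    "checkout_abandon": 0.08,
    "checkout_pay": 0.02,
}

def pick_path(rng: random.Random) -> str:
    """Choose a business path for the next virtual user, weighted by traffic share."""
    names = list(path_weights)
    weights = list(path_weights.values())
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random()
mix = [pick_path(rng) for _ in range(1_000)]  # 1,000 virtual users' worth of paths
```

Sampling per virtual user, rather than assigning fixed blocks of users to each path, keeps the mix realistic at every point in the ramp.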

This set of paths touches on the majority of third parties: search and catalog indexing systems; advertising; analytics tags; external shopping cart systems; payment processing vendors; and, of course, CDNs. For other types of firms – banking, insurance, media, B2B services, etc. – the revenue and traffic filters can also be applied to shape how a load test can be designed to most closely represent the vast majority of customer traffic. Understanding the transactions that are of most value to the organization helps shape the way that the load you designed interacts with the application being tested.

A critical addition to the test design is the use of randomization and variables, seen in the load distribution examples and in the descriptions of the retail business paths above. Without them, a test that does nothing but hit the same page repeatedly with the same parameters tests nothing more than the application’s ability to deliver content from its internal caches.
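As a sketch of that randomization, search parameters can be drawn from pools that mix valid catalog terms with deliberate near-misses, so that both the cache-busting and the “Did you mean..?” logic get exercised. Every term here is hypothetical:

```python
import random

# Hypothetical term pools: real catalog terms plus deliberate misspellings
# that force the search engine's near-match logic instead of a cache hit.
hit_terms = ["running shoes", "rain jacket", "yoga mat", "backpack"]
miss_terms = ["runing shoes", "rain jackett", "yoga matt"]

def next_search_term(rng: random.Random, miss_rate: float = 0.1) -> str:
    """Return a randomized search term; roughly miss_rate of them are near-misses."""
    pool = miss_terms if rng.random() < miss_rate else hit_terms
    return rng.choice(pool)
```

The same idea applies to which result is clicked from the search page: randomizing the selection keeps product pages from being served entirely out of application cache.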

But in some instances, hitting the same page over and over may be the most desirable testing approach. While the mostly random scenario fits most retailers, the testing team may have a goal of determining what happens to the application if 2,000 people try to access a single product page and add the product to their cart at the same time. Sites that offer a small number of once-a-day bargains are an example of this scenario.

Who will be involved in the test executions?

The group that performs the load test is one of the most critical parts of the process. When you are running traditional load tests (inside the firewall, with internal application infrastructure and network components), assembling the team is fairly well-known territory – all of the appropriate systems and network teams, plus people from vendor teams to support specific hardware or software components if needed.

When the focus moved to outside-in or web load testing, the number of people began to grow. Now testing teams had to include not just people from the internal application and network teams, but also people from connectivity providers, hosting providers, third-party services, and CDNs. This often made load tests more chaotic, as the amount of data collected began to balloon in size, with no centralized platform to collect, process, and report on telemetry. While this new customer-focused approach delivered new insights that helped companies avoid potential performance disasters, some critical results may not be immediately found in the large and wildly varied datasets collected during a load test.

With organizations integrating a new generation of application management tools into their systems alongside the outside-in load testing tools, the number of people on a load test call may, for the first time in many years, start to go down. These tools integrate metrics across a number of application layers into a single interface, highlighting specific network, application, and code information more quickly. These systems can also integrate data from external monitoring systems, such as web performance monitoring services, to further correlate performance issues that appear during load testing to specific points on the end-to-end path of an application.

The key is to identify all of the systems and services your application touches before a load test begins. Even with these new systems, people still need to know that a load test is occurring. If a third-party service begins demonstrating performance issues or an internal system needs to be more deeply analyzed due to a problem identified during the test, key members of the affected teams will need to be brought in to help troubleshoot and resolve the issues.

What’s next?

No matter the approach your team takes, the goal of load testing is to simulate the visitor traffic that is expected under peak conditions as accurately as possible. If there is a problem with the volume of load or the business process distribution used during the test, the results could create a misleading picture of how the application and its various components will behave under load.

The rule with load testing is “Assume Nothing”. The responsible team may state that the application, web server farm, load balancer, database, or firewall should behave in a certain way during the test, but it is better to find out that there is a gap between assumption and reality under intense and complex load scenarios before your customers find it for you.

And if a test is successful, dig a little deeper. You and your team can only be truly confident about a load test when everyone can answer this question: Was the test actually successful, or did it just validate that one (or more!) of the test configuration parameters was incorrect?

Resources

Steve Bennett – House MD: Solving Complex IT Issues Using Differential Diagnosis – http://goo.gl/WsfAa