Visual Studio 2010 is almost here: Microsoft just released the first Release Candidate, which looks pretty solid. Microsoft added new interfaces that let performance management solutions like Dynatrace extend the Web- and Load-Testing capabilities (check out Ed Glas’s blog on what’s in VSTS Load Testing), so that analysis can go beyond .NET environments and deeper than what the Load Testing Reports tell you about the performance of the tested application. But before we go into what can be done by extending Visual Studio, let’s have a look at what we get out of the box:
Standard Load Testing Reports from Visual Studio 2010
While running a load test, Visual Studio 2010 collects all sorts of information: the response times of the executed requests, performance counters of the tested application infrastructure (like CPU, Memory, I/O, …) and also the health of your load-testing infrastructure (load controller and agents). In my scenario I run a 4 tier (2 JVMs, 2 CLRs) web application. The 4 tiers communicate via SOAP Web Services (Axis->ASMX). The frontend web application is implemented using Java Servlets. I run a 15 minute load test with increasing load. The test is structured into multiple different transactions, e.g.: Home Page, Search, Login, BuyDirect, … While running my test I also monitor all relevant performance counters from the application server and the load testing infrastructure. Visual Studio 2010 allows me to monitor the current state of the Load Test via configurable graphs as shown here:
The graphs show that response times of some (not all) of my transactions increase with increasing user load. They also highlight that CPU usage on my application server became a problem (it exceeds 80% with ~20 concurrent users). At the end of the load test a summary report highlights what load was executed against the application, which errors happened and which pages performed slowest:
Switching to the Tables view gives a detailed breakdown into individual result dimensions, e.g.: Transactions, Pages, Errors, … :
From the table view we can make the following observations:
- 553 page requests exceeded my rule of 200ms per page
- the 553 requests were for menu.do, netpay.do and userlogin.do (you can see this when you look at the individual error requests)
- The LastMinute transaction was by far the slowest with 1.41s average response time and a max of 5.64s
What we don’t know is WHY THESE TRANSACTIONS ARE SLOW: the performance counters indicate that CPU is a potential problem, but they give us no indication of what caused the CPU overhead, and whether this is something that can be fixed or whether we are simply running against the performance limits of our system.
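The table-view observations above (requests over the 200ms rule, slowest transaction by average and max) are simple aggregations over per-request timings. The following is a minimal Python sketch of that arithmetic, not the Visual Studio API: the sample rows, the tuple layout and all function names are assumptions made up for illustration, and the numbers are not the results of my test.

```python
from collections import defaultdict

# Hypothetical samples: (transaction, page, response time in seconds).
# The values are invented for illustration; they are not the real test data.
SAMPLES = [
    ("Login", "userlogin.do", 0.35),
    ("Login", "userlogin.do", 0.12),
    ("LastMinute", "menu.do", 1.10),
    ("LastMinute", "menu.do", 5.64),
    ("BuyDirect", "netpay.do", 0.18),
]

PAGE_THRESHOLD_S = 0.200  # the 200ms-per-page rule mentioned in the text


def pages_over_threshold(samples, threshold=PAGE_THRESHOLD_S):
    """Count, per page, how many requests exceeded the threshold."""
    counts = defaultdict(int)
    for _transaction, page, seconds in samples:
        if seconds > threshold:
            counts[page] += 1
    return dict(counts)


def transaction_stats(samples):
    """Return {transaction: (average, maximum)} response times in seconds."""
    grouped = defaultdict(list)
    for transaction, _page, seconds in samples:
        grouped[transaction].append(seconds)
    return {name: (sum(v) / len(v), max(v)) for name, v in grouped.items()}


def slowest_transaction(samples):
    """Name of the transaction with the highest average response time."""
    stats = transaction_stats(samples)
    return max(stats, key=lambda name: stats[name][0])
```

Numbers like these tell you which pages and transactions are slow, but, as noted above, not why.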
Performance Reports by Dynatrace captured during Visual Studio 2010 Load Test
dynaTrace customers can download the Visual Studio 2010 plugin on the dynaTrace Community Portal. The package includes a Visual Studio Add-In and a Visual Studio Testing Plugin Library that extends its Web- and Load-Testing capabilities. We also offer the Automatic Session Analysis plugin that helps in analyzing data captured during longer load tests.
I used dynaTrace Test Center Edition on my 4 tier application while running the load test. The Visual Studio 2010 plugin made sure that Dynatrace automatically captured all server-side transactions (PurePaths) in a Dynatrace Session. It also made sure that the same transaction names used in the Web Test script were passed on to Dynatrace.
While running the load test the Load Testing Performance Dashboard that I’ve created for my application allows me to watch the requests that come in and the memory consumption on each of my JVMs and CLRs. I can also see which Layers of my application contribute to the performance – with layers being ADO.NET, ASP.NET, SharePoint, Servlets, JDBC, Web Services, RMI, .NET Remoting, … – Dynatrace automatically detects these layers and it helps me to understand which components/layers of my app actually consume most of the execution time and how increasing load is affecting these components individually. Besides that I also watch the number of SQL statements executed (whether via Java or .NET) and also the number of Exceptions that happen:
On the top left I see the individual transaction response times and the accumulated transaction counts underneath. The counts show the number of incoming requests, where it is easy to see how VS2010 increased the load during my test.
On the top right I see the memory usage of my two JVMs and underneath the memory usage of my two CLRs (it seems I have a nice memory leak in my 2nd JVM and one very “quiet” CLR).
The bottom left chart (titled Layer Breakdown) now shows me what’s going on within my application with increasing load. I can see that my application scales well until a certain user load, but then the Web Service Layer (dark gray color) starts performing much worse than all other involved application layers.
On the bottom right the number of database statements and the number of exceptions show me that these counters increase linearly with increasing load. However, it seems we have quite a lot of database queries (up to 350/second) and we also have quite a lot of exceptions that we should investigate.
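The two readings taken from the dashboard above (one layer degrading much faster than the others, and counters that grow linearly with load) can be expressed as small checks. This is a rough Python sketch under stated assumptions: the per-load-step numbers, the layer names in the dictionary, the 20% tolerance and the function names are all hypothetical, and none of this is how Dynatrace computes its charts.

```python
# Hypothetical data: concurrent users -> {layer: average ms per request}.
# Invented for illustration only.
LAYER_TIMES_MS = {
    5: {"Servlet": 20.0, "Web Service": 40.0, "JDBC": 15.0},
    10: {"Servlet": 22.0, "Web Service": 45.0, "JDBC": 16.0},
    20: {"Servlet": 25.0, "Web Service": 160.0, "JDBC": 18.0},
}


def growth_factors(layer_times):
    """Ratio of each layer's time at the highest load to its time at the lowest load."""
    lowest, highest = min(layer_times), max(layer_times)
    base, peak = layer_times[lowest], layer_times[highest]
    return {layer: peak[layer] / base[layer] for layer in base}


def worst_scaling_layer(layer_times):
    """Layer whose time grew the most between the lowest and highest load step."""
    factors = growth_factors(layer_times)
    return max(factors, key=factors.get)


def scales_linearly(counts_by_users, tolerance=0.2):
    """True if the count per user stays within `tolerance` of the first step's ratio."""
    ratios = [count / users for users, count in sorted(counts_by_users.items())]
    baseline = ratios[0]
    return all(abs(r - baseline) <= tolerance * baseline for r in ratios)
```

A counter that passes `scales_linearly` can still be too high in absolute terms, which is exactly the situation with the database queries described above.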
After the load test is finished the first report that I pull up is a report that shows me the slowest web transactions grouped by the transaction names used in Visual Studio:
I can see that the LastMinute is indeed the slowest transaction with a max of 5.6 seconds. The great thing about this report is that we get a detailed breakdown of these top transactions into application layers, database calls and method calls. We can immediately see that Java Web Services are the highest performance contributor to the Last Minute transaction. We also see that we have several thousand database queries for the 448 requests to this transaction and we also see which Java & .NET methods contributed to the execution time. A click on Slowest Page opens the PurePath Dashlet showing every individual transaction that got executed. Sorting it by duration shows the big variance between the execution times. The PurePath Hot Spot View makes it easy to spot the most contributing methods in the slowest transaction:
With the PurePath Comparison feature I go one step further to find out what the differences are between two transactions that show a big difference in execution time:
Visually in the Chart as well as in the PurePath Comparison Tree we see that getting the SpecialOffers and all calls in that context (creating the web service and calling it) make up most of the time difference. The difference table on the bottom lists all timing and structural differences between these two PurePaths, giving even more insight into where else we have differences.
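The idea behind such a comparison (timing differences plus structural differences between a fast and a slow execution of the same transaction) can be illustrated with a few lines of Python. This is only a sketch and says nothing about how the PurePath Comparison is implemented: the flat method-to-time dictionaries, the method names and the timings are hypothetical, and a real call tree would need a tree diff rather than a flat one.

```python
# Hypothetical exclusive times (ms) per method for a fast and a slow execution.
# Method names and numbers are invented for illustration.
FAST = {"doGet": 12.0, "getSpecialOffers": 80.0, "createWebService": 30.0}
SLOW = {
    "doGet": 14.0,
    "getSpecialOffers": 2400.0,
    "createWebService": 900.0,
    "logError": 5.0,
}


def timing_differences(fast, slow):
    """Return (method, delta_ms) pairs, largest slowdown first.

    A method missing from one execution counts as 0 ms there, so methods
    that only ran in one of the two executions still show up in the result.
    """
    methods = set(fast) | set(slow)
    deltas = [(m, slow.get(m, 0.0) - fast.get(m, 0.0)) for m in methods]
    return sorted(deltas, key=lambda pair: pair[1], reverse=True)


def structural_differences(fast, slow):
    """Methods that appear in only one of the two executions."""
    return sorted(set(fast) ^ set(slow))
```

Sorting by delta puts the calls that explain most of the gap at the top, which mirrors how the difference table is read.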
Show me the PurePath to individual failed Web Requests
In your VS2010 Run Configuration for your load test you can specify to store detailed response results in a SQL Database. This allows you to look up individual failed transactions including the actual HTTP traffic and all associated timings after the load test is finished. In my case I had another slow transaction type called BuyDirect. Via the VS2010 Load Testing Report I open individual failed transactions and analyze the individual requests that were slow:
The result view shows me that the request took 1.988s. The Dynatrace VS2010 Plugin adds a new tab in the Results Viewer allowing me to open the captured PurePath for that particular slow request by clicking on a PurePath link. Clicking on this link opens the PurePath in the Dynatrace Client:
We can easily spot where the time is spent in this transaction – it is the web service call from the 2nd JVM (GoSpaceBackend) to the CLR that hosts the Web Service (DotNetPayFrontend). One of the problems also seems to be related to the exceptions that happen when calling the web service. These are exceptions that didn’t make it up to our own logging framework as they were handled internally by Axis but are caused by a configuration issue (we can look at the full exception stack trace here to find that out). With one further click I go ahead and look at the Sequence Diagram of this transaction. This diagram provides a better overview of the interactions between my 4 different servers:
The sequence diagram goes on beyond what’s in the screenshot – but I guess you get the idea that we have a very chatty transaction here.
The Dynatrace VS2010 Plugin allows me to drill down to the problematic methods in a distributed heterogeneous transaction within a matter of seconds, saving me a lot of time compared with analyzing the problem based on the load testing report alone.
Share results with Developers and look up problems in Source Code
Now we have all this great information and have already found several hotspots that our developers should look into. Instead of giving my developers access to my test environment I simply export the captured data to a Dynatrace Session file and attach it to a JIRA issue (or whatever bug tracking tool you use) that I assign to my developer. I can either export all captured data (PurePaths and performance counters) or be more specific and only export those PurePaths that have been identified as being problematic.
Development picks up the Dynatrace Session file, imports it into their local Dynatrace Client and analyzes the same granular data as we analyzed in our test environment. Having the Dynatrace Visual Studio 2010 Plugin installed allows the developer to look up individual methods in Visual Studio starting from the PurePath or Methods Dashlet in the Dynatrace Client:
The Dynatrace Plugin in Visual Studio – where you have to have your solution file opened – searches for the selected method, opens the source code file and sets the cursor to that method:
The data is easily shareable with anybody who needs to look at it. Within a matter of seconds the developer ends up at the source code line within Visual Studio 2010 that represents a problematic method in terms of performance. The dev also has all the contextual information on hand that shows why individual executions of the same transaction were faster than others, as the PurePaths include information like method arguments, HTTP parameters, SQL Statements with Bind Variables, Exception Stack Traces, … This is all information that developers will love you for 🙂
Identify Regressions across Test Runs
When running continuous load tests against different builds we expect performance to get better and better. But what if that is not the case? What has changed from the last build to the current? Which components don’t perform as well as they did the build before? Has the way we access the database changed? Is it an algorithm in custom code that takes too much time or is it a new 3rd party library that was introduced with this build that slows everything down?
The Automatic Session Analysis plugin also analyzes data across two load testing sessions generating a report that highlights the differences between these two sessions. The following screenshot shows the result of a load testing regression analysis:
It shows us which transactions were actually executed in the latest (top left) and previous (top right) build. In the middle we get an overview of which layers/components contributed to the performance in each of the two sessions, along with a side-by-side comparison (center) where the bars tell us which components performed faster or slower. It seems we had a serious performance decrease in most of our components. On the bottom we additionally see a comparison between executed database statements and methods. Similar to what I showed in the previous sections, we would drill down from this report to analyze the details.
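The core of such a regression report is a per-component comparison between two sessions. A minimal Python sketch of that comparison follows; it is not the Automatic Session Analysis plugin's logic, and the component timings, the 10% tolerance and the function names are assumptions chosen for illustration.

```python
# Hypothetical average time (ms) per component in two load test sessions.
# Invented for illustration only.
PREVIOUS = {"Servlet": 20.0, "Web Service": 45.0, "JDBC": 16.0, "ASP.NET": 30.0}
LATEST = {"Servlet": 21.0, "Web Service": 90.0, "JDBC": 24.0, "ASP.NET": 27.0}


def percent_change(previous, latest):
    """Percent change per component; positive means the latest build is slower."""
    return {
        name: (latest[name] - previous[name]) / previous[name] * 100.0
        for name in previous
        if name in latest
    }


def regressions(previous, latest, tolerance_pct=10.0):
    """Components that got slower by more than tolerance_pct, worst first."""
    changes = percent_change(previous, latest)
    flagged = [(name, pct) for name, pct in changes.items() if pct > tolerance_pct]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

A tolerance is useful because small run-to-run variation is normal; where to set it depends on how stable your test environment is.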
Visual Studio 2010 is a good tool for performing load tests against .NET or Java Web Applications. The Load Testing Reports have been improved in this version and give you a better understanding of the performance of your application. For multi-tier or heterogeneous applications like the one I used in my scenario it is now easy to go beyond the standard load testing reports by using an Application Performance Management Solution like Dynatrace. The combination of a load testing solution and an APM solution not only tells you that you have a performance problem, it also helps you identify the problem faster and therefore reduce test cycles and time spent in the testing phase.
There is more to read if you are interested in these topics: a White Paper on how to Automate Load Testing and Problem Analysis, webinars with Novell and Zappos, who use a combination of a Load Testing Solution and Dynatrace to speed up their testing process, as well as additional blog posts called 101 on Load-Testing.
Feedback is always welcome and appreciated – thanks for reading all the way to the end 🙂
UPDATE: We just recorded a video that shows the integration and put it on our website: dynaTrace Plugin for Visual Studio 2010 Ultimate: 5-minute demo