MicroFocus and dynaTrace recently announced “SilkPerformer Assurance” and with that announcement comes a tighter integration between MicroFocus’s Load Testing Tool SilkPerformer and dynaTrace’s Application Performance Management Solution for Load Testing.

Enough has been said about the partnership in the actual press release. Now it is time to look behind the marketing curtain and explore the details of the integration between SilkPerformer and dynaTrace. As I worked on the SilkPerformer Development Team in my “previous life” it is a pleasure for me to walk you through the following use case scenarios:

  • End-to-End Performance Analysis with the new SilkPerformer Browser-Driven Web Load Testing feature
  • Root Cause Analysis of failed load testing transactions
  • Speeding Up and Reducing Load Testing Cycles

End-To-End Performance Analysis with Browser-Driven Web Load Testing

Load-testing tools traditionally created load by “simulating” the HTTP requests a browser would execute when visiting the pages defined in the load-testing script. This works extremely well and makes it possible to simulate not just one user driving a browser but hundreds or even thousands of virtual users on a single box. With the rise of highly dynamic Web 2.0 applications, however, it got harder and harder to create test scripts that do exactly what the browser would do with an application that is heavy in JavaScript/AJAX. The team that builds SilkPerformer therefore now offers a new feature called Browser-Driven Web Load Testing.

Combining their functional testing technology from SilkTest with their Load-Testing technology from SilkPerformer allows them to a) use a real browser to execute the test scripts to be as accurate as possible and b) run multiple browser instances on a single box.

In my test scenario I clicked through different pages of my sample application. After replaying this Browser-Driven Test Script I explored the TrueLog to learn more about what happened during the test run. The following screenshot shows TrueLog Explorer with the opened TrueLog of the recent test run. On the left side we see one node for each individual script action, e.g. BrowserNavigate or BrowserLinkSelect. Underneath each node we get a list of all HTTP requests sent to the server. This gives us insight into which resources really get downloaded per page, where time is spent and how large our individual pages are.

Clicking on one of these action nodes shows the screenshot that was taken at the time of that action. Below the screenshot we see additional information for this action, e.g. page time, download sizes, HTTP headers or raw HTTP content. The following image shows the statistics tab, where we learn that 7.4 seconds are actually spent in calling the menu.do action. All other resources on that page (like stylesheets or images) don’t have a real performance impact:

Statistics highlight that 7.4 seconds are spent on the server in menu.do

The dynaTrace tab shows whether dynaTrace captured PurePaths for the currently selected test step. In our case we are especially interested because we want to figure out why menu.do took roughly 7.4 seconds. The dynaTrace tab shows the link to the PurePath. Clicking on the link opens this particular PurePath (for the menu.do request) in the dynaTrace Client. If the dynaTrace Client is not yet up and running, it gets started for me:

Drill from TrueLog to PurePath to find the root cause of the 7.4s execution time

The PurePath not only shows us the full transactional trace of that request – it highlights those methods that contribute the most to the transaction execution time making it easier to identify our hotspots. With the contextual information such as log messages, SQL statements or method arguments it is easy for our engineers to identify why in this particular case the execution took 7.4s where on average we have much better execution times.

The benefit of combining Browser-Driven Testing and dynaTrace Performance Management is the ability to analyze performance end-to-end, from the browser all the way back to the database. The ability to analyze all individual network requests tells us whether the problem is related to slow-loading resources (like images, CSS or JavaScript files) or to slow application-server response times. In the latter case dynaTrace has collected all the data necessary for developers to analyze the root cause.

Root-Cause Analysis of failed load-testing transactions

I already introduced the SilkPerformer TrueLog, which contains the “True” log information for a simulated user. When running really huge load tests with thousands or millions of executed transactions it is not practical to record the TrueLog information for every transaction. It would consume lots of storage and put a lot of pressure on your I/O. A TrueLog for every successful transaction would also not be of much value. It is, however, very beneficial to get a TrueLog for those transactions that actually experienced a problem, such as an excessive page time or a failed content verification. SilkPerformer has a feature called “TrueLog on Error”. As the name implies, the TrueLog is only written to disk when a simulated Virtual User experiences a problem.
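The idea behind logging only on error can be illustrated with a small Python sketch. This is not SilkPerformer's implementation or scripting language; `DetailLog`, `run_with_log_on_error`, the page-time limit and the sample transactions with their timings are all hypothetical names and values chosen for illustration. Each transaction records its details in memory, and the log is kept (standing in for "written to disk") only when the transaction raises an error or exceeds the page-time limit.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DetailLog:
    """Hypothetical in-memory detail log for one virtual-user transaction."""
    name: str
    entries: List[str] = field(default_factory=list)

    def record(self, message: str) -> None:
        self.entries.append(message)


def run_with_log_on_error(
    name: str,
    transaction: Callable[[DetailLog], float],
    max_page_time_s: float,
    kept_logs: List[DetailLog],
) -> bool:
    """Run one transaction and keep its detail log only if it failed.

    `transaction` records its steps in the log and returns the page time in
    seconds. A raised exception or a page time above `max_page_time_s`
    counts as a failure. Returns True on success.
    """
    log = DetailLog(name)
    try:
        page_time_s = transaction(log)
    except Exception as exc:  # e.g. a failed content verification
        log.record(f"error: {exc}")
        kept_logs.append(log)
        return False
    if page_time_s > max_page_time_s:
        log.record(f"page time {page_time_s:.1f}s exceeds limit {max_page_time_s:.1f}s")
        kept_logs.append(log)
        return False
    return True  # successful transaction: the detail log is discarded


# Illustrative transactions with made-up URLs and timings (assumptions).
def fast_page(log: DetailLog) -> float:
    log.record("GET /index.do")
    return 0.4


def slow_page(log: DetailLog) -> float:
    log.record("GET /menu.do")
    return 7.4


def broken_page(log: DetailLog) -> float:
    log.record("GET /order.do")
    raise ValueError("content verification failed")
```

Running the three sample transactions with a limit of 2 seconds keeps two logs (the slow one and the broken one) and discards the successful one, which is the storage and I/O saving described above.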

After a load test SilkPerformer gives you a nice Performance Report showing the characteristics of the test, such as the number of simulated users, transaction throughput, response times and error count.

Load Testing Report showing increasing throughput and Errors with increasing load

I enabled “TrueLog on Error”, which now allows me to analyze all errors by exploring the captured TrueLogs and from there figure out whether the problem is related to, for example, slow requests for static resources, slow requests to the application server, or requests to the application server that returned an error or an incorrect page. The following image shows the workflow of getting from an individual error in the load testing report to the TrueLog that shows the error in the context of the individual user session, and finally to the PurePath that shows us the exceptions thrown in a backend web-service call that caused the load testing transaction to fail:

Drilling from an Error in the Load Testing Report to the Exception in a Backend Web-Service Call

The benefit of this integration is that you are no longer stuck with just the error information returned by the tested application. With TrueLog on Error and dynaTrace, all the information a developer needs to figure out what went wrong in a particular error situation is available. TrueLog on Error tells you whether the problem is related to the network or to statically delivered content. If the problem is in the application, the TrueLog contains the link to the PurePath for the individual failed request.

Speeding up and Reducing Load Testing Cycles

SilkPerformer is without a doubt a great load testing tool where testers can create new load-testing scenarios very easily with its built-in recording and script customization features. Maintaining the scripts is also easy as the tester is supported with a powerful scripting language and TrueLog Explorer which provides a visual way to customize and maintain testing scripts.

Creating, maintaining and running scripts is, however, only one part of the overall testing process. Analyzing the data and identifying the root cause of a performance problem usually takes much longer than the actual test itself. The problem is the lack of in-depth data collected by the load testing tool itself. Knowing that a page slowed down with increasing load and that CPU went up as well hints that CPU is very likely the problem, but it doesn’t tell us which component in our application consumed the CPU, and whether this can be changed to perform better or whether it is a scalability limitation of our application.
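To make the limitation concrete, here is a minimal Python sketch of what such a correlation amounts to. It computes the Pearson correlation coefficient between two measurement series, for example page response times and CPU utilization sampled at the same points in time. The function name `pearson` is my own, and this is only an illustration of the statistic, not how SilkPerformer computes its correlations.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 samples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

A coefficient close to 1 says that response time and CPU rise together. That is exactly the hint described above and nothing more: the number says nothing about which component burned the CPU.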

After running a Load Test, SilkPerformer generates a great Load Testing Report that contains all necessary information collected from the simulated Virtual Users, like Throughput, Response Time or Error Count. The report analyzes all transactions and generates sections highlighting the slowest or most resource-consuming transactions. SilkPerformer also provides a feature to correlate these measures with measures captured from your application, such as CPU, Memory or Network Utilization. If you have an application with very large pages you may simply run out of network bandwidth, and SilkPerformer is able to show you this correlation automatically. Very often it is not that easy, and the root cause cannot be identified by correlating different performance counters. From the Load Testing Report’s section showing the slowest web pages, SilkPerformer offers a Drill-Down to the data captured by dynaTrace. This Drill-Down allows context-specific analysis of application performance data. Instead of analyzing all data, the Drill-Down filters the data to show only, for example, the Slowest or Largest Web Transaction:

Drill Down to dynaTrace shows which components contribute to the slowest transaction

The Drill Down shows which Components or Layers of the tested Application contribute how much to the transaction’s response time. From here we can drill down further or use the Auto Session Analysis Feature to let dynaTrace provide us with a report that highlights the biggest problem of this particular transaction:

dynaTrace Auto Analysis highlights the problems of the slowest transactions down to SQL and method level

Combining a Load Testing Tool like SilkPerformer with an Application Performance Solution like dynaTrace allows rapidly creating and executing load tests while automatically collecting the information necessary to speed up performance problem analysis and keep test cycles short. Besides speeding up problem analysis the integration also speeds up regression analysis. When running load tests for every build it is important to identify introduced regressions as fast as possible. Otherwise smaller regressions add up over time and end up being a huge problem that gets more costly the longer you wait to fix it. Both MicroFocus and dynaTrace offer regression analysis across Load Tests. SilkPerformer can compare Load Testing Results such as Response Times, Throughput, CPU and Memory across builds. dynaTrace can compare application performance data such as method, SQL or service execution to identify which code change negatively (or maybe positively) impacted performance from build to build. The following image shows a dynaTrace regression dashboard highlighting the differences between two builds down to Component, SQL and method level:

Regression Dashboard shows performance regressions on component, method and database level
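The basic idea of a build-to-build comparison can be sketched in a few lines of Python. This is a hedged illustration only: it does not reflect how SilkPerformer or dynaTrace implement regression analysis, and the function name `find_regressions`, the 10% tolerance and the shape of the input (a mapping from measure name to average response time in seconds) are assumptions of mine.

```python
from typing import Dict


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.10,
) -> Dict[str, float]:
    """Return {measure: relative slowdown} for measures that got slower.

    A measure counts as a regression when its value in `current` exceeds
    the `baseline` value by more than `tolerance` (0.10 means 10%).
    Measures missing from `current` or with a non-positive baseline are
    skipped because no relative change can be computed for them.
    """
    regressions: Dict[str, float] = {}
    for name, base in baseline.items():
        if name not in current or base <= 0:
            continue
        change = (current[name] - base) / base
        if change > tolerance:
            regressions[name] = change
    return regressions
```

Running such a check after every build's load test surfaces small slowdowns while they are still cheap to fix. The tolerance is a design choice: too tight and normal measurement noise is flagged, too loose and small regressions slip through and accumulate.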

Conclusion

Testing – and especially Load and Performance Testing – is a critical task in the development lifecycle. Being equipped with the right set of tools to conduct load tests and to analyze the results makes the testing process more efficient and therefore ensures higher quality and lower costs. SilkPerformer and dynaTrace are well integrated and a great solution for pro-active application performance management.

There is more to read if you are interested in load testing: a White Paper on how to Automate Load Testing and Problem Analysis, webinars with Novell and Zappos, who use a combination of a Load Testing Solution and dynaTrace to speed up their testing process, as well as additional blog posts called 101 on Load-Testing.