Special thanks for this great story go to my colleagues Shaun Gautz and Andrew Samuels – two Dynatrace Guardians who help our customers build better web applications.

Struts is a very commonly used framework for building Java web applications. It was also the main web framework of the new car rental platform this story is based on. During load and performance testing, the team discovered that the application couldn't sustain even a fraction of the expected load. Response times went through the roof until the application server eventually rejected new requests. The application servers were properly sized and configured with enough worker threads – yet the application still failed to deliver the promised performance.

Using Dynatrace, they found that a well-known bug in Struts 2.3.8 (Apache Jira ticket WW-3971) – which causes repeated synchronized read access to the Struts configuration files – was behind this problem. If you are running Struts, make sure to watch out for this known issue. If you are not sure whether this – or a similar problem – impacts your application, read on and follow the same steps Shaun and Andrew used to identify the issue:

Step #1: Get Dynatrace installed and set up

Don’t have Dynatrace yet? Get your 30-Day Free Trial and follow the guides in the community to set it up for your application. If you need a longer license, let us know: through our Share Your PurePath program we are happy to give you an extended license for your local environment.

Step #2: Execute some load

This company had its own load testing software in place. If you use a load testing tool (commercial or open source), check out the Dynatrace Load Testing Integration options.

Step #3: Identify Problematic Layer (=API)

There are different approaches in Dynatrace to analyze the captured data. I really like the one Shaun and Andrew followed. They pulled up the Layer Breakdown Dashlet, which gives an overview of the time spent in the different logical tiers (we also call them APIs) of your application. Dynatrace automatically detects tiers based on the executed code, e.g., Servlet, Web Service, JDBC, … The following screenshot shows what they had to deal with. It is easy to spot that the OpenSymphony API became the main contributor as soon as there was some load on the system:

The Layer Breakdown is a great way to spot logical application layers in your app that don’t scale with increasing load

Tip: If you don’t like the automatically detected layers (=APIs), you can define your own. Open your System Profile and specify which code packages, e.g., com.yourcompany.com, should be assigned to which logical API.
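Conceptually, assigning packages to logical APIs boils down to prefix matching, and a layer breakdown then sums the time spent per API. The following Java sketch illustrates that idea under stated assumptions: the class `LayerBreakdownSketch`, its rules and the timings in `main` are hypothetical and made up for illustration; this is not how Dynatrace implements API detection.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch (not Dynatrace internals): map classes to logical APIs
// by package prefix, then sum the time spent per API like a layer breakdown.
public class LayerBreakdownSketch {

    // Package prefix -> logical API name (illustrative rules only).
    private final Map<String, String> rules = new LinkedHashMap<>();

    public void addRule(String packagePrefix, String apiName) {
        rules.put(packagePrefix, apiName);
    }

    // The longest matching prefix wins, so a narrow custom rule can
    // override a broader one; unmatched classes end up in "Unknown".
    public String resolveApi(String className) {
        String bestPrefix = null;
        String bestApi = "Unknown";
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            String prefix = rule.getKey();
            boolean matches = className.equals(prefix) || className.startsWith(prefix + ".");
            if (matches && (bestPrefix == null || prefix.length() > bestPrefix.length())) {
                bestPrefix = prefix;
                bestApi = rule.getValue();
            }
        }
        return bestApi;
    }

    // Sums elapsed milliseconds per API; input maps class name -> time in ms.
    public Map<String, Long> breakdown(Map<String, Long> timingsByClass) {
        Map<String, Long> perApi = new TreeMap<>();
        for (Map.Entry<String, Long> timing : timingsByClass.entrySet()) {
            perApi.merge(resolveApi(timing.getKey()), timing.getValue(), Long::sum);
        }
        return perApi;
    }

    public static void main(String[] args) {
        LayerBreakdownSketch sketch = new LayerBreakdownSketch();
        sketch.addRule("com.opensymphony", "OpenSymphony");
        sketch.addRule("javax.servlet", "Servlet");
        sketch.addRule("com.yourcompany", "Custom Application");

        // Made-up timings, purely for illustration.
        Map<String, Long> timings = new LinkedHashMap<>();
        timings.put("com.opensymphony.xwork2.config.ConfigurationManager", 900L);
        timings.put("javax.servlet.http.HttpServlet", 50L);
        timings.put("com.yourcompany.rental.BookingAction", 30L);

        System.out.println(sketch.breakdown(timings));
    }
}
```

Longest-prefix matching is what lets a custom rule for a sub-package take precedence over a company-wide rule without having to order the rules carefully.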

Step #4: Identify Root Cause

Knowing which layer is problematic allows us to focus on the methods/code that execute in that layer. The Methods Hotspot Dashlet shows the top methods contributing to response time, and one method stands out: getConfiguration from the OpenSymphony Configuration Manager. What’s even more interesting is that most of its time is spent in synchronization. In other words, the method itself does very little work; it spends most of its time waiting to enter a synchronized code block. We also learn who is calling this method – it seems to be called by every instruction on their JSP page:

The Methods Hotspot immediately shows which methods contribute to response time, whether through CPU, Sync, Wait or I/O, as well as who is calling these methods. In this case it seems that every JSP instruction calls getConfiguration, which spends most of its time in sync.

Tip: You can right-click on any method and drill down directly to the PurePaths that called that particular method.
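The split between CPU time and synchronization time that the Methods Hotspot reports can also be observed with the JDK’s standard management API. This minimal sketch uses `java.lang.management.ThreadMXBean` to read how often and how long a worker thread was blocked on a monitor, alongside the CPU time it used. The class `SyncTimeSketch` and its lock are hypothetical, and this is not how Dynatrace collects its data.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: measure how long a worker waits to enter a
// synchronized block (sync time) versus the CPU time it actually uses.
public class SyncTimeSketch {
    private static final Object LOCK = new Object();

    // Returns {blockedCount, blockedTimeMs, cpuTimeNs}; -1 marks an unavailable metric.
    public static long[] measureBlocked(long holdMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (mx.isThreadContentionMonitoringSupported()) {
            mx.setThreadContentionMonitoringEnabled(true); // needed for getBlockedTime()
        }
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            synchronized (LOCK) {
                // The real work would happen here; the thread mostly waited to get in.
            }
            entered.countDown();
            try {
                release.await(); // stay alive so its ThreadInfo can still be read
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        synchronized (LOCK) {
            worker.start();
            // Wait until the worker is actually blocked on LOCK, then keep holding it.
            while (worker.getState() != Thread.State.BLOCKED) {
                Thread.yield();
            }
            Thread.sleep(holdMillis);
        }
        entered.await();

        long id = worker.getId();
        ThreadInfo info = mx.getThreadInfo(id);
        long cpu = mx.isThreadCpuTimeSupported() ? mx.getThreadCpuTime(id) : -1;
        release.countDown();
        worker.join();
        return new long[] {info.getBlockedCount(), info.getBlockedTime(), cpu};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] r = measureBlocked(200);
        System.out.println("blocked " + r[0] + " time(s), " + r[1] + " ms blocked, "
                + (r[2] / 1_000_000) + " ms CPU");
    }
}
```

You would typically see a blocked time close to the hold time and only a small CPU time: the thread was waiting, not working, which is the same signature getConfiguration showed under load. Blocked time is only reported when thread contention monitoring is supported and enabled; otherwise it is -1.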

Step #5: PurePath – The Ultimate Proof

PurePath is the core technology behind Dynatrace. A PurePath is captured for every request in your system – whether it executes on your local machine, in a load test or in your production environment. It captures context data such as URLs, parameters, exceptions and SQL statements, as well as performance hotspots like these method calls to getConfiguration – always fully end-to-end (from browser to database). We don’t sample, nor do we capture averages, because we understand that you need all the data available to analyze real-life problems.

Opening the PurePath Dashlet is therefore the next and ultimate diagnostic step. It shows each individual request that came in during the load test. Selecting one of these PurePaths confirms exactly what we already knew from the Methods Hotspot: every instruction on the JSP page seems to be compiled in a way that calls getConfiguration. Because getConfiguration is a synchronized method, the request threads block each other, which prevents the application from scaling:

The PurePath Dashlet is the ultimate proof of what goes wrong. The parallel requests all block each other because getConfiguration – which is called hundreds of times per request – has to enter a synchronized block.
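To see why a synchronized getter called by every JSP instruction hurts so much, the pattern can be reproduced outside the application server. This is a hypothetical Java sketch, not Struts code: `SynchronizedSource` stands in for a getConfiguration that holds one global monitor, `VolatileSource` for a lock-free read, and the thread, request and tag counts are arbitrary assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical reproduction (not Struts code): every simulated JSP "tag"
// calls getConfiguration(), so concurrent requests compete for one monitor.
public class ContentionSketch {

    interface ConfigSource {
        Object getConfiguration();
    }

    // Problematic pattern: one global monitor guards every single read.
    static final class SynchronizedSource implements ConfigSource {
        private final Object config = new Object();
        private long checks; // stand-in for work done while holding the lock

        @Override
        public synchronized Object getConfiguration() {
            checks++;
            return config;
        }

        synchronized long checks() {
            return checks;
        }
    }

    // Alternative: reads go through a volatile field without any locking.
    static final class VolatileSource implements ConfigSource {
        private volatile Object config = new Object();

        @Override
        public Object getConfiguration() {
            return config;
        }
    }

    // Renders `requests` simulated pages on `threads` worker threads; each page
    // makes `tagsPerPage` calls to getConfiguration(). Returns the total call count.
    static long simulate(ConfigSource source, int threads, int requests, int tagsPerPage)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong calls = new AtomicLong();
        for (int r = 0; r < requests; r++) {
            pool.execute(() -> {
                long local = 0; // count locally to avoid adding a second contention point
                for (int tag = 0; tag < tagsPerPage; tag++) {
                    if (source.getConfiguration() != null) {
                        local++;
                    }
                }
                calls.addAndGet(local);
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("simulation did not finish in time");
        }
        return calls.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ConfigSource[] sources = {new SynchronizedSource(), new VolatileSource()};
        for (ConfigSource source : sources) {
            long start = System.nanoTime();
            long calls = simulate(source, 16, 2_000, 300);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(source.getClass().getSimpleName() + ": "
                    + calls + " calls in " + millis + " ms");
        }
    }
}
```

Timings from a naive loop like this vary a lot with JVM, hardware and JIT optimizations, so treat them as an illustration only; a harness such as JMH is the right tool for real measurements. The effect grows with the amount of work done while holding the lock and with the number of concurrent request threads.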

Tip: More often than not, you are dealing with performance problems in 3rd party code – just like here. Dynatrace allows you to right-click on that method in the PurePath and view its source code. Because we have the original byte code, we can show you the actual Java source code for every method in your application. Neat – isn’t it? :)

Step #6: Solving the Problem with a little Google Search

With all this information just a couple of clicks away, a quick Google search pointed them to the WW-3971 Apache Jira ticket. The problem – a thread synchronization issue – is already addressed in more recent versions of Struts. Upgrading to a newer version allowed this global car rental company to deploy the new version of its car rental site, knowing it can handle the expected load.
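For this company, the fix was simply upgrading Struts. If you find a similar hot synchronized getter in your own code, one common remedy is to make reads lock-free and take the lock only when the value is loaded or reloaded. The following Java sketch shows that general pattern (double-checked locking on a volatile field); it is an illustration, not the actual Struts patch, and `ConfigHolderSketch` is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch of a common remedy for a hot synchronized getter:
// readers do a lock-free volatile read; only (re)loading takes the lock.
public class ConfigHolderSketch<T> {
    private final Supplier<T> loader;
    private volatile T config;

    public ConfigHolderSketch(Supplier<T> loader) {
        this.loader = loader;
    }

    // Hot path: once the configuration is loaded, no monitor is entered.
    public T getConfiguration() {
        T current = config;
        if (current == null) {
            synchronized (this) {
                current = config; // re-check: another thread may have loaded it
                if (current == null) {
                    current = loader.get();
                    config = current;
                }
            }
        }
        return current;
    }

    // Cold path: replaces the configuration, e.g. after a config file change.
    public synchronized void reload() {
        config = loader.get();
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        ConfigHolderSketch<String> holder =
                new ConfigHolderSketch<>(() -> "config v" + loads.incrementAndGet());
        for (int i = 0; i < 1_000; i++) {
            holder.getConfiguration();
        }
        System.out.println(holder.getConfiguration() + " loaded " + loads.get() + " time(s)");
        holder.reload();
        System.out.println(holder.getConfiguration() + " loaded " + loads.get() + " time(s)");
    }
}
```

Running main prints `config v1 loaded 1 time(s)` followed by `config v2 loaded 2 time(s)`: a thousand reads trigger a single load, and only reload() pays for the lock again.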

Call to Action: Let me tell YOUR story!

I think these are great stories – another big thanks to all of you who keep feeding them to me :) If you have a story of your own, let me know. If you have PurePaths or screenshots to share, check out our Share Your PurePath program. Happy Dynatrace’ing!