If you follow my blog posts you know that I am a big fan of metrics: they tell us what’s going on in our application, and even a single integration, unit or manual test can uncover architectural issues that will not allow your app to scale. Thanks to A. Alam – a performance engineer at Infosys Ltd. who conducts large-scale load tests for a very large enterprise – I am adding a new metric to my list: # of Dynamically Created Classes.

During a load test, Mr. Alam and his team ran SOASTA (for generating load) alongside Dynatrace to analyze the performance behavior of their app under constant load. Here is the SOASTA Performance Report:

Load testing allows us to reveal the long-term performance behavior of an application. Constantly growing response time under constant load is typically a telltale sign of a resource leak. Not in this case, as we learned!

The load testing report alone doesn’t tell you why the Mobile Search API got gradually slower and, after about 6 hours, became terribly slow. By looking at the Dynatrace data, which captured every single simulated request from SOASTA over the full 10+ hours, they identified that the problem was caused by excessive Java class loading. Especially after about 6 hours, class loading is clearly the performance hotspot, as shown in one of my favorite Dynatrace charts, the one I always start with when analyzing longer timeframes of performance data: the Layer Breakdown:

The Layer Breakdown that you get for each individual type of test or business transaction makes it easy to spot that Class Loading is the main culprit responsible for the performance degradation.

Before I walk you through their diagnostic steps to find the root cause and how they finally fixed the problem, I want to “spoil the fun” by jumping right to the root cause they found by looking at the Dynatrace PurePath and some additional log information: the excessive class loading was caused by the XSLT library used to transform the XML responses of a backend web service into the format their business logic methods required. The significant jump seems to be related to the application crossing the threshold of 2.5 million dynamically created classes loaded and unloaded, which happened about 6 hours into the load test.

This is a very interesting problem pattern that you can really only find with a long-running, constant-load test. The general pattern of “excessive” creation of Java classes, however, is one we can also find with simple unit, integration or even manual tests. Simply watch these metrics: if your code starts creating a lot of short-lived dynamic classes, ask yourself whether this is intended.
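To make the pattern concrete, here is a minimal sketch (my own illustration, not their actual code) of how an XSLT transformation can silently turn into a class-creation factory. With the XSLTC-based TransformerFactory that ships with the JDK, compiling a stylesheet generates a so-called translet class on the fly; if the stylesheet is compiled per request rather than once, every request defines (and later unloads) new classes:

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

public class XsltPerRequest {

    // Anti-pattern sketch: the stylesheet is compiled on EVERY call. With the JDK's
    // XSLTC-based TransformerFactory this turns the XSLT into freshly generated bytecode
    // (a translet class; XSLTC's default translet name is "GregorSamsa" when the
    // stylesheet has no name of its own), so each request contributes short-lived,
    // dynamically created classes.
    public static String transform(String xml, String xslt) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xslt)));

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

Caching the compiled javax.xml.transform.Templates object and only calling newTransformer() per request is the usual way to avoid the recompilation; as you will see below, the team went one step further and removed the XSLT layer entirely.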

Now for more details on their diagnostic steps and how they solved the problem. If you want to do a sanity check on your own application, feel free to sign up for the Dynatrace Free Trial and monitor these metrics on your local machine, CI or load test environment.

Step #1: Looking at the right metrics

From the Layer Breakdown they knew that the problem had to be within Java class loading. The best metrics to look at here are the number of Active Classes and the number of Unloaded Classes. The following graph shows that the number of active classes that remain loaded is rather constant, but the total number of unloaded classes is alarming: 2.5 million within 6 hours:

An excessive number of unloaded classes while the # of active classes stays constant means that many dynamic classes are created and live only for a short time. This is very common with XML and XSLT processing, but also with OR mappers.
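If you want to keep an eye on these counters yourself, e.g. while running a local test, the JVM exposes them through the standard ClassLoadingMXBean. Here is a minimal sketch (class name and interval are my own choice, not part of their setup):

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

public class ClassLoadingStats {

    public static void main(String[] args) throws InterruptedException {
        ClassLoadingMXBean classLoading = ManagementFactory.getClassLoadingMXBean();

        // Print the three relevant counters every 10 seconds:
        // - active:      classes currently loaded in the JVM
        // - totalLoaded: classes ever loaded since JVM start
        // - unloaded:    classes unloaded since JVM start (the alarming one above)
        while (true) {
            System.out.printf("active=%d totalLoaded=%d unloaded=%d%n",
                    classLoading.getLoadedClassCount(),
                    classLoading.getTotalLoadedClassCount(),
                    classLoading.getUnloadedClassCount());
            Thread.sleep(10_000);
        }
    }
}
```

A steadily climbing unloaded count while the active count stays flat is exactly the signature shown in the graph above.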

Here I hope to get some clarification and feedback from you, my readers: what’s your take on this? 2.5 million classes within 6 hours. Have you seen this before? Are you aware that certain JVMs show general performance issues when you hit a certain peak number of created classes? Any JVM insiders around who can explain that?

Step #2: Finding the Caller

Because Dynatrace captures every single request with all details available – independent of whether it is a fast, slow, successful or failing request – Mr. Alam’s team had all the evidence to find the code that actually creates all these dynamic Java classes:

The PurePath shows which code is responsible for making calls into the ClassLoader to define new classes.

The PurePath itself reveals all the details needed to find the component that creates these classes, as a PurePath always contains the full execution trace. The team from Infosys went one diagnostic step further and wanted to see the names of the classes being created. There are two options to achieve that:

  1. With Dynatrace you can create a custom sensor that instruments the defineClass method and captures the class name. Then you would also have that information on the PurePath.
  2. They decided to turn on verbose class-loading logging in the JVM to get a list of classes (see the typical JVM options below). That is also a good option, but I would rather recommend the Dynatrace custom sensor.
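The exact JVM options they used aren’t mentioned, but the standard HotSpot switches for this kind of verbose class-loading output look roughly like this (flag names depend on your Java version):

```
# Java 8 and earlier: print class load/unload events to stdout
-verbose:class -XX:+TraceClassUnloading

# Java 9 and later: unified logging equivalent
-Xlog:class+load=info,class+unload=info
```

Be aware that this output is very chatty under load; under the conditions described here it would produce millions of log lines.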

They found that the main class being created was called gregorsamsa, which, after some googling, pointed them to this: Who is Gregor Samsa 🙂

Step #3: Solving the issue by using JSON

Solving the problem required quite some engineering effort. As the XSLT transformation didn’t work out in the long run, they decided to convert the XML into JSON using org.json and apply the business logic to the resulting JSON object. That step got rid of the XSLT transformation altogether, and with it the overhead of the library creating and disposing of all these classes.
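The conversion itself is essentially a one-liner with org.json. The sketch below is my own illustration of the approach, not their actual code (class and element names are made up):

```java
import org.json.JSONException;
import org.json.JSONObject;
import org.json.XML;

public class XmlToJsonSearch {

    // Convert the backend web service's XML response straight to JSON and let the
    // business logic work on the JSONObject instead of an XSLT-transformed document.
    public static JSONObject convert(String xmlResponse) throws JSONException {
        return XML.toJSONObject(xmlResponse);
    }

    public static void main(String[] args) throws JSONException {
        String xml = "<search><result id=\"1\"><title>Example</title></result></search>";
        JSONObject json = convert(xml);

        // Navigate the converted structure like any other JSONObject.
        System.out.println(json.getJSONObject("search")
                               .getJSONObject("result")
                               .getString("title")); // prints "Example"
    }
}
```

Unlike per-request XSLT compilation, XML.toJSONObject is a plain parser: it builds JSONObject instances on the heap and does not generate any new classes at runtime.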

Step #4: Validating the Improvement

They validated the new approach by monitoring the same key metrics as before: the number of loaded and unloaded classes. If you compare the following graph with the one from the initial load test, you can see that they got rid of the millions of unloaded classes:

This reflects the “normal” class loading behavior of a regular application. Loaded and unloaded class counts that stay aligned mean that we no longer have the high number of dynamically created classes that impacted performance.

The Layer Breakdown also proves that class loading no longer impacts response time from the start, nor are there any spikes after 6 hours:

Instead of class loading we now see JSON processing, without the “leaking” performance effect over a longer period of time.

What’s your problem that you can only find under heavy load?

As mentioned above, I am a strong believer that metrics-driven performance engineering needs to be applied early on and that we can find most problems with simple tests. Certain long-term and load-related impacts, however, can only be seen under constant load. Now let me know if you have a story like this to tell. Feel free to use Dynatrace and identify a metric that makes it easier to pinpoint the actual problem pattern. Who knows – maybe you find a metric that helps you find these problems earlier and thereby reduce your load testing cycles. Feel free to share your data with me through my Share Your PurePath program.