Data-driven performance problems are not new, but most of the time they stem from querying too much data from the database. O/R mappers like Hibernate have provided plenty of material for problem-pattern blog posts in the past. Last week I got to analyze a new type of data-driven performance problem in an application that processes rather complex online documents. A representative document that takes 42s in testing takes almost 5.5 minutes in production – roughly 8 times slower!
I received two stored Dynatrace session files – one from the test environment, the other from production – and compared them. A first look at the same request in the two environments suggests a classical data-driven performance problem, because production shows 3 times as many SQL executions against Oracle. The following two screenshots show the Transaction Flow of the same request in each environment. The highlighted numbers indicate the key performance metrics: end-user response time, frontend server response time and number of database roundtrips.
So – 3 times as many SQL statements in production. That’s our main culprit – correct? Almost – but it is only part of the story. The following screenshot shows performance hotspots by API (= logical application layers) and the differences between test and production. It is easy to spot that Hibernate plays a major role – that’s the classical data-driven problem. We also see that Classloading, XML and Custommonkey (an XML framework) are the next big chunk, and they don’t show up in testing at all. On the other hand, the high time spent in I/O that is typical for web-based applications is not among the top hotspots in production:
Let’s take a closer look at Hibernate, and especially at the Classloading, Custommonkey and XML Processing APIs. The latter three turned out to be related to a web service data-driven performance problem combined with an inefficient XML parser.
#1 Problem: Known Hibernate Performance Problem
Instead of spending too much time on Hibernate, what can go wrong with it, and how best to optimize it, I will refer you to the following Hibernate blog articles by Alois Reitbauer: Hibernate Performance.
In this particular case, Hibernate spent significant time calling Java Reflection to figure out which interfaces a certain object implements. The following screenshot visualizes the exact call chain that leads to the getInterfaces call. A Google search reveals that this is actually a well-known Hibernate performance bug: Performance Hotspot in FieldInterceptionHelper – HHH-6735.
This performance problem would also have been visible in testing, but it didn’t show up as significantly because the test database contained far fewer records. Hibernate therefore didn’t have to load as many objects and didn’t spend as much time in getInterfaces.
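The pattern behind HHH-6735 can be sketched in plain Java: Class.getInterfaces() returns a freshly cloned array on every call, so invoking it once per entity instance becomes expensive as soon as production-sized result sets are loaded. Caching the lookup per class – which is effectively what the Hibernate fix does – removes the per-object reflection cost. The class and method names below are my own illustration, not Hibernate’s actual code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InterfaceCache {
    // One reflective lookup per class instead of one per object instance.
    private static final Map<Class<?>, Class<?>[]> CACHE = new ConcurrentHashMap<>();

    public static Class<?>[] interfacesOf(Class<?> clazz) {
        return CACHE.computeIfAbsent(clazz, Class::getInterfaces);
    }

    public static void main(String[] args) {
        Class<?>[] a = interfacesOf(java.util.ArrayList.class);
        Class<?>[] b = interfacesOf(java.util.ArrayList.class);
        System.out.println(a == b);  // prints true: cached, same array
        System.out.println(java.util.ArrayList.class.getInterfaces()
                == java.util.ArrayList.class.getInterfaces());
                                     // prints false: uncached, fresh clone each call
    }
}
```

The second comparison makes the cost visible: the uncached path allocates and fills a new array on every call, which is exactly what adds up when thousands of entities are hydrated per request.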
The two lessons learned on Hibernate therefore are:
- Stay up-to-date with frameworks such as Hibernate
- Always test with production data sets and not just test data
#2 Problem: XML Payload and Parser Issues
Besides Hibernate, we saw three APIs that were not among the top hotspots in the test environment: Classloading, Custommonkey and XML Processing. A closer look revealed that all methods executed in these three logical application layers are attributable to processing XML. The following screenshot shows the top method hotspots from each of these layers:
The obvious question: why were the same methods not a hotspot in testing? Our detailed analysis revealed two key differences:
a) A much larger XML payload in internal web service calls
b) A different XML parser being used, caused by an upgrade of Oracle’s JDBC driver
XML Payload Issue
The database used in testing had significantly fewer records, as already highlighted in the Hibernate analysis. One of the internal SOAP-based web services therefore returned a much larger XML payload in production, which resulted in much higher processing time for parsing and transforming the web service response on the frontend application tier. The lesson learned here is the same as with Hibernate: use a production-like database and adapt your tests to use realistic input arguments.
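The article doesn’t show the application’s parsing code, but a common mitigation when response documents grow with the data set is to parse them as a stream instead of materializing a full DOM tree. A minimal StAX sketch – the element name `item` and the counting task are placeholders for whatever the real service extracts:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingParse {

    // Count <item> elements in a single streaming pass -- memory stays
    // flat no matter how large the payload grows, unlike a DOM parse.
    public static int countItems(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "item".equals(reader.getLocalName())) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed payload", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countItems("<items><item/><item/><item/></items>")); // prints 3
    }
}
```

Streaming doesn’t make a 10x larger payload free, but it keeps the cost linear in document size rather than letting heap pressure and GC amplify it.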
XML Parser Issue
Another interesting finding was that the production system used entirely different XML parsers than the test environment. It turned out that an upgrade of the Oracle JDBC driver in production caused Oracle’s XML parser to be loaded by default instead of the XML parser provided by Sun. This was actually a known fact, and the engineers had implemented a workaround: manually overriding the XML factories. Due to a bug in that implementation, however, the application loaded yet another set of parsers (Xalan) instead of the one originally used (Saxon).
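The JAXP factory lookup that bit them here is easy to observe directly: `newInstance()` consults a system property first, then the services/classpath lookup, then the JDK default – so a jar that bundles its own parser (such as an upgraded JDBC driver) can silently change which implementation wins without any code change. A small diagnostic sketch; the commented-out override value is a placeholder, not the parser this team actually pinned:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;

public class ParserCheck {
    public static void main(String[] args) {
        // Print which implementations JAXP actually resolved on this
        // classpath -- the quickest way to spot a hijacked parser.
        System.out.println("DOM factory:         "
                + DocumentBuilderFactory.newInstance().getClass().getName());
        System.out.println("Transformer factory: "
                + TransformerFactory.newInstance().getClass().getName());

        // The workaround described above pins the factory explicitly via
        // a system property. The class name below is a placeholder:
        // System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
        //         "com.example.YourDocumentBuilderFactoryImpl");
    }
}
```

Running this check in both environments would have exposed the test-vs-production parser mismatch immediately, long before it surfaced as a response-time regression.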
Problems Fixed – Performance Back to Normal but Still Not the Same
After addressing the following problems, the response time of these requests was cut down significantly:
a) Only load the data from the database that is really needed
b) Optimize the usage of Hibernate
c) Go back to the previously used XML parser
The result of these initial fixes is already very promising. The following transaction flow from the current production environment shows fewer SQL interactions and a response time of 1.4 minutes instead of 5.5:
The same transaction still runs about 50% faster in the test environment than in production. That, however, can mainly be explained by the test environment not yet using a production-like database. Once that is solved, the team can focus on optimizing performance against real production data in their test environment before they need to firefight in production.
What’s your experience? Have you run into something similar? Or do you have more examples of classical vs. new data-driven performance problems?
If you are interested in Dynatrace, register, download and evaluate it on your own application: Get your own Dynatrace Personal License.