As a performance architect I conduct many performance tests against Java Enterprise apps, analyzing performance, scalability and stability issues of the application under load. One of our recent tests was against an application that kept crashing with “Out of Memory Exceptions” whenever we ran our JMeter-based load tests. The first thought in such a case is usually: this must be a memory leak!

In this blog I’ll walk through the steps we took to analyze the issue, the tools we used, some background information on Java Garbage Collection, and how the problem was ultimately resolved. Note: it wasn’t a memory leak but a logging mechanism that allocated too much memory, crashing our Java Virtual Machines (JVMs).

The Test Environment

Java performance testing is the science of discovering problem areas in an application under varying load conditions, and selecting the right tools is key to a successful performance load test.

For this scenario our Application Under Test (AUT) was a Java-based application hosted in Amazon AWS. My team was tasked with analyzing some serious memory issues. To help identify the problem we used JMeter 2.12 to generate synthetic load, Dynatrace AppMon for performance monitoring, and Amazon CloudWatch.

Out-of-Memory Exceptions: Observed Application Issues

The application crashed within the first few minutes of steady-state load test execution. To better understand the issue at a high level we executed the following tasks:

  • Gradually decreasing the load: The application continued to crash even at 1 transaction per second (TPS)
  • Increasing the number of servers: Being hosted in AWS gave us the opportunity to auto-scale new machines during the test; however, that only delayed the environment crash
  • Upgrading the server configuration: This defeats the aim of the test, which is to achieve the best possible performance with optimal hardware usage

A peek into Dynatrace revealed high ‘Response Time’ (the time taken for the application to successfully process incoming HTTP requests) and ‘System Utilization’ (CPU, memory, disk and network on the host machines), both well beyond the target SLA (Service Level Agreement) values.

A deeper analysis of each EC2 server instance highlighted that within a few minutes of test execution the ‘Used Memory’ (Dynatrace captures memory measures exposed via JMX) showed a steep increase until it reached the maximum allocated heap size (4GB). At this point GC (Garbage Collection) was triggered, and the JVM ultimately shut down with an Out Of Memory (OOM) Exception. As a result, CPU utilization (time spent in GC) and network utilization (crash information being logged) increased significantly.
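Dynatrace reads these measures via the standard JMX memory MXBeans. If you want to sanity-check the same numbers without an APM tool, a minimal sketch like the following prints heap usage and GC activity from inside the JVM (the polling interval and output format are arbitrary choices for this example):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapWatcher {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        while (true) {
            // The same "Used Memory" measure that monitoring tools expose via JMX
            MemoryUsage heap = memory.getHeapMemoryUsage();
            System.out.printf("Heap used: %d MB of %d MB max%n",
                    heap.getUsed() / (1024 * 1024), heap.getMax() / (1024 * 1024));

            // Collection counts and accumulated GC time per collector
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("  %s: %d collections, %d ms total%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            Thread.sleep(5_000);
        }
    }
}
```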

We needed to gain a better understanding of the application’s memory usage patterns, and to determine why the Garbage Collector caused high CPU usage yet ultimately couldn’t free enough memory, resulting in the OOM Exception. When analyzing memory-related problems it is always useful to have at least a basic understanding of how memory management and Garbage Collection actually work.

Brief Intro to Java Garbage Collector

The Java Heap Space is typically categorized into “Young” and “Old” generations. Depending on the type of JVM and Memory Settings used, you might see different Heaps and Sizes in your application. Let’s discuss a standard scenario where you have a Young and an Old generation.

Newly created objects are allocated in the “Young” generation and are reclaimed when a minor Garbage Collection (GC) takes place.

Objects that survive minor GCs are promoted to the “Old” generation; Old generation GC happens less frequently and is referred to as ‘major GC’. Frequent and long major GC cycles indicate a possible memory leak.
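As a small, self-contained illustration of that promotion behavior (the class and buffer sizes are made up for the example): run the sketch below with -verbose:gc and you should see frequent minor collections reclaiming the short-lived buffers, while the retained ones accumulate in the Old Generation.

```java
import java.util.ArrayList;
import java.util.List;

public class GenerationDemo {
    // Long-lived references survive several minor GCs and end up in the Old Generation
    private static final List<byte[]> retained = new ArrayList<>();

    public static void main(String[] args) {
        for (int i = 0; i < 100_000; i++) {
            // Short-lived allocation: typically dies young and is reclaimed by a minor GC
            byte[] temp = new byte[64 * 1024];

            // Every 100th buffer is kept alive, gets promoted over time, and can only
            // be reclaimed by a major GC once the reference is dropped
            if (i % 100 == 0) {
                retained.add(temp);
            }
        }
        System.out.println("Retained buffers: " + retained.size());
    }
}
```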


For a more detailed overview and explanation of Java Memory Management, Garbage Collection and Tuning check out the Online Java Performance Book from Dynatrace.

Triaging Out-of-Memory Exceptions with Dynatrace

Dynatrace automatically provides a range of basic and advanced memory metrics. Under Monitoring > Infrastructure > host server we narrowed down the problem by examining the following metrics:

  • No available free memory
  • As a result, major GC constantly kicks in, trying to make memory available for process execution
  • High GC cycle time

This is the default Dynatrace Memory Dashboard for JVMs. Depending on the JVM type and GC Settings it automatically displays memory information for every Heap Generation, including GC activity.

At the same time, the Young Generation kept shrinking: it started at 2,500 MB, but as the Old Generation grew, the Young Generation was reduced in size. This also caused continuous minor GCs.

These dashboards make it easy to see how objects get promoted from Young to Survivor and later to Old Generation.

This leads to high CPU utilization and an increased number of threads.

The Dynatrace Process Health dashboard also shows CPU and Thread Information, which in our case directly correlated to the delays caused by major GCs in the Old Generation.

It became evident that memory allocation for some objects was not appropriate, as objects were promoted directly to the Old Generation, hinting at a potential memory leak on these servers. This is the general conclusion when you see your Old Generation constantly growing without ever being cleared up by a GC. Also check out the online book section “The different type of memory leaks and how to analyze them” for more on this topic.

To analyze the problem we decided to test the same transaction over a longer period of time using our JMeter scripts. At the same time, we took Java Heap Dumps with Dynatrace’s Memory Diagnostics feature. Before we started we also took one Heap Dump to determine the baseline memory usage.
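We used Dynatrace for the dumps; if you only have the JDK at hand, a comparable .hprof snapshot can be produced programmatically on HotSpot JVMs via the HotSpotDiagnosticMXBean. A minimal sketch (the output file name is just an example):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

        // "true" restricts the dump to live (reachable) objects, keeping the file smaller
        diagnostics.dumpHeap("baseline-heap.hprof", true);
        System.out.println("Heap dump written to baseline-heap.hprof");
    }
}
```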

The “compare” feature in Dynatrace for Total Memory snapshots helped identify that the MessageFormatMessage class from log4j consumed 2.12GB of heap space. These objects should be temporary, created per transaction and cleaned up immediately by the GC.

Easy to spot that the memory problem was related to 2.12GB worth of log4j MessageFormatMessage objects.

Out-of-Memory Exceptions: Root Cause & Recommendation

It turned out that the application’s current logging strategy was generating about 100MB worth of log entries for EVERY single transaction, with the log entries kept in memory throughout the execution of the transaction! The in-memory representation of these logging objects was several times larger than the actual log output, which means that every executed transaction actually had to allocate about 2GB on the heap. The GC couldn’t clean up that memory while the transaction was still executing and, as a result, the JVM crashed with an OOM Exception.
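To make the pattern concrete, here is an illustrative reconstruction of that kind of anti-pattern (not the application’s actual code): a buffer that retains log4j message objects for the lifetime of a transaction, so nothing can be collected until the transaction completes.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.logging.log4j.message.MessageFormatMessage;

// Hypothetical reconstruction of the anti-pattern, not the application's real logging code
public class TransactionLogBuffer {
    private final List<MessageFormatMessage> buffered = new ArrayList<>();

    public void log(String pattern, Object... args) {
        // Each call retains the pattern, the arguments and (once formatted) the expanded
        // string, which adds up to far more heap than the final log line on disk
        buffered.add(new MessageFormatMessage(pattern, args));
    }

    public void flush() {
        // None of these objects are eligible for GC before the transaction reaches this point
        buffered.forEach(message -> System.out.println(message.getFormattedMessage()));
        buffered.clear();
    }
}
```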

With this new insight on the current logging strategy we proposed several changes:

  • Set log level = ERROR and log only what is absolutely critical
  • Review the log entries that must be logged and ensure that only important ones are printed
  • If log statements are also meant to trace transactions and the response time of each hop, we suggest logging only the entry and exit points of each service call. A custom correlation ID and the Dynatrace x-dynatrace header can be logged in each service along with other information, for example response time and timestamp. Logging the x-dynatrace header can be useful if you are planning to link the log management system with Dynatrace (see the sketch below)
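Putting those recommendations together, a minimal sketch of what the revised logging could look like with log4j 2 is shown below; the service class, parameter names and the correlation ID field are hypothetical, only the Logger calls are standard log4j 2 API.

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class OrderService {  // hypothetical service class for illustration
    private static final Logger logger = LogManager.getLogger(OrderService.class);

    public void processOrder(String correlationId, String payload) {
        long start = System.currentTimeMillis();
        // Entry point of the service call: one concise line instead of a full payload dump
        logger.info("ENTRY processOrder correlationId={}", correlationId);

        // Guard verbose output so the message object is never allocated when the
        // production log level is set to ERROR
        if (logger.isDebugEnabled()) {
            logger.debug("Order payload: {}", payload);
        }

        // Log4j 2.4+ lambda suppliers achieve the same lazy evaluation without the guard
        logger.debug("Order payload size: {}", () -> payload.length());

        // ... business logic ...

        // Exit point with response time, so log entries can still be correlated with
        // the x-dynatrace header / Dynatrace data if needed
        logger.info("EXIT processOrder correlationId={} durationMs={}",
                correlationId, System.currentTimeMillis() - start);
    }
}
```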

If you are also dealing with Java memory issues, and are not sure whether you actually have a memory leak or just bad memory allocation patterns, I suggest you get your own free copy of Dynatrace AppMon. You can register for the 30-Day Free Trial, which automatically converts to a lifetime free Personal License (allowing you to analyze your local apps forever). There are also several YouTube Tutorials by Andreas Grabner. Check out the one on Memory Diagnostics with Dynatrace, where he walks you through the steps of instrumenting your Java app and analyzing the different types of memory issues from which your app may suffer.