
Out-Of-Memory Errors

Chapter: Memory Management

Out-of-memory errors occur when there is not enough space available in the heap to create a new object. A JVM will always trigger a garbage collection and try to reclaim memory before raising an out-of-memory error. In most cases the underlying cause is an insufficient memory configuration or a memory leak, but there are other causes as well.

Here is a non-exhaustive list of the most important ones:

  • Insufficient memory configuration
  • Memory leak
  • Memory fragmentation
  • Excess GC overhead
  • Allocating oversized temporary objects

Besides the first two, memory fragmentation can also cause out-of-memory errors even when there appears to be enough free memory, because no single contiguous area of the heap is large enough for the allocation request. In most cases compaction ensures that this does not occur; however, some GC strategies do not use compaction.

Some JVM implementations, such as Oracle HotSpot, will throw an out-of-memory error when GC overhead becomes too great. This feature is designed to prevent near-constant garbage collection. HotSpot's parallel collector, for example, does this by default when more than 98% of total time is spent on garbage collection while less than 2% of the heap is recovered. Configuring a larger heap will most likely fix this issue; if it does not, you'll need to analyze memory usage with a heap dump.

The last issue is often a matter of developer thoughtlessness: program logic that attempts to allocate overly large temporary objects. Since the JVM can't satisfy the request, an out-of-memory error is triggered and the transaction is aborted. This can be difficult to diagnose, because no heap dump or allocation-analysis tool will highlight the problem: the oversized object was never successfully created. You can only identify the area of code triggering the error and, once it is found, fix or remove the cause of the problem.
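As a minimal sketch of this pattern, compare reading an entire input into one temporary array with processing it through a small reused buffer. The class name, the checksum task, and the 8 KB chunk size are hypothetical, chosen only for illustration. The first method's single allocation grows with the input size; the second method's allocation stays bounded.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class OversizedTemporaries {
    // Hypothetical chunk size: small enough that the buffer is cheap and short-lived.
    static final int CHUNK_SIZE = 8 * 1024;

    // Risky pattern: materializes the whole input as one temporary array.
    // With a multi-gigabyte input, this single allocation can exceed the heap.
    static long sumAllAtOnce(InputStream in) {
        try {
            byte[] all = in.readAllBytes(); // Java 9+; array size equals input size
            long sum = 0;
            for (byte b : all) {
                sum += b & 0xFF; // treat each byte as unsigned
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Bounded pattern: one fixed-size buffer is reused for every read,
    // so the temporary allocation does not depend on the input size.
    static long sumInChunks(InputStream in) {
        byte[] buffer = new byte[CHUNK_SIZE];
        long sum = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                for (int i = 0; i < n; i++) {
                    sum += buffer[i] & 0xFF;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] data = new byte[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        // Same result, very different peak allocation for large inputs.
        System.out.println(sumAllAtOnce(new ByteArrayInputStream(data)));
        System.out.println(sumInChunks(new ByteArrayInputStream(data)));
    }
}
```

The `ByteArrayInputStream` in `main` is only a stand-in: its data is already in memory. The difference matters for file or network streams, where the chunked version never needs to hold the whole input at once.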

Churn Rate and High Transactional Memory Usage

Problem

The churn rate describes the number of temporary objects allocated per transaction or time slice. Java allows us to allocate a large number of objects very quickly, but high concurrency and high throughput quickly lead to churn rates beyond what the JVM can sustain. Transactional memory usage, on the other hand, describes how much memory a transaction keeps alive until it is finished (for example, a single transaction might need at least 5 MB of memory and create 1,000 temporary objects). High concurrency means that there are many active transactions, each of which needs some memory. If the sum (100 concurrent transactions at 5 MB each = 500 MB) exceeds the capacity of the young generation, temporary objects will be tenured to the old generation! The problem is easily overlooked during development because it becomes visible only under high load.
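The arithmetic above can be written out as a back-of-the-envelope check. This sketch follows the text's simplified model, comparing live transactional memory directly against young-generation capacity; the 256 MB young-generation size is a hypothetical value. In a real generational collector, promotion also depends on survivor-space size and tenuring thresholds, so treat the result as a warning sign rather than a precise prediction.

```java
public class TransactionalMemoryEstimate {
    static final long MB = 1024L * 1024L;

    // Memory held alive by all in-flight transactions at the same moment.
    static long liveTransactionalBytes(int concurrentTransactions, long bytesPerTransaction) {
        return concurrentTransactions * bytesPerTransaction;
    }

    // Simplified model from the text: if the in-flight total does not fit in the
    // young generation, surviving temporaries are likely to be tenured early.
    static boolean exceedsYoungGen(int concurrentTransactions, long bytesPerTransaction, long youngGenBytes) {
        return liveTransactionalBytes(concurrentTransactions, bytesPerTransaction) > youngGenBytes;
    }

    public static void main(String[] args) {
        long perTransaction = 5 * MB; // figure from the text
        int concurrency = 100;        // figure from the text
        long youngGen = 256 * MB;     // hypothetical young-generation size

        System.out.println(liveTransactionalBytes(concurrency, perTransaction) / MB + " MB live"); // 500 MB live
        System.out.println(exceedsYoungGen(concurrency, perTransaction, youngGen));                // true
    }
}
```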

Symptom

A high churn rate alone will slow down the application because the JVM needs to execute young-generation GCs more frequently. Young-generation GCs are inexpensive only if most objects die! In a high-load situation with many concurrent transactions, many objects will be alive at the time of the GC. Therefore a high transactional memory usage, meaning a lot of live objects in the young generation, leads to longer and more expensive young-generation GCs. If the transactional memory usage exceeds the young generation's capacity (as described earlier), objects are tenured prematurely and the old generation grows. This in turn leads to more frequent old-generation GCs, further hurting performance.
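One way to observe this symptom from inside the application is the standard `java.lang.management` API: each `GarbageCollectorMXBean` reports a cumulative collection count and an approximate accumulated collection time. The sketch below (the class name and the churn loop are illustrative) measures how both change across a burst of short-lived allocations. How many collector beans exist, and what they are called, depends on the JVM and the GC in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcActivity {
    // Total collections across all collectors; getCollectionCount() returns -1
    // when a collector does not report it, so such values are skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds (-1 values skipped).
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long countBefore = totalCollections();
        long millisBefore = totalCollectionMillis();

        // Illustrative churn: millions of short-lived temporary arrays.
        long checksum = 0;
        for (int i = 0; i < 5_000_000; i++) {
            byte[] temp = new byte[256];
            temp[i % temp.length] = 1;
            checksum += temp[i % temp.length];
        }

        System.out.println("checksum=" + checksum);
        System.out.println("collections during burst: " + (totalCollections() - countBefore));
        System.out.println("GC time during burst (ms): " + (totalCollectionMillis() - millisBefore));
        // Per-collector view: shows which generation is doing the work.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```

A young-generation collector whose count climbs quickly while its time per collection also rises matches the pattern described above.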

In extreme cases the old generation's capacity is exceeded and we end up with an out-of-memory error. The tricky part is that the out-of-memory error aborts all running transactions, and the subsequent GC removes the root cause from memory. Most memory tools sample Java memory every couple of seconds and will never see 100% utilization. As a result, there seems to be no explanation for the out-of-memory error.
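Because external tools sample at intervals, it can help to also read the high-water marks the JVM itself keeps per memory pool. As a minimal sketch, `MemoryPoolMXBean.getPeakUsage()` and `resetPeakUsage()` from `java.lang.management` can be polled once per monitoring interval; the class and method names here are illustrative. How precisely each pool's peak is tracked is JVM-specific, so treat this as an additional signal that may reveal spikes between samples, not a guarantee of catching them.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class HeapPeaks {
    // Largest peak "used" value recorded by any heap pool since JVM start or last reset.
    static long maxHeapPoolPeakBytes() {
        long max = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as metaspace or code cache
            }
            MemoryUsage peak = pool.getPeakUsage(); // null if the pool is no longer valid
            if (peak != null) {
                max = Math.max(max, peak.getUsed());
            }
        }
        return max;
    }

    // Prints current vs. peak usage per heap pool, then starts a new observation window.
    static void reportAndReset() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            MemoryUsage now = pool.getUsage();
            MemoryUsage peak = pool.getPeakUsage();
            if (now == null || peak == null) {
                continue;
            }
            System.out.printf("%s: used=%,d bytes, peak=%,d bytes%n",
                    pool.getName(), now.getUsed(), peak.getUsed());
            pool.resetPeakUsage();
        }
    }

    public static void main(String[] args) {
        // In a real application this would run on a timer, once per monitoring interval.
        reportAndReset();
    }
}
```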

Solution

The indicators for this problem are clear: frequent and expensive minor GCs, eventually leading to a growing old generation. The solution is also clear, but it involves a lot of testing and subsequent code changes.

  • Do a thorough allocation analysis to bring down the churn rate
  • Take several heap dumps under full load. Analyze how much memory a single transaction keeps alive and try to bring that down. The more concurrency you expect in your production system, the less memory a single transaction should use.
  • Make sure you follow up your optimizations with a young-generation-sizing exercise and extensive load testing
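As an illustration of the first point, one frequent finding of an allocation analysis is string building in a loop. The sketch below (hypothetical class and method names) contrasts repeated `+=` concatenation, which creates a new `String` on every iteration, with a single `StringBuilder`. Both return the same text, but the second allocates far fewer short-lived objects per call. It is one example of churn reduction, not a general recipe.

```java
import java.util.List;

public class ChurnReduction {
    // Higher churn: every += produces a new String (plus intermediate buffers),
    // so an n-field line leaves behind many temporaries that die immediately.
    static String joinNaive(List<String> fields) {
        String line = "";
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                line += ",";
            }
            line += fields.get(i);
        }
        return line;
    }

    // Lower churn: one builder grown in place, then a single final String.
    static String joinWithBuilder(List<String> fields) {
        StringBuilder sb = new StringBuilder(fields.size() * 8); // rough initial capacity guess
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(fields.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> fields = List.of("id", "name", "amount");
        System.out.println(joinNaive(fields));       // id,name,amount
        System.out.println(joinWithBuilder(fields)); // id,name,amount
    }
}
```

For this particular task the JDK's `String.join(",", fields)` gives the same result without a hand-written loop; the point of the sketch is the allocation pattern, which carries over to less trivial code.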

Optimizing churn rate issues is not a simple task, but the performance and scalability improvement can be substantial.

Incorrect Implementation of equals and hashCode

The relationship between the hashCode and equals methods and memory problems is not obvious at first, but thinking about hash maps makes it clearer. An object's hash code is used to insert and find objects in hash maps. However, the hash code is not unique, which is why it only selects a bucket that can potentially contain multiple objects. Because of this, the equals method is used to make sure that we find the correct object. If the hashCode method is wrong (for example, if it returns different results for objects that equals considers equal), we will never find the object in a hash map again. The consequence is often that the application inserts the same object again and again.
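A minimal sketch of this failure mode, with hypothetical class names: `BrokenKey` overrides `equals` but derives `hashCode` from a per-instance counter, an exaggerated stand-in for real-world bugs such as forgetting to override `hashCode` or hashing a field that `equals` ignores. A cache that inserts only when it cannot find an equal key then grows with every lookup, while the correctly implemented `GoodKey` stays at one entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class HashCodeLeakDemo {
    // BROKEN: equals() compares only the name, but hashCode() returns a per-instance id.
    // Two "equal" keys therefore get different hash codes, violating the contract.
    static final class BrokenKey {
        private static int nextId = 0; // not thread-safe; fine for this single-threaded demo
        private final String name;
        private final int instanceId;

        BrokenKey(String name) {
            this.name = name;
            this.instanceId = nextId++;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BrokenKey && ((BrokenKey) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return instanceId; // BUG: must be derived from the fields equals() uses
        }
    }

    // CORRECT: hashCode() uses exactly the field that equals() compares.
    static final class GoodKey {
        private final String name;

        GoodKey(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name);
        }
    }

    // Simulated cache: the lookup always misses, so every request inserts again.
    static int brokenCacheSize(int lookups) {
        Map<BrokenKey, String> cache = new HashMap<>();
        for (int i = 0; i < lookups; i++) {
            BrokenKey key = new BrokenKey("customer-42");
            if (!cache.containsKey(key)) {
                cache.put(key, "cached data");
            }
        }
        return cache.size();
    }

    // Same cache logic with a correct key: the first insert is found on every later lookup.
    static int goodCacheSize(int lookups) {
        Map<GoodKey, String> cache = new HashMap<>();
        for (int i = 0; i < lookups; i++) {
            GoodKey key = new GoodKey("customer-42");
            if (!cache.containsKey(key)) {
                cache.put(key, "cached data");
            }
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("broken: " + brokenCacheSize(10_000)); // broken: 10000
        System.out.println("good: " + goodCacheSize(10_000));     // good: 1
    }
}
```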

Although the growing collection can easily be identified by most tools, the root cause is not obvious in a heap dump. I have seen this case over and over through the years; in one extreme case it led a customer to run its JVMs with 40 GB of memory, and the JVMs still needed to be restarted once a day to avoid out-of-memory errors. We fixed the problem, and now the application runs quite stably at 800 MB!

A heap dump, even if complete information on the objects is available, rarely helps in this case. One would simply have to analyze too many objects to identify the problem. The best approach is to be proactive and automatically unit-test equals and hashCode implementations. A few free frameworks (such as EqualsVerifier) check that the equals and hashCode methods conform to the contract.
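To show what such a test checks, here is a minimal hand-rolled contract check using only the JDK. It is not EqualsVerifier and covers far fewer cases; the class names and the `requestId` field are hypothetical. The factory must return two separate but logically equal instances, and the check requires that they compare equal in both directions and share a hash code.

```java
import java.util.Objects;
import java.util.function.Supplier;

public class EqualsContractCheck {
    // The factory must return a new, logically equal instance on each call.
    static <T> boolean satisfiesBasicContract(Supplier<T> factory) {
        T a = factory.get();
        T b = factory.get();
        boolean distinctInstances = a != b;
        boolean equalBothWays = a.equals(b) && b.equals(a); // equal and symmetric
        boolean reflexive = a.equals(a);
        boolean notEqualToNull = !a.equals(null);
        boolean sameHash = a.hashCode() == b.hashCode();   // equal objects => equal hash codes
        return distinctInstances && equalBothWays && reflexive && notEqualToNull && sameHash;
    }

    // Correct value class: equals() and hashCode() use the same fields.
    static final class Money {
        final String currency;
        final long cents;

        Money(String currency, long cents) {
            this.currency = currency;
            this.cents = cents;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Money)) return false;
            Money m = (Money) o;
            return cents == m.cents && Objects.equals(currency, m.currency);
        }

        @Override
        public int hashCode() {
            return Objects.hash(currency, cents);
        }
    }

    // Broken value class: hashCode() includes a field that equals() ignores.
    static final class BrokenMoney {
        private static int nextRequestId = 0;
        final String currency;
        final long cents;
        final int requestId; // hypothetical bookkeeping field, not part of equality

        BrokenMoney(String currency, long cents) {
            this.currency = currency;
            this.cents = cents;
            this.requestId = nextRequestId++;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof BrokenMoney)) return false;
            BrokenMoney m = (BrokenMoney) o;
            return cents == m.cents && Objects.equals(currency, m.currency);
        }

        @Override
        public int hashCode() {
            return Objects.hash(currency, cents, requestId); // BUG: requestId is not compared in equals()
        }
    }

    public static void main(String[] args) {
        System.out.println(satisfiesBasicContract(() -> new Money("EUR", 500)));       // true
        System.out.println(satisfiesBasicContract(() -> new BrokenMoney("EUR", 500))); // false
    }
}
```

With EqualsVerifier itself, a test is usually a single call such as `EqualsVerifier.forClass(Money.class).verify();`, which exercises many more cases (null fields and subclass behavior among them) than this sketch.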
