Christopher André is an Enablement Service Consultant at Dynatrace and helps our Customers to maximize their value they get out of Dynatrace.
The other day, I went to a customer who was experiencing a problem that happens quite frequently: he had a cache that was constantly growing, leading to OutOfMemory Exceptions. Other problems in the application seemed linked to it. Analyzing and finding the root cause of this memory related problem triggered me to write this blog on why they ran into OutOfMemory Exceptions despite having a properly configured cache.
He was trying to cache the results of database selects, so he wouldn’t have to execute them multiple times. This is generally a good idea, but most Java developers don’t really know how to do this right and forget about the growing size of their caches.
How can we have memory problems in Java?
Quite often, I hear Java developers saying “I can’t have memory problems; the JVM is taking care of everything for me”. While the JVM’s memory handling is great this does not mean we don’t have to think about it at all. Even if we do not make any obvious mistakes we sometimes have to help the JVM manage memory efficiently.
It will reclaim memory of all objects that can’t be reached by so called “GC Roots” (GC Roots are objects that are assumed to be always reachable). Problems often happen when an object creates many references to different objects and and the developer forgets to releases them.
A cache is, to simplify to the extreme, a Map. You want to remember a particular object and associate it with an identifier. Because we don’t have an endless supply of memory, there are specific algorithms that are dedicated on evicting certain no longer needed entries from this cache. Let’s have a quick look at some of them:
- Least Recently Used (LRU)
In this algorithm the Cache maintains an access timestamp for every entry. Whenever it is triggered to remove entries, due to additional size limitations, it will evict those with the oldest access timestamp first.
- Timed LRU
This is a special form of LRU that evicts items based on a specific “not-used” timeframe instead of a size limitation. It is one of the most frequently used algorithms for database caches and was used in my customer’s case.
- Least Frequently Used (LFU)
This algorithm tracks the number of times an entry has been accessed. When the cache tries to evict some of its entries, it removes those that used accessed least often.
Despite the usage of “Timed LRU” algorithm, my customer faced the problem that the number of objects that were referenced grew too big. The memory used by these objects could not be reclaimed by the JVM because they were still hard referenced by the cache. In his case the root cause was not an inappropriately sized cache or a bad eviction policy. The problem was that the cached objects were too big and occupied too much memory. The Cache did not yet evict these objects yet based on the Timed LRU algorithm and therefore the GC could not claim these objects to free up memory.
Solving this with Memory-sensitive standard Java mechanisms
The problem caused by hard references was addressed by the Java Standard library early on (version 1.2) with so called Reference Objects. We’ll only focus on one of them: Soft References.
SoftReference is a class that was created explicitly for the purpose of being used with memory-sensitive caches. A Soft Reference will, according to the official javadoc, be “cleared at the discretion of the garbage collector in response to memory demand”. In other words, the reference is kept as long as there is no need for more memory and can potentially be deleted if the JVM needs more memory. While the specification states that this can happen any time, the implementations I know only do this to prevent an OutOfMemory. When the Garbage Collector cannot free enough memory, it will dereference all SoftReferences and then runs another Garbage Collection before throwing an OutOfMemoryExcption. If the SoftReference is the only thing that keeps an object tree alive, the whole tree can be collected.
In my customer’s case, it would mean that the cache would be flushed before an OutOfMemoryException would be triggered, preventing it from happening and making it a perfect fail-safe option to his cache.
While Soft References have their uses, nothing is without side effects:
Garbage Collection and Memory Demand
First of all: a memory sensitive cache often has the effect that people size it too big. The assumption is that when the cache can be cleared on memory demand, we should use all available memory for the cache, because it will not pose a problem. The cache will be growing until it fills a large portion of the memory. As we have learned before this leads to slower and slower Garbage Collections because of the many objects to check. At some point the GC will flush the cache and everything will be peachy again, right? Not really, the cache will grow again. This is often mistaken for a memory leak. Because many heap analyzer treat soft references in a special way it is not easily found.
Additionally, the SoftReference objects occupy memory themselves and the mere number of these soft reference objects can also create OutOfMemory exceptions. These objects cannot be cleared by the Garbage Collector! For example, if you create a SoftReference object for every key and every value in your Map, these SoftReference objects are not going to be collected when the object they point to is collected. That means that you’ll get the same problem as previously mentioned except that, instead of it being caused by many objects of type “Key” and “Value”, it’ll be triggered by SoftReference objects.
The small schema below explains how a soft reference works in Java and how an OutOfMemoryException can happen:
This is why you generally cannot use a memory-sensitive cache without using the mentioned cache algorithms: the combination of a good cache algorithm with SoftReference is going to create a very robust cache system that should limit the amount of OutOfMemory occurences.
Cache System must handle flushed Soft References
Your cache system must be able to deal with the situation when it gets flushed. It might sound obvious but sometimes we assume the values or even the keys of the Hashmap are always going to be stored in memory, causing some NullPointerExceptions when they are garbage collected as the cache gets flushed. Either it needs to reload the flushed data upon the next access, which some caches can do, or your cache needs to clean out flushed entries periodically and on access (this is what most systems do).
There is no standard Map implementation using SoftReferences.
To preempt any hardcore coders posting comments about WeakHashMap, let me explain that a WeakHashMap uses WeakReferences and not SoftReferences. As a Weak Reference can be flushed on every GC, even minor ones, it is not suitable for a cache implementation. In my customer’s case, the cache would have been flushed too often, not adding any real value to its usage.
This being said, do not despair! The Apache Commons Collection library provides you with a Map implementation that allows you to use the kind of reference you want to use for keys and values independently. This implementation is the ReferenceMap class and will allow you to create your own cache based on a Map without having to develop it from scratch.
Soft References provide a good means of making cache systems more stable. They should be thought of an enhancement and not a replacement to an eviction policy. Indeed many existing cache systems leverage them, but I encounter many home grown cache solutions at our clients, so it is good to know about this. And finally it should be said that while Soft References make a cache system better they are not without side effects and we should take a hard look at them before trusting that everything is fine.