Memory Leak Detection in Production – a Field Report

Memory leaks are painful, especially when they crash your production servers. But how do you analyze a leak that only happens in the production environment? Here is a story I was told while on-site with one of our clients. I asked: “Chris, tell me about a recent performance problem and how you solved it.”

He replied: “I’ve got a good one. We had memory problems in production: memory increased steadily until it eventually crashed a JVM. The first thing I did was ask my Ops guy to capture a Java heap dump. His response was: ‘Heap WHAT?’ So I started explaining how to take a heap dump on the JVMs. Two hours later he had the file on the production machines. Next I had to transfer the file out of Prod to my desk, which took about 20 minutes. When I opened the file, it turned out to be corrupted. So it was back to capturing a new dump and transferring it: another hour gone. The second dump actually opened in my analysis tool. Unfortunately, it was hard to tell what was leaking from a single dump. I needed multiple dumps to compare, so I could see which objects were actually growing over time. That procedure took several hours. In the end I knew my top classes by instance count, but I still had no real answer as to what was actually leaking.”
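The comparison Chris describes, taking several snapshots and checking which classes keep growing, can be approximated with lightweight class histograms (for example from `jmap -histo <pid>`) instead of full heap dumps. The sketch below is a minimal illustration, not the tooling from the story: it assumes the usual histogram layout `rank: instances bytes className`, and the class `HistoDiff`, its methods, and the sample numbers are all hypothetical.

```java
import java.util.*;

public class HistoDiff {
    // Parses class-histogram lines into className -> instance count.
    // Assumes rows look like "   1:   1000   80000  com.example.Foo";
    // header, separator and "Total" lines are skipped.
    public static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            String[] t = line.trim().split("\\s+");
            if (t.length < 4 || !t[0].endsWith(":")) continue; // not a data row
            try {
                counts.merge(t[3], Long.parseLong(t[1]), Long::sum);
            } catch (NumberFormatException e) {
                // instance column was not numeric: ignore the line
            }
        }
        return counts;
    }

    // Returns the classes whose instance count grew between two snapshots,
    // sorted by growth, largest first. Classes that shrank are dropped.
    public static List<Map.Entry<String, Long>> growth(Map<String, Long> before,
                                                       Map<String, Long> after) {
        List<Map.Entry<String, Long>> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long delta = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            if (delta > 0) result.add(Map.entry(e.getKey(), delta));
        }
        result.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical snapshots taken some time apart.
        List<String> first = List.of(
            " num     #instances         #bytes  class name",
            "----------------------------------------------",
            "   1:          1000          80000  com.example.Session",
            "   2:           500          12000  java.lang.String",
            "Total          1500          92000");
        List<String> second = List.of(
            "   1:          5000         400000  com.example.Session",
            "   2:           520          12480  java.lang.String",
            "   3:            10            240  java.lang.Integer");
        for (Map.Entry<String, Long> e : growth(parse(first), parse(second))) {
            System.out.println(e.getKey() + " +" + e.getValue());
        }
    }
}
```

A class that grows steadily across several snapshots is a leak candidate, but as the story shows, instance counts alone only narrow the search; finding out what keeps those objects alive still requires looking at references in a real heap dump.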

Improve the process from hours to minutes

… continue the story on the Dynatrace Community Portal

Andreas Grabner has 20+ years of experience as a software developer, tester and architect and is an advocate for high-performing cloud scale applications. He is a regular contributor to the DevOps community, a frequent speaker at technology conferences and regularly publishes articles on You can follow him on Twitter: @grabnerandi