If you can stall the application for a considerable amount of time (about 15 minutes for an 8-GB JVM), you can use leak analysis snapshots, which contain all instances currently in the heap along with their references. Because of the long application suspension time, this process is not suitable for production.
Creating a leak analysis snapshot stalls the JVM or CLR. For larger heaps, the time required depends on your hardware, the bandwidth of your network connection, and the selected snapshot options: a snapshot of a 24-GB heap usually takes about 30 minutes, during which the VM or CLR is completely stalled. Post-processing large leak analysis snapshots also requires a lot of memory. To prevent the AppMon Server from purging PurePaths while it needs that memory for post-processing, use an AppMon Memory Analysis Server. For production environments, the AppMon Memory Analysis Server is mandatory.
To configure an AppMon Memory Analysis Server through the AppMon Client, select Settings > Dynatrace Server > Services > Dynatrace Memory Analysis Server. See Set up a Memory Analysis Server for a configuration description.
For a detailed overview of the functionality provided for leak analysis snapshots, see Total Memory dashlet.
Leak analysis snapshots stall the VM while retrieving heap information. Especially for production environments, consider the impact on your system before creating a snapshot.
JVMs require a considerable amount of native memory for the creation of a leak analysis snapshot. For example, a JVM with 25 GB of used memory containing 340 million instances requires approximately 15 GB of native memory. CLRs, however, require very little native memory for creating a leak analysis snapshot.
The Total Memory dashlet provides an overview of the HotSpots found while post-processing a leak analysis snapshot. Look at the HotSpots pane to check whether AppMon has found a potential leak. If a leak was identified, use the analysis drilldowns to investigate leak suspects.
Use the Keep Alive Set dashlet to see which instances are kept alive by a given instance, and the Shortest Root Paths dashlet to investigate how such a large instance is kept alive. If the keep alive set contains instances that should not be kept alive, drill down to the shortest root paths of these instances to identify why they are still referenced. If the keep alive root path does not show why an instance is kept alive, use direct root paths to show the direct references on the heap.
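As a hypothetical illustration of how an instance ends up in a keep-alive set, the following Java sketch shows a root path through a static list that is never cleaned up. All class and field names here are invented for the example; they are not part of AppMon.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: sessions register themselves in a static list
// but are never removed. The static 'listeners' field is the root path
// that keeps every Session (and its buffer) alive after close().
public class LeakDemo {
    static final List<Session> listeners = new ArrayList<>();

    static class Session {
        final byte[] buffer = new byte[1024]; // payload kept alive too

        Session() {
            listeners.add(this); // root path: LeakDemo.listeners -> Session
        }

        void close() {
            // BUG: missing listeners.remove(this);
            // the Session stays reachable via the static list
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            new Session().close();
        }
        // All 1000 sessions are still reachable via the static list.
        System.out.println(listeners.size()); // prints 1000
    }
}
```

In a snapshot of this program, the shortest root path of any Session instance would lead through the static listeners list, which is exactly the kind of unexpected referrer the root path drilldowns are meant to reveal.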
If the HotSpots pane does not contain the instance causing a memory leak, open the content of the snapshot to get an overview of classes, their instance counts, and their corresponding Garbage Collection sizes. If you do not have detailed knowledge of your application's implementation, this is usually a good point in the analysis to consult an architect or developer who can help you judge whether the instance counts, the GC sizes, and in particular the references between instances are plausible.
If the HotSpots pane does not show suspicious instances and the snapshot contains a large number of instances for one or more classes, the analysis is usually more complex and time-consuming. Open the Total Memory Content dashlet and look for classes with suspicious instance counts or Garbage Collection sizes. You can also compare the average instance Garbage Collection size of a class with the sizes of individual instances to find any unreasonably large instance. If you find such an instance, open its Keep Alive Set and follow the root paths of suspicious instances from the Keep Alive Set contents.
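The comparison of average versus individual instance size can be illustrated with a hypothetical Java sketch: most instances of a class stay small, but one grows without bound, so the per-class average is skewed far above the typical instance. All names below are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one runaway instance hides behind a per-class
// average. Comparing the average size with individual instances
// exposes the outlier.
public class LargeInstanceDemo {
    static final List<StringBuilder> logs = new ArrayList<>();

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            logs.add(new StringBuilder("entry"));
        }
        // One buffer keeps accumulating and is never trimmed.
        StringBuilder runaway = logs.get(0);
        for (int i = 0; i < 100_000; i++) {
            runaway.append("more data that is never discarded ");
        }
        // The average length is far above the typical instance length.
        long total = logs.stream().mapToLong(StringBuilder::length).sum();
        System.out.println(total / logs.size());  // skewed average
        System.out.println(logs.get(1).length()); // typical instance: 5
    }
}
```

In a real snapshot the same reasoning applies to Garbage Collection sizes rather than string lengths: a class whose average size is much larger than its typical instances usually contains one or a few instances worth a Keep Alive Set drilldown.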
For very large heaps, reproduce your memory leak with a smaller heap if possible. Less time is required to create, post-process, and investigate a snapshot with a smaller heap than with a huge heap, especially if there is no single instance that is keeping others alive. Using a fast hard disk (or SSD) for session storage improves analysis performance.
If there is no single instance to be investigated, you can analyze aggregated follow references in the Follow References dashlet. Aggregated follow references are followed for all instances of a class. The goal is to find referrers that should not be there. Keep in mind that an aggregated reference analysis follows the references of each instance of a class and thus can take several minutes. Following by keep-alive shows how many of the instances are kept alive, and by which other instances. Following direct references shows how many of the instances are referenced directly, and by which other instances.
The figure below illustrates an aggregated References by Keep Alive analysis for all OpenBitSet instances found in a leak analysis snapshot. The referrer analysis shows that there are 190,426 ExecPathNode instances that refer to at least one OpenBitSet. This does not mean that the ExecPathNode instances refer to a total of 190,426 OpenBitSet instances; it might mean that a single OpenBitSet instance is referred to by 190,426 ExecPathNode instances. As this is suspicious, check whether there really is an OpenBitSet for each ExecPathNode.
Drill down from the ExecPathNode instances to keep-alive references using Follow references > By Keep Alive. In the new dashlet, switch to the Analyze Referees tab at the bottom and request the referees of those 190,426 ExecPathNode instances. The tab shows that 190,426 individual OpenBitSet instances are referenced. This identifies the problem: there should only be a single OpenBitSet instance.
While using aggregated reference analysis, you can open a result (for example, 190,426 individual out of 5,000,000 ExecPathNode instances) in a new dashlet that shows the corresponding instances. Select Show Instances in the context menu to open a dashlet that contains only the corresponding class. Expand this class to see only the corresponding instances.
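The duplicate-instance problem identified above (many nodes each holding a private instance where a single shared one was intended) can be sketched in Java. The Node class and the java.util.BitSet payload below are hypothetical stand-ins for ExecPathNode and OpenBitSet; the identity-based count mirrors what the Analyze Referees tab reports.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.IdentityHashMap;
import java.util.List;

// Hypothetical sketch: each node allocates its own bit set even though
// a single shared instance was intended, multiplying heap usage.
public class DuplicateInstanceDemo {
    static class Node {
        final BitSet flags;
        Node(BitSet flags) { this.flags = flags; }
    }

    // Counts distinct referenced BitSet instances by identity, the way a
    // referee analysis counts individual instances rather than references.
    static int countDistinct(List<Node> nodes) {
        IdentityHashMap<BitSet, Boolean> seen = new IdentityHashMap<>();
        for (Node n : nodes) seen.put(n.flags, Boolean.TRUE);
        return seen.size();
    }

    public static void main(String[] args) {
        // Leaky variant: one BitSet per node -> N distinct instances.
        List<Node> leaky = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            leaky.add(new Node(new BitSet(4096)));
        }

        // Fixed variant: all nodes share a single BitSet.
        BitSet shared = new BitSet(4096);
        List<Node> fixed = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            fixed.add(new Node(shared));
        }

        System.out.println(countDistinct(leaky)); // prints 100000
        System.out.println(countDistinct(fixed)); // prints 1
    }
}
```

A referee count equal to the referrer count (100,000 distinct payloads for 100,000 nodes, rather than one shared payload) is the signature that singles out this pattern in the aggregated analysis.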