Memory diagnostics

Overview

AppMon provides sophisticated means for diagnosing memory leaks and excessive memory consumption by Java and .NET applications. This page describes memory management in Java and .NET in general, shows how to use AppMon to find memory problems and optimize memory performance, and provides an example of memory diagnostics. See Memory Analysis for a tutorial on memory analysis.

Inefficient use of application memory affects key performance indicators of service management:

  • Unavailability of application due to crashes.
  • Bad response time due to frequent runs of garbage collection.
  • High investment costs from inefficient use of hardware resources.

At worst, memory leaks increase memory demand until out-of-memory exceptions and application malfunctions occur.

Memory problem classification

Generally, memory problems can be classified as follows:

  • Memory leaks, leading to unavailability due to crashes.
  • Heavy memory use, leading to high costs from hardware requirements.
  • Heavy garbage collection use, where frequent object instantiations and high numbers of objects degrade application performance.

Causes of memory leaks

A memory leak is an allocation of memory by the application that is not freed again when the memory is no longer used. The obvious sign of a memory leak is growing memory consumption under constant load on your application. The causes of memory leaks can be categorized as these programming errors:

  • Collections and maps: Memory leaks are often caused by collections containing objects that are no longer used.
  • Static fields: Objects assigned to static fields of classes are not cleaned up until the application is shut down or the fields are set to null. Static collections and maps constitute special problems.
  • Custom data structures: Custom data structures frequently keep back-references to objects, preventing the freeing of intentionally unused objects.
  • Cleanup errors: Code does not properly:
    • Use statements to release unmanaged resources, such as native code invoked by JNI.
    • Unregister listeners.
    • Release pooled resources, such as database connections, which can lead to database connection leaks.
  • Inappropriate use of sessions in web applications: Objects stored in session variables live for the duration of a user's session. If sessions are not invalidated explicitly, these objects live until the session's timeout. This can prevent releasing objects for minutes to hours, depending on the configured timeout. This is not technically a memory leak, but it results in poor application performance from inefficient memory use.
  • Soft-References: This is not a memory leak by itself, but excessive use of Soft-References makes it hard to detect memory leaks. Soft-References can keep the JVM heap full, making it impossible to use simple JVM metrics to monitor the used memory assigned to hard references.
  • Inappropriate pool sizes: Object pools enable reuse of objects without permanently creating and destroying costly objects. Using pools can restrict the number of objects created for a specific object type. Inappropriate pool sizes may result in the creation of more objects than an application would actually need. Instead of waiting for a pool object to be available, a new one is created.
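
As a minimal sketch of the "static collections and maps" cause listed above, the following Java class contrasts a static map that only ever grows with a size-bounded alternative. The class name SessionCache, the payload size, and the limit of three entries are illustrative assumptions, not part of AppMon or of any particular application.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical cache, used only to illustrate the static-collection leak pattern. */
final class SessionCache {
    private static final int MAX_ENTRIES = 3; // illustrative limit, not a recommendation

    // Leak-prone: entries are added but never removed, so every value stays reachable.
    static final Map<String, byte[]> UNBOUNDED = new HashMap<>();

    // Bounded alternative: an access-ordered LinkedHashMap that evicts its eldest entry.
    static final Map<String, byte[]> BOUNDED =
        new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > MAX_ENTRIES;
            }
        };

    /** Stores the same payload under the given key in both maps. */
    static void put(String key) {
        byte[] payload = new byte[1024];
        UNBOUNDED.put(key, payload);
        BOUNDED.put(key, payload);
    }
}
```

After many calls to put with distinct keys, the unbounded map holds every payload, while the bounded map never holds more than MAX_ENTRIES. Whether eviction is acceptable depends on the application; the point is only that a static collection needs some removal policy.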

Memory diagnosis with AppMon

Memory diagnosis is a complex and time-consuming task. AppMon supports efficient and effective detection of memory problems and the implementation of company-specific performance tuning processes.

The diagnosis process

The process of diagnosing a memory leak depends on environmental issues. If it is possible to stall the application for a considerable amount of time (for example 15 minutes for an 8-GB JVM), you can use leak analysis snapshots, which contain all instances and their references currently in the heap. If this is not possible, for example if the application is running in a production environment, AppMon provides the means for trending snapshots and selective memory snapshots. Trending snapshots do not contain every instance or references between instances. However, they provide an overview of which instances are growing. You can use this information in conjunction with selective memory snapshots to find where in an application the instances of a growing class are allocated.

The following figures illustrate the proposed process for diagnosing memory leaks for environments where an application stall is possible and for environments where it is not suitable.

Memory monitoring

The first step for memory diagnosis is usually to monitor the memory behavior using charting. For details, see Charting and Memory Analysis.

You can also use AppMon incident-based monitoring of memory behavior. With incident-based diagnosis, you analyze memory problems when they occur by defining incidents for memory usage (for example, 80% of maximum memory) that can be combined with corresponding actions to trigger memory snapshots. For Java 6 VMs, AppMon triggers leak analysis snapshots by default if the application runs out of memory.

For every running JVM or CLR, it is recommended that you monitor memory usage by charting measures for the system. The following measures can be used for memory monitoring:

  • Java

    • Used Memory
    • Committed Memory
    • Maximum Memory
    • Memory Pools specific to the Virtual Machine, such as Eden, Survivor, Tenured, or Perm-Gen for Sun JVMs.
    • Total GC Activations
    • Total GC Collection Time
  • .NET

    • .NET Memory Consumption (Virtual)
    • .NET Memory Consumption (Heap Size Gen 0, Gen 1, Gen 2, Large Objects)
    • .NET Garbage Collection (% in GC)
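
For orientation, the Java measures listed above have rough counterparts in the JVM's standard java.lang.management API. The sketch below reads them from inside a running JVM. It does not show how AppMon collects its measures, and the class name HeapMeasures is an assumption made for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical helper that reads standard JVM memory and GC figures. */
final class HeapMeasures {
    /** One consistent snapshot of used, committed, and maximum heap memory. */
    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    /** Sum of collection counts over all collectors; undefined counts (-1) are skipped. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) total += count;
        }
        return total;
    }

    /** Sum of accumulated collection time in milliseconds; undefined values are skipped. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) total += millis;
        }
        return total;
    }

    public static void main(String[] args) {
        MemoryUsage h = heap();
        // getMax() returns -1 when the maximum is undefined.
        System.out.printf("used=%d committed=%d max=%d%n",
                h.getUsed(), h.getCommitted(), h.getMax());
        // Pool names (Eden, Survivor, and so on) depend on the JVM and the collector in use.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null for an invalid pool
            if (usage != null) {
                System.out.printf("pool %s used=%d%n", pool.getName(), usage.getUsed());
            }
        }
        System.out.printf("gcCount=%d gcTimeMs=%d%n", totalGcCount(), totalGcTimeMillis());
    }
}
```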

AppMon provides some charts out of the box for monitoring the most important measures regarding memory.

In the Production Edition, Start Center > Monitor Applications > Host Health Monitoring displays the Host Health Overview dashboard, a comprehensive overview of host performance, including memory usage. In the Development Team Edition, click Start Center > Monitoring > Host Health Overview to display this dashboard.

In the Development Team Edition, Start Center > Memory Diagnosis > Analyze Memory Usage opens the Memory Analysis dashboard in which you can quickly view the memory usage of the Agents connected to a System Profile.

You can create memory snapshots either manually or automatically through periodic tasks in the Total Memory dashlet of a System Profile.

Incident-based memory diagnosis

For incident-based memory diagnosis, do either of the following:

  • Automatically create a leak analysis snapshot when an application runs out of memory.
  • Define Incidents (for example, if memory usage exceeds 80%) and attach actions (for example, create snapshots) to these incidents.
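
The threshold condition behind such an incident can be sketched as follows. This is only an illustration of the "memory usage exceeds 80%" check; the class and method names are hypothetical, and AppMon evaluates its incident rules internally rather than through code like this.

```java
/** Hypothetical illustration of a percentage-based memory threshold check. */
final class MemoryThreshold {
    /** Used memory as a percentage of the maximum, or -1 if the maximum is undefined. */
    static double usedPercent(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0; // for example, a JVM that reports no maximum
        }
        return 100.0 * usedBytes / maxBytes;
    }

    /** True if usage is defined and strictly exceeds the threshold (for example, 80.0). */
    static boolean exceeds(long usedBytes, long maxBytes, double thresholdPercent) {
        double percent = usedPercent(usedBytes, maxBytes);
        return percent >= 0 && percent > thresholdPercent;
    }
}
```

An action such as creating a snapshot would then be attached to the case where exceeds returns true.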

By default, AppMon automatically creates leak analysis snapshots if a Java 6 VM reports an out-of-memory error; in such a case, an incident is raised and a snapshot is triggered.

You can disable automatic creation of memory snapshots for out-of-memory errors in the Advanced Settings of the Agent Group, as shown in the figure below.

Disabling automatic memory snapshots

You can create incidents for memory-related measures like Used memory in %, and define actions that trigger memory snapshots. AppMon automatically defines such incident rules. To trigger a memory snapshot, add a corresponding action to the incident rule, as shown in the following figures.

When you discover a memory problem, you can use trending memory snapshots as a first step to determine which classes are top-most instantiated and compare them among multiple snapshots created over time. Trending snapshots are created:

  • Using the Create Snapshot dialog box.
  • Using incidents as previously described.
  • Using automated tasks that periodically trigger a snapshot. See Snapshot Schedules for details.

Note

AppMon persists all memory dumps onto the hard disk of the AppMon Server. This can fill up the disk, especially when creating scheduled dumps of larger VMs or CLRs. Unless suppressed by clearing the options in the Create Memory Snapshot dialog box, garbage collection is forced before a memory dump to prevent including already unreferenced objects and to provide better accuracy. Trending snapshots are, however, quite small. One snapshot requires approximately 1-2 MB on disk.

After identifying classes where the number of instances or their sizes are growing, use the context menu to create appropriate Memory Sensor rules. These rules are used in the next step, which involves selective memory snapshots. See Total Memory Comparison Dashlet and Memory Analysis for more details on using trending memory snapshots to find growing instances.
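
The idea of comparing trending snapshots over time can be sketched in a few lines of Java. The sketch assumes each snapshot has been reduced to a map from class name to instance count. That representation and the class name TrendComparison are assumptions for illustration, not an AppMon export format.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical comparison of per-class instance counts across snapshots. */
final class TrendComparison {
    /**
     * Returns, in alphabetical order, the classes whose instance count strictly
     * increased between every pair of consecutive snapshots. A class missing
     * from a snapshot is treated as having zero instances.
     */
    static List<String> growingClasses(List<Map<String, Long>> snapshots) {
        List<String> growing = new ArrayList<>();
        if (snapshots.size() < 2) {
            return growing; // a trend needs at least two snapshots
        }
        Map<String, Long> latest = snapshots.get(snapshots.size() - 1);
        for (String cls : latest.keySet()) {
            boolean grows = true;
            for (int i = 1; i < snapshots.size() && grows; i++) {
                long previous = snapshots.get(i - 1).getOrDefault(cls, 0L);
                long current = snapshots.get(i).getOrDefault(cls, 0L);
                grows = current > previous;
            }
            if (grows) {
                growing.add(cls);
            }
        }
        Collections.sort(growing);
        return growing;
    }
}
```

Classes returned by such a comparison are the natural candidates for Memory Sensor rules. Steady growth alone does not prove a leak, because caches that are still warming up show the same pattern.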

Capture selective memory allocations

Selective memory snapshots track the instantiation of selected classes defined by Memory Sensor rules. They identify the point in the code where instances are allocated. They do not stall the VM or CLR while they are created. However, they introduce some overhead for heap memory and CPU usage, especially when many instances are generated.

The Memory Comparison dashlet used for trend comparison allows you to define Memory Sensor rules for growing classes. See Selective Memory Dashlet for details regarding selective memory snapshots.

Identify leaks using leak analysis snapshots

Leak analysis snapshots contain all instances and classes, together with their references, that currently exist on the heap.

Creating a leak analysis snapshot stalls the JVM or CLR. For larger heaps, depending on your hardware and the bandwidth of your network connection as well as the selected snapshot options, creating the snapshot can take a long time: creating a snapshot of a 24-GB heap usually takes about 30 minutes, during which time the VM or CLR is completely stalled. Post-processing large leak analysis snapshots also requires a lot of memory. To prevent the AppMon Server from purging PurePaths (as it needs memory for post-processing), use an AppMon Memory Analysis Server. For production environments, the AppMon Memory Analysis Server is mandatory.

To configure an AppMon Memory Analysis Server through the AppMon Client, select Settings > Dynatrace Server > Services > Dynatrace Memory Analysis Server. See Set up a Memory Analysis Server for a description of how to use the Dynatrace Memory Analysis Server tab, shown in the figure below.

Memory Snapshot Analysis Server tab

For a detailed overview of the functionality provided for leak analysis snapshots, see Total Memory Dashlet.

Note

Leak analysis snapshots stall the VM while retrieving heap information. Especially for production environments, consider the impact on your system before creating a snapshot.

JVMs require a considerable amount of native memory for the creation of a leak analysis snapshot. For example, a JVM with 25 GB of used memory containing 340 million instances requires approximately 15 GB of native memory. CLRs, however, require very little native memory for creating a leak analysis snapshot.

The Total Memory dashlet provides an overview of the HotSpots found while post-processing a leak analysis snapshot. Look at the HotSpot pane to check whether AppMon has found a potential leak. If a leak was identified, use the analysis drilldowns to investigate leak suspects. For details, see the tutorial in Memory Analysis. Use the Keep Alive Set dashlet to see which instances are kept alive by this instance and the Shortest Root Paths dashlet to investigate how this large instance is kept alive. If the keep alive set contains instances that should not be kept alive, drill down to the shortest root paths of these instances to identify why they are still referenced. If the keep alive root path doesn't show you why an instance is kept alive, use direct root paths to show the direct references on the heap.
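
The two concepts used here can be sketched on a toy object graph. The sketch assumes the usual general definitions: the keep alive set of an instance is everything that would become unreachable from the GC roots if that instance were removed, and a shortest root path is a shortest chain of references from a root to an instance. Whether AppMon computes them exactly this way is not stated in this page, and the class HeapGraph with its string-named objects is purely illustrative.

```java
import java.util.*;

/** Hypothetical toy heap: objects are names, references are directed edges. */
final class HeapGraph {
    private final Map<String, List<String>> refs = new HashMap<>();

    void addReference(String from, String to) {
        refs.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        refs.computeIfAbsent(to, k -> new ArrayList<>());
    }

    /** Objects reachable from the roots without passing through 'blocked' (may be null). */
    Set<String> reachable(Collection<String> roots, String blocked) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        for (String root : roots) {
            if (!root.equals(blocked) && seen.add(root)) queue.add(root);
        }
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String next : refs.getOrDefault(current, Collections.emptyList())) {
                if (!next.equals(blocked) && seen.add(next)) queue.add(next);
            }
        }
        return seen;
    }

    /** Objects that are reachable only via 'instance' (the instance itself is excluded). */
    Set<String> keepAliveSet(Collection<String> roots, String instance) {
        Set<String> kept = new TreeSet<>(reachable(roots, null));
        kept.removeAll(reachable(roots, instance));
        kept.remove(instance);
        return kept;
    }

    /** Breadth-first search from the roots; returns an empty list if target is unreachable. */
    List<String> shortestRootPath(Collection<String> roots, String target) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        for (String root : roots) {
            if (!parent.containsKey(root)) { parent.put(root, null); queue.add(root); }
        }
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(target)) {
                LinkedList<String> path = new LinkedList<>();
                for (String n = current; n != null; n = parent.get(n)) path.addFirst(n);
                return path;
            }
            for (String next : refs.getOrDefault(current, Collections.emptyList())) {
                if (!parent.containsKey(next)) { parent.put(next, current); queue.add(next); }
            }
        }
        return Collections.emptyList();
    }
}
```

For example, with references root to cache, cache to a, cache to b, root to b, and a to c, the keep alive set of cache contains a and c but not b, because b is also referenced directly from the root. The shortest root path of c is root, cache, a, c.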

If the HotSpots do not contain the instance causing a memory leak, open the content of the snapshot to get an overview of classes and their instance counts as well as their corresponding Garbage Collection Size. This is usually a good point in your analysis to consult with an architect or developer if you do not have detailed knowledge of the implementation of your application, to help you figure out if the instance counts, the GC sizes, and in particular the references between instances are plausible.

The following figures show the HotSpots pane and the content of a leak analysis snapshot.

If the HotSpots view of your snapshot does not show suspicious instances and the snapshot contains a large number of instances for one or more classes, the analysis is usually more complex and time-consuming. Open the Total Memory Content dashlet as previously shown and look for suspicious instance counts or Garbage Collection size of classes. You can also compare the average instance Garbage Collection size for a class to the size of individual instances to find any unreasonably large instance. If you find a single instance, use the Keep Alive Set of the instance and follow the root paths of suspicious instances of the Keep Alive Set contents.
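
The comparison of individual instance sizes against the class average can be sketched as a simple filter. The factor used to decide what counts as "unreasonably large" is an assumption you would choose yourself, and the class name InstanceSizeOutliers is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that flags instances much larger than their class average. */
final class InstanceSizeOutliers {
    /** Arithmetic mean of the given sizes, or 0.0 for an empty array. */
    static double average(long[] sizes) {
        if (sizes.length == 0) {
            return 0.0;
        }
        long sum = 0;
        for (long size : sizes) {
            sum += size;
        }
        return (double) sum / sizes.length;
    }

    /** Indices of instances whose size exceeds factor times the average size. */
    static List<Integer> unusuallyLarge(long[] sizes, double factor) {
        double limit = average(sizes) * factor;
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < sizes.length; i++) {
            if (sizes[i] > limit) {
                result.add(i);
            }
        }
        return result;
    }
}
```

With four instances of size 100 and one of size 10,000, the average is 2,080, so a factor of 3 flags only the last instance. A single flagged instance is then a candidate for the Keep Alive Set drilldown.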

Note

For very large heaps, reproduce your memory leak with a smaller heap if possible. Less time is required to create, post-process, and investigate a snapshot with a smaller heap than with a huge heap, especially if there is no single instance that is keeping others alive. Using a fast hard disk (or SSD) for session storage improves analysis performance.

If there is no single instance to be investigated, you can analyze aggregated follow references in the Follow References dashlet (drill down from a class record). Aggregated follow references are followed for all instances of a class. The goal is to find referrers that should not be there. Keep in mind that an aggregated reference analysis follows the references of each instance of a class and thus can require several minutes. Following by keep-alive should show how many of the instances are kept alive, and by which other instances. Following direct references should show how many of the instances are referenced directly, and by which other instances.

The figure below illustrates an aggregated References by Keep Alive for all OpenBitSet instances found in a leak analysis snapshot. The referrer analysis shows that there are 190,426 ExecPathNode instances that refer to at least one OpenBitSet. This does not mean that all ExecPathNode instances refer to a total of 190,426 OpenBitSet instances. It might mean that one OpenBitSet instance is referred to by 190,426 ExecPathNode instances. As this is suspicious, you should check whether there is really an OpenBitSet for each ExecPathNode. Drill down from the ExecPathNodes to keep-alive references using References by Keep Alive. In the new dashlet, switch to the Analyze Referees tab at the bottom and request the referees of those 190,426 ExecPathNode instances. The tab shows that 190,426 individual OpenBitSet instances are referenced. This identifies the problem: there should only be a single instance of OpenBitSet for ExecPathNode.
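
The difference between the referrer count and the number of distinct referees can be made concrete with a small sketch. The classes Node and BitSetStub are hypothetical stand-ins for ExecPathNode and OpenBitSet; nothing here reflects their real implementations.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of referrer count versus distinct referee count. */
final class RefereeCount {
    /** Stand-in for the referenced class (OpenBitSet in the text). */
    static final class BitSetStub { }

    /** Stand-in for the referring class (ExecPathNode in the text). */
    static final class Node {
        final BitSetStub bits;
        Node(BitSetStub bits) { this.bits = bits; }
    }

    /** Number of nodes that refer to a bit set: the referrer count. */
    static long referrers(List<Node> nodes) {
        return nodes.stream().filter(n -> n.bits != null).count();
    }

    /** Number of distinct bit set instances referenced, compared by identity. */
    static int distinctReferees(List<Node> nodes) {
        Set<BitSetStub> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Node n : nodes) {
            if (n.bits != null) {
                seen.add(n.bits);
            }
        }
        return seen.size();
    }
}
```

If every node shares one bit set, referrers returns the node count while distinctReferees returns 1. If every node owns its own bit set, both numbers are equal, which corresponds to the situation described above.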

While using aggregated reference analysis, you can open a result (for example, 190,426 individual out of 5,000,000 ExecPathNode instances) in a new dashlet that allows you to view the corresponding instances. Select Show Instances in the context menu to open a dashlet that contains only the corresponding class. Expand this class to see only the corresponding instances.

See Memory Analysis for a tutorial on leak analysis using the easyTravel application.