Thread diagnostics

Overview

Thread dumps provide a snapshot of all JVM/CLR threads. They are a powerful way of finding deadlocks, idle or busy thread pools, thread leaks, and more.

This page describes how to use thread dumps to diagnose performance bottlenecks.

Thread dumps have several advantages over traditional JVM dumps:

  • CPU time information is available.
  • Multiple thread dumps can be compared.
  • You can group threads by significant criteria.
  • You can search for PurePaths where a specific thread is involved.
  • To perform a thread dump, it is not necessary to start the VM in the console or redirect the output.
  • Thread dumps can be triggered automatically.

Note

While the thread information is collected, all threads are suspended to guarantee consistency of stack traces, states, and monitors.

Thread dumps can be persisted, searched, grouped, and scheduled. For example, you might trigger thread dumps when the load on the system under diagnosis is low, to verify that no threads are leaking, or to compare the CPU usage of individual threads with previously created thread dumps.
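For readers who want to see what such a snapshot contains, the following is a minimal sketch that captures one with the standard JDK java.lang.management API. It is not the Agent's own mechanism; the class name ThreadDumpSketch and its methods are hypothetical and exist only for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class ThreadDumpSketch {
    /** Captures all live threads; monitor and synchronizer details only if the VM supports them. */
    static ThreadInfo[] capture() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        return mx.dumpAllThreads(mx.isObjectMonitorUsageSupported(),
                                 mx.isSynchronizerUsageSupported());
    }

    /** Formats one line per thread: id, name, state, and CPU time in ms (-1 if unavailable). */
    static String describe(ThreadInfo info) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long cpuNanos = mx.isThreadCpuTimeSupported()
                ? mx.getThreadCpuTime(info.getThreadId())
                : -1;
        long cpuMillis = cpuNanos < 0 ? -1 : cpuNanos / 1_000_000;
        return String.format("#%d \"%s\" %s cpu=%dms",
                info.getThreadId(), info.getThreadName(), info.getThreadState(), cpuMillis);
    }

    public static void main(String[] args) {
        for (ThreadInfo info : capture()) {
            System.out.println(describe(info));
        }
    }
}
```

Running main prints one line per live thread. Saving that output at intervals gives a crude way to compare snapshots over time.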

Grouping threads

Analyzing a traditional JVM dump is often tedious because the vital information is scattered across the log file or tool. For example, you have to work out which thread owns which monitor, or which threads are waiting for a buffer.

The ability to group threads by different criteria is very handy, especially in production scenarios with many concurrent threads. For example, threads grouped by their thread group show which thread group consumed the most processor time, because the CPU times are accumulated. Grouping by classes or methods is useful to see what is happening in an application at the time of the dump.
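The idea of grouping with accumulated CPU times can be sketched in plain Java. The example below is an assumption-laden illustration using only JDK APIs: it samples live threads rather than reading a persisted dump, and the class name ThreadGroupingSketch is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

class ThreadGroupingSketch {
    /** Counts live threads per thread state. */
    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    /** Sums CPU time in nanoseconds per thread group name. */
    static Map<String, Long> cpuNanosByGroup() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Long> totals = new TreeMap<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return totals; // CPU time is not available on this VM
        }
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            ThreadGroup group = t.getThreadGroup(); // null once the thread has terminated
            if (group == null) {
                continue;
            }
            long cpu = mx.getThreadCpuTime(t.getId()); // -1 if disabled or the thread died
            if (cpu >= 0) {
                totals.merge(group.getName(), cpu, Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println("Threads by state: " + countByState());
        System.out.println("CPU nanos by group: " + cpuNanosByGroup());
    }
}
```

Because the threads keep running while they are sampled, the numbers are only approximate. A real thread dump avoids this by collecting everything at one point in time.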

Thread state and thread group are the most common groupings. The following figure shows a thread dump taken from Eclipse under load.

Eclipse thread dump

There are 10 blocked threads, and some of them own locks on monitors. Owning multiple monitors isn't a problem in itself, but it could indicate architectural flaws. For example, a method may not have been intended to be called from another method that synchronizes on the owned lock.

The following figure shows that two threads are waiting for this monitor. It is unclear how long they have been waiting, but it is advisable to keep an eye on them. The methods involved are candidates for placing additional Sensors.

Blocked threads
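The relationship between a blocked thread, the monitor it waits for, and the owning thread can be reproduced in a small experiment. The sketch below uses the JDK ThreadMXBean and ThreadInfo classes; the class name BlockedThreadsSketch and the thread names "holder" and "waiter" are hypothetical and not part of the product.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class BlockedThreadsSketch {
    /** Lists each BLOCKED thread with the monitor it waits for and the owning thread. */
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(mx.isObjectMonitorUsageSupported(), false)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                lines.add(info.getThreadName() + " waits for " + info.getLockName()
                        + " owned by " + info.getLockOwnerName());
            }
        }
        return lines;
    }

    /** Demo: "holder" keeps a monitor while "waiter" blocks on it; returns the report. */
    static List<String> demo() throws InterruptedException {
        Object monitor = new Object();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (monitor) {
                held.countDown(); // signal that the monitor is now owned
                try {
                    release.await(); // keep owning the monitor until released
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "holder");
        Thread waiter = new Thread(() -> {
            synchronized (monitor) { } // blocks until holder leaves its block
        }, "waiter");
        holder.start();
        held.await();
        waiter.start();
        while (waiter.getState() != Thread.State.BLOCKED) {
            Thread.sleep(10);
        }
        List<String> report = blockedThreads();
        release.countDown();
        holder.join();
        waiter.join();
        return report;
    }

    public static void main(String[] args) throws InterruptedException {
        demo().forEach(System.out::println);
    }
}
```

Like the figure, the report shows who waits and who owns, but not for how long. That is why repeated dumps or additional instrumentation are needed to judge whether the contention is a real problem.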

Comparing thread dumps

When the system under diagnosis appears to have random threading issues, it helps to examine all running threads.

Running threads

The most active thread is Worker-2. It is updating the Subversion state of all project files. To find out what's taking so long, another thread dump is requested, as shown in the following figure.

Thread dump analysis

The dump shows that Worker-2 was not the only busy thread. Timer-1 also consumed a lot of CPU time, probably for Subversion operations as well. To learn more, you could place additional Sensor rules for the package org.tmatesoft.svn.core. The filter feature shows all threads related to Subversion.
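Filtering threads by package amounts to checking whether any stack frame belongs to that package. The following sketch shows the principle with plain JDK types. It is not how the product's filter is implemented, and the class name StackFilterSketch and method threadsInPackage are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class StackFilterSketch {
    /** Returns the sorted names of threads with at least one stack frame in the given package. */
    static List<String> threadsInPackage(Map<Thread, StackTraceElement[]> dump, String pkg) {
        String prefix = pkg + "."; // avoids matching sibling packages such as "corefoo"
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : dump.entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().startsWith(prefix)) {
                    names.add(entry.getKey().getName());
                    break; // one matching frame is enough for this thread
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Thread.getAllStackTraces() stands in for a real thread dump here.
        System.out.println(threadsInPackage(Thread.getAllStackTraces(), "org.tmatesoft.svn.core"));
    }
}
```

In a JVM that does not load Subversion classes, main prints an empty list. Passing a package that the application actually uses shows the matching threads.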

Note

Threads are compared by their ID. Because IDs change when the system under diagnosis is restarted, threads from different runs can only be compared by name. Thread names are not necessarily unique, however, so threads that share a name cannot be compared.
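Comparison by ID can be illustrated by subtracting per-thread CPU times between two snapshots. This is a minimal sketch assuming the JDK ThreadMXBean as the data source; the class name DumpComparisonSketch and its methods are hypothetical, and the product's own comparison may work differently.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class DumpComparisonSketch {
    /** Records CPU time in nanoseconds per thread ID for all live threads. */
    static Map<Long, Long> snapshotCpuNanos() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, Long> cpu = new HashMap<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return cpu; // CPU time is not available on this VM
        }
        for (long id : mx.getAllThreadIds()) {
            long nanos = mx.getThreadCpuTime(id); // -1 if disabled or the thread died
            if (nanos >= 0) {
                cpu.put(id, nanos);
            }
        }
        return cpu;
    }

    /** Returns the CPU time consumed between two snapshots, for IDs present in both. */
    static Map<Long, Long> cpuDelta(Map<Long, Long> before, Map<Long, Long> after) {
        Map<Long, Long> delta = new TreeMap<>();
        for (Map.Entry<Long, Long> entry : after.entrySet()) {
            Long earlier = before.get(entry.getKey());
            if (earlier != null) { // threads started or ended in between are skipped
                delta.put(entry.getKey(), entry.getValue() - earlier);
            }
        }
        return delta;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Long, Long> before = snapshotCpuNanos();
        Thread.sleep(500);
        System.out.println(cpuDelta(before, snapshotCpuNanos()));
    }
}
```

Matching on ID only makes sense within a single run of the JVM. After a restart the same IDs may belong to entirely different threads, which is the limitation described in the note above.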

Limitations

  • Gathering monitor details on some Sun VMs is slow when threads are suspended. It is possible to disable thread suspension by adding the Agent option threaddumpsuspendthreads=false. However, with thread suspension disabled, monitor details might not match the thread states or stack traces.
  • CPU time is available only with JVM 5.0 and later.
  • Acquiring owned monitors under .NET is available only with .NET 4.5.