Many Application Performance Management (APM) vendors gain insight into the runtime behavior of JVMs through interfaces provided by the Java runtime. Traditionally, Java offered the JVM Profiler Interface (JVMPI), which was replaced by the JVM Tool Interface (JVMTI) in Java 5. Both options allow a tool vendor to load a native library (often called a native agent) into the same process as the JVM. This library gets access to JVM state and certain aspects of application execution through a native API. Because the library does not run as Java code inside the JVM, it is not impacted by JVM stops (e.g. longer Garbage Collection suspensions, runtime errors, …) and can therefore continue to deliver data to the external tool.
As an alternative to these native interfaces, Java 5 also introduced a pure Java interface (the java.lang.instrument package, enabled with the -javaagent option). This option loads the agent into the JVM, where it runs as part of the JVM. The downside is that this agent gets loaded at a later stage of the JVM startup sequence. It is also impacted by JVM stops or Java runtime problems and won’t be able to report certain types of error information.
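To make the pure Java option concrete, here is a minimal sketch of such an agent. The class names SketchAgent and CountingTransformer are hypothetical, and the transformer only counts classes instead of rewriting bytecode. The sketch assumes the agent is packaged in a JAR whose manifest names SketchAgent in its Premain-Class attribute and is started with -javaagent.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical agent class, referenced by the Premain-Class manifest attribute. */
final class SketchAgent {

    /** Number of class definitions offered to the transformer so far. */
    static final AtomicLong classesSeen = new AtomicLong();

    /** Invoked by the JVM before the application's main method when -javaagent is used. */
    public static void premain(String agentArgs, Instrumentation inst) {
        // Classes the JVM needed for its own startup are already loaded at this point.
        System.out.println("Loaded before agent: " + inst.getAllLoadedClasses().length);
        inst.addTransformer(new CountingTransformer());
    }

    /** Observes class loading; a real agent would rewrite the bytecode here. */
    static final class CountingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            classesSeen.incrementAndGet();
            return null; // null tells the JVM to keep the original bytecode
        }
    }
}
```

The count printed in premain gives a rough idea of how many classes were loaded before the Java agent had a chance to see them.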
In this blog we highlight some of the reasons why our engineering team decided on the native approach in combination with bytecode instrumentation (BCI) rather than moving to the Java-based approach:
Full Insight of all Classes
The native agent gets loaded before any classes are loaded. This allows it to collect data from the very beginning and to observe and control the execution of ALL Java code. No restrictions apply here. To capture method-level information, bytecode instrumentation (BCI) provides an additional benefit over relying on native callbacks. Being able to perform BCI on any class allows additional insight into core system classes (java.lang.Object, java.lang.Thread, …).
More detailed information
From native code we can get much more detailed performance information, such as hardware high-resolution timers and detailed GC information. Using a native approach therefore eliminates the need to install yet another native agent that collects system information. The Java agent most likely doesn’t get access to this data as it runs within the special security context of the JVM.
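For comparison, the following sketch shows what pure Java code can read through the standard java.lang.management API: per-collector collection counts and accumulated collection times, plus System.nanoTime() as a timer. The class name GcSnapshot is hypothetical, and the sketch illustrates the Java-side view only, not what our agent collects.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that reads cumulative GC statistics from inside the JVM. */
final class GcSnapshot {

    /** Returns collector name -> {collection count, accumulated collection time in ms}. */
    static Map<String, long[]> read() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Either value may be -1 if the collector does not report it.
            stats.put(gc.getName(),
                      new long[] { gc.getCollectionCount(), gc.getCollectionTime() });
        }
        return stats;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        read().forEach((name, v) ->
            System.out.println(name + ": " + v[0] + " collections, " + v[1] + " ms"));
        System.out.println("Snapshot took " + (System.nanoTime() - start) + " ns");
    }
}
```

These values are totals since JVM start, so a Java agent has to poll and compute differences to approximate what happened in a given interval.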
Within the native agent we can also collect much more information about the JVM itself, such as memory and thread dumps or JVM crashes. Especially for thread and memory analysis, it is beneficial to get access to both JVM threads and memory usage and native threads and memory usage. In case of a crash caused by an Out-of-Memory Error, the native agent is still able to capture the data on the heap, as the native process is still running and able to access memory information.
Less impact on JVM
Pure Java agents run “within” the JVM they’re observing. They therefore add overhead to the JVM itself and may impact execution of the application. One prominent example is higher Java heap usage compared to the native counterpart.
In native code, the data needed for analysis (stack traces, for example) can be fetched more efficiently. Most of this information is available through native APIs. Reaching these APIs from a Java agent inside the JVM would require a more expensive call from Java to native code.
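As an illustration of the Java side of this comparison, the sketch below samples the stack traces of all live Java threads using the standard Thread.getAllStackTraces() call. The class name StackSampler is hypothetical. A native agent would instead use JVMTI functions such as GetStackTrace or GetAllStackTraces directly, without creating Java objects on the heap of the monitored application.

```java
import java.util.Map;

/** Hypothetical helper that samples thread stacks from inside the JVM. */
final class StackSampler {

    /** Takes one sample of all live Java threads and returns how many were captured. */
    static int sampleOnce() {
        // Each call allocates a new map and StackTraceElement arrays on the Java heap.
        Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> entry : traces.entrySet()) {
            StackTraceElement[] frames = entry.getValue();
            String top = frames.length > 0 ? frames[0].toString() : "(no frames)";
            System.out.println(entry.getKey().getName() + " -> " + top);
        }
        return traces.size();
    }
}
```

Calling sampleOnce() periodically from a background thread would give a simple sampling profiler, at the cost of the allocations noted in the comment.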
Not attached to JVM
As the native agent is not attached to the JVM, it is not affected by JVM stops (especially GC-related ones) and is able to continue data collection during such stops. This helps us collect detailed and accurate information about the actual impact of Garbage Collection suspensions on the currently executing application threads. A Java agent won’t get information that accurate, as it is itself impacted by these “stops” of the JVM.
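The limitation on the Java side can be illustrated with a small probe. The sketch below (PauseProbe is a hypothetical name) sleeps in short intervals and records how much longer each sleep took than requested. Because the probing thread is a Java thread, it is suspended together with the application, so it can only notice a gap after the fact and cannot tell a GC suspension apart from an ordinary scheduling delay.

```java
/** Hypothetical probe that measures pauses as seen by a Java thread. */
final class PauseProbe {

    /**
     * Sleeps repeatedly for intervalMs over roughly durationMs and returns the
     * largest observed overshoot in milliseconds beyond the requested interval.
     */
    static long maxOvershootMs(long intervalMs, long durationMs) {
        long worstMs = 0;
        long deadline = System.nanoTime() + durationMs * 1_000_000L;
        while (deadline - System.nanoTime() > 0) {
            long before = System.nanoTime();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                break;
            }
            long elapsedMs = (System.nanoTime() - before) / 1_000_000L;
            worstMs = Math.max(worstMs, elapsedMs - intervalMs);
        }
        return worstMs;
    }
}
```

For example, PauseProbe.maxOvershootMs(1, 10_000) run alongside an allocation-heavy workload would report the longest gap seen in ten seconds, without any indication of what caused it.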
What are your thoughts?
These are our thoughts and the reasoning behind our decision to use the native option. One counterargument we sometimes hear is that bugs in the native agent may crash the process and, with it, the JVM. This is of course a valid concern, which is why we run lots of automated tests on all platforms we support to ensure we don’t cause any problems.
Now the shout-out to all other developers that use these interfaces: what are your thoughts on this? Why did you decide on the Java or Native approach?
Special Thanks to Christian Schwarzbauer, Chief Software Architect @ Compuware APM/dynaTrace, for the input on this blog.