Not All JVMs Are Created Equal
Chapter: Memory Management
Many developers know only a single JVM (Java Virtual Machine), the Oracle HotSpot JVM (formerly Sun JVM), and speak of garbage collection in general when they are referring to Oracle's HotSpot implementation specifically. It may seem as though there is an industry default, but such is not the case! In fact, the two most popular application servers, IBM WebSphere and Oracle WebLogic, each come with their own JVM. In this section, we will examine some of the enterprise-level garbage-collection specifics of the three most prominent JVMs, the Oracle HotSpot JVM, the IBM WebSphere JVM, and the Oracle JRockit JVM.
Oracle HotSpot JVM (formerly known as the Sun JVM)
The Oracle HotSpot JVM uses a generational garbage-collection scheme exclusively (see Figure 2.8). (We'll discuss Oracle's plans to implement a G1, Garbage First, collector below.)
Figure 2.8: The Oracle HotSpot JVM always uses a generational garbage-collection heap layout
Objects are allocated in the Eden space, which is always considerably larger than the survivor spaces (by default, each survivor space is one-eighth the size of Eden, but the ratio can be configured). The copy GC algorithm is executed either single-threaded or in parallel. It always copies surviving objects from Eden and the currently used survivor space into the second (currently unused) survivor space. Copying an object is, of course, more expensive than simply marking it, which is why the Eden space is the biggest of the three young-generation spaces. The vast majority of objects die in their infancy. A bigger Eden makes it more likely that these objects are already dead when the first GC cycle runs, and thus are never copied at all. Once an object has survived multiple GC cycles (how many can be configured), it is tenured to the old generation (also referred to as tenured space).
It is, of course, entirely possible for objects to tenure prematurely. The size and ratio of the areas have a big influence on allocation speed, GC efficiency, and GC frequency, and the right values depend completely on the application's behavior. The optimal solution can be found only by applying this knowledge and a lot of testing.
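To make the size relationship concrete, here is a minimal sketch of the arithmetic. It assumes the semantics of HotSpot's `-XX:SurvivorRatio` option (Eden size divided by the size of one survivor space); the class `YoungGenLayout` is a hypothetical helper written for this illustration, and a real JVM additionally rounds the areas to alignment boundaries.

```java
// Hypothetical helper for illustration; not part of any JVM API.
final class YoungGenLayout {
    final long edenBytes;
    final long survivorBytes; // size of EACH of the two survivor spaces

    private YoungGenLayout(long edenBytes, long survivorBytes) {
        this.edenBytes = edenBytes;
        this.survivorBytes = survivorBytes;
    }

    // survivorRatio = Eden size divided by the size of one survivor space,
    // so the young generation consists of (survivorRatio + 2) equal shares.
    static YoungGenLayout of(long youngBytes, int survivorRatio) {
        if (youngBytes <= 0 || survivorRatio <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long survivor = youngBytes / (survivorRatio + 2);
        long eden = youngBytes - 2 * survivor;
        return new YoungGenLayout(eden, survivor);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        YoungGenLayout layout = of(100 * mb, 8);
        System.out.println("Eden: " + layout.edenBytes / mb + " MB");
        System.out.println("Each survivor: " + layout.survivorBytes / mb + " MB");
    }
}
```

With a 100 MB young generation and a ratio of 8, this yields an 80 MB Eden and two 10 MB survivor spaces.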
The old generation can be configured to use serial (the default), parallel, or concurrent garbage collection. The parallel collector is also a compacting one, and the way it compacts can be tuned with a wide variety of options. The concurrent collector (Concurrent Mark Sweep, or CMS), on the other hand, does not compact at all. This can lead to allocation failures due to fragmentation (no large-enough blocks available). If this happens, CMS triggers a full GC, which effectively falls back to the normal GC (and collects the young generation as well).
All these combinations lead to a variety of options and configuration possibilities and can make finding an optimal GC configuration seem quite complicated. In the Tuning Section later in this chapter, we will cover the most important optimization: determining the optimal size for the young and old generations. This optimization ensures that only long-lived objects get tenured, while avoiding too many young-generation GCs and thus too many suspensions.
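Before tuning, it helps to confirm which collectors a given configuration actually activated. The following sketch uses the standard `java.lang.management` API to list them. The class name `GcInventory` is made up for this example, and the collector names it prints differ between JVM vendors, versions, and command-line options.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Illustrative class name; only the java.lang.management calls are real API.
final class GcInventory {

    // Returns one line per collector that the running JVM reports.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", totalMillis=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

Typically one bean covers the young generation and another the old generation, so comparing their counts and accumulated times over a test run gives a first impression of how often each generation is collected.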
The HotSpot JVM has an additional unique feature, called the permanent generation, to help make the garbage-collection process more efficient. Java maintains the application code and classes as objects within the heap. For the most part these objects are permanent and need not be considered for garbage collection. The HotSpot JVM therefore improves the garbage-collection performance by placing class objects and constants into the permanent generation. It effectively ignores them during regular GC cycles.
For better or for worse, the proliferation of application servers, OSGi containers, and dynamically-generated code has changed the game, and objects once considered permanent are not so permanent after all. To avoid out-of-memory errors, the permanent generation is garbage-collected during a major, or full, GC only.
In the Problem Pattern Section below, we will examine the issue of out-of-memory errors in the permanent generation, which wasn't designed to handle modern use cases. Today's application servers can load an amazing number of classes, often more than 100,000, which pretty much busts whatever default allocation has usually been set. We will also discuss the memory-leak problems caused by dynamic bytecode libraries.
Note: For further information, here's a link to a detailed description of Oracle HotSpot JVM memory management.
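The heap areas described in this section can be inspected from within a running JVM. The sketch below lists the memory pools through the standard `java.lang.management` API. The class name `MemoryPoolReport` is invented for this example, and pool names are JVM-specific: older HotSpot versions report a pool with "Perm Gen" in its name, while newer HotSpot versions report a Metaspace pool instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Illustrative class name; only the java.lang.management calls are real API.
final class MemoryPoolReport {

    // Returns one line per memory pool: type (HEAP or NON_HEAP), name, usage.
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            String used = (usage == null) ? "n/a" : (usage.getUsed() / 1024) + " KB";
            lines.add(pool.getType() + " | " + pool.getName() + " | used=" + used);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```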
Garbage First (G1)
Oracle's Java 7 will implement G1 garbage-collection (with backports to Java 6), using what is known as a garbage first algorithm. The underlying principle is very simple, and it is expected to bring substantial performance improvements. Here's how it works.
The heap is divided into a number of fixed-size subareas. A list of references that point to objects in the area, called a remembered set, is kept for each subarea. Each thread then informs the GC if it changes a reference, which could cause a change in the remembered set. If a garbage collection is requested, then the areas containing the most garbage are swept first, hence garbage first. In the best case (likely the most common, as well), an area will contain no living objects and is simply declared free, with no mark-and-sweep pass and no compacting. In addition, it is possible to define targets with the G1 collector, such as overhead or pause times. The collector then sweeps only as many areas as it can in the prescribed interval.
In this way, G1 combines the advantages of a generational GC with the flexibility of a continuous garbage collector. G1 also supports thread-local allocation and thus combines the advantages of all three garbage collection methods we've discussed—at least theoretically.
For instance, where a generational heap requires us to find the correct size for the young generation, often a source of problems, G1 is intended to obviate this sizing problem entirely. However, Oracle still considers G1 experimental and not yet ready for production use.
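The garbage-first selection described in this section can be sketched as a toy model. This is not how the JVM implements G1: the `Region` type, the sizes, and the cost model (evacuation time grows with the amount of live data) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Toy model of "garbage first" selection; all names and numbers are hypothetical.
final class GarbageFirstSketch {

    static final class Region {
        final String name;
        final int garbageKb;
        final int liveKb;

        Region(String name, int garbageKb, int liveKb) {
            this.name = name;
            this.garbageKb = garbageKb;
            this.liveKb = liveKb;
        }

        // Assumed cost model: a fixed 1 ms plus 1 ms per 100 KB of live data.
        int costMillis() {
            return 1 + liveKb / 100;
        }
    }

    // Picks the regions with the most garbage first and stops as soon as
    // the next region would exceed the pause-time budget.
    static List<Region> selectCollectionSet(List<Region> regions, int pauseBudgetMillis) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingInt((Region r) -> r.garbageKb).reversed());
        List<Region> chosen = new ArrayList<>();
        int spent = 0;
        for (Region r : sorted) {
            if (r.garbageKb == 0) {
                continue; // nothing to reclaim here
            }
            if (spent + r.costMillis() > pauseBudgetMillis) {
                break; // pause target reached
            }
            chosen.add(r);
            spent += r.costMillis();
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
                new Region("A", 1000, 0),   // all garbage: cheapest to reclaim
                new Region("B", 200, 800),
                new Region("C", 600, 400),
                new Region("D", 0, 1000));  // no garbage: never selected
        for (Region r : selectCollectionSet(regions, 6)) {
            System.out.println("collect " + r.name);
        }
    }
}
```

With a budget of 6 ms, the sketch selects region A (pure garbage) and then region C, and leaves B for a later cycle because evacuating it would exceed the pause target.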
The IBM WebSphere JVM
As of Java 5, the IBM WebSphere JVM has added a generational GC configuration option to its classic mark-and-sweep algorithm. The default setup still uses a single big heap with either a parallel or a concurrent GC strategy. This is recommended for applications with a small heap size, not greater than 100 MB, but is not suitable for large or more-complex applications, which should use the generational heap.
Figure 2.9: When using the generational GC, the IBM WebSphere JVM separates the heap into nursery and tenured space. The nursery is divided into two equal-sized areas for infant and surviving objects
The layout of the generational heap (see Figure 2.9) is slightly different from that of the Oracle HotSpot JVM. The WebSphere nursery is equivalent to the HotSpot young generation, and the WebSphere tenured space is equivalent to the HotSpot old generation. The nursery is divided into two parts of equal size: the allocate and survivor areas. Objects are always allocated in the allocate area of the nursery, and copy garbage collection is used to copy surviving objects to the survivor area. After a successful GC cycle, the two areas swap roles, and the former survivor area becomes the new allocate area.
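The role swap between the two nursery areas can be sketched with a toy model. Everything here is a simplification for illustration: the class `FlippingNursery` is hypothetical, objects are represented by strings, liveness is supplied by the caller, and tenuring of long-lived objects is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model of a two-area nursery; not the IBM JVM's implementation.
final class FlippingNursery {
    private List<String> allocateArea = new ArrayList<>();
    private List<String> survivorArea = new ArrayList<>();

    void allocate(String obj) {
        allocateArea.add(obj);
    }

    // Copies live objects into the empty survivor area, discards the rest,
    // then swaps the roles of the two areas.
    void collect(Predicate<String> isLive) {
        for (String obj : allocateArea) {
            if (isLive.test(obj)) {
                survivorArea.add(obj);
            }
        }
        allocateArea.clear();
        List<String> previous = allocateArea;
        allocateArea = survivorArea;  // survivors now live in the allocate area
        survivorArea = previous;      // the emptied area waits for the next cycle
    }

    List<String> contents() {
        return new ArrayList<>(allocateArea);
    }

    public static void main(String[] args) {
        FlippingNursery nursery = new FlippingNursery();
        nursery.allocate("a");
        nursery.allocate("b");
        nursery.allocate("c");
        nursery.collect(obj -> !obj.equals("b")); // "b" is garbage
        System.out.println(nursery.contents());
    }
}
```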
The IBM WebSphere JVM omits the Eden space and does not treat infant objects specially. It does, however, differentiate between small and large objects. Large objects, usually more than 64k, are allocated in a specific area for the non-generational heap or directly in the tenured generation space. The rationale is simple. Copying (generational GC) or moving (compacting) large objects is more expensive than considering them during the marking phase of a normal GC cycle.
And unlike the HotSpot JVM, the WebSphere JVM treats classes like any other object, placing them in the "normal" heap. There is no permanent generation and so classes are subjected to garbage collection every time. Under certain circumstances when classes are repeatedly reloaded, this can lead to performance problems. Examples of this can be found in the Problem Pattern Section below.
The Oracle JRockit JVM
Oracle's WebLogic Application Server uses the Oracle JRockit JVM, which, like the IBM WebSphere JVM, can use a single continuous heap or generational GC. Unlike the other two JVMs we've discussed, JRockit does not use a copy garbage collection strategy within the nursery. It simply declares a block within the nursery (size is configurable and the placement changes after every GC) as a keep area (see Figure 2.10).
Figure 2.10: The generational GC of the Oracle JRockit JVM does not split its nursery into separate spaces, but it does attempt to keep infant objects from being treated by the garbage collector at all
The keep area leads to the following object allocation and garbage collection semantics:
- Objects are first allocated anywhere outside the keep area. Once the nursery fills up, the keep area gets used as well.
- The keep area automatically contains the most-recently allocated objects once the GC is triggered.
- All live objects outside the keep area are promoted to the tenured space. Objects in the keep area are considered alive and left untouched.
- After the GC, the nursery is empty apart from objects within the former keep area. A new equally-sized memory block within the nursery is now declared as a keep area and the cycle starts again.
This is much simpler and more efficient than the typical copy garbage collection, due to these two main points:
- An object is never copied more than once.
- Recently allocated objects are too young to tenure and most likely alive, and thus simply left untouched.
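The keep-area semantics listed above can be sketched with a toy model. This is an illustration under simplifying assumptions, not JRockit's implementation: the class `KeepAreaNursery` is hypothetical, objects are strings kept in allocation order, the keep area is expressed as a number of most-recent allocations, and liveness is supplied by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model of a nursery with a keep area; all names are hypothetical.
final class KeepAreaNursery {
    private final int keepSize; // how many of the newest objects are left untouched
    private final List<String> nursery = new ArrayList<>(); // in allocation order
    private final List<String> tenured = new ArrayList<>();

    KeepAreaNursery(int keepSize) {
        this.keepSize = keepSize;
    }

    void allocate(String obj) {
        nursery.add(obj);
    }

    // Young-generation GC: the newest keepSize objects stay in place unexamined;
    // older objects are promoted if live and dropped otherwise.
    void collect(Predicate<String> isLive) {
        int keepStart = Math.max(0, nursery.size() - keepSize);
        List<String> kept = new ArrayList<>(nursery.subList(keepStart, nursery.size()));
        for (String obj : nursery.subList(0, keepStart)) {
            if (isLive.test(obj)) {
                tenured.add(obj); // promoted exactly once, never copied again
            }
        }
        nursery.clear();
        nursery.addAll(kept); // former keep area is examined in the next cycle
    }

    List<String> nurseryContents() {
        return new ArrayList<>(nursery);
    }

    List<String> tenuredContents() {
        return new ArrayList<>(tenured);
    }

    public static void main(String[] args) {
        KeepAreaNursery heap = new KeepAreaNursery(2);
        for (String obj : new String[] {"a", "b", "c", "d"}) {
            heap.allocate(obj);
        }
        heap.collect(obj -> !obj.equals("b")); // "b" is garbage
        System.out.println("nursery=" + heap.nurseryContents()
                + " tenured=" + heap.tenuredContents());
    }
}
```

In this run, "c" and "d" sit in the keep area and are not touched, "a" is promoted, and "b" is discarded.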
To avoid unnecessary promotions, the JRockit JVM young generation has to be larger than the corresponding generations in the other two main JVMs. It is important to note that the young-generation GC is a stop-the-world scenario—the more objects get promoted, the longer the application is suspended.
While JRockit handles the young generation differently, the tenured space uses the same parallel or concurrent GC strategies as the other two JVMs.
The following are some important points that distinguish the JRockit JVM from others:
- Thread-local allocation (TLA), which we'll discuss in the next section of this chapter, is active by default and is part of the nursery.
- JRockit distinguishes between small and large objects, with large objects allocated directly to the old generation.
- Classes are considered normal objects and are placed on the heap and subject to garbage collection (which is also true of the IBM WebSphere JVM).
"Special" Garbage Collection Strategies
There are some situations when standard garbage collection is not sufficient. We will examine the Remote Garbage Collector, which deals with distributed object references, and the Real Time Garbage Collector, which deals with real-time guarantees.
Remote Garbage Collector
With Remote Method Invocation (RMI), we can use a local Java object (the client-side stub) to represent another object residing on a different JVM (the server side). Obviously, RMI calls to the server-side object require that this object exists. RMI therefore makes it necessary to treat the server-side object as referenced by the client-side stub. Since the server has no way of knowing about this reference on its own, a remote garbage collection mechanism is needed. Here's how it works:
- When a client receives a stub from a server, it acquires a lease for it. The server-side object is considered referenced by the client stub.
- The server-side object is kept alive by the RMI implementation itself until the lease expires, which is a simple timeout.
- Existing client-side stubs execute regular heartbeats (known informally as dirty calls) to renew their leases. (This is done automatically by the RMI implementation.)
- The server side checks periodically for expired leases.
- Once the lease expires (because no clients exist anymore to reference the object), the RMI implementation simply forgets the object. It can then be garbage-collected like any other object.
Server objects can therefore survive garbage collection even though the client no longer uses them. An otherwise-inactive client might hold onto a stub for a long time even if the stub is ready for garbage collection, because an idle client rarely runs a GC. As long as the client-side stub is not garbage-collected, the server object is held onto as well. In extreme cases, this means that a lot of inactive clients lead to a lot of unused server objects that cannot be garbage-collected. This can effectively crash the server with an out-of-memory error.
To avoid this, the distributed garbage collector (RMI garbage collector) forces a major client garbage collection (with all the negative impact on performance) at regular intervals. This interval is controlled by the gcInterval system property (sun.rmi.dgc.client.gcInterval).
The same setting exists for the server side and does the same thing. (Until Java 6, both settings defaulted to one minute. In Java 6, the server-side default changed to one hour.) With the one-minute default, it effectively triggers a major garbage collection every minute, a performance nightmare. The setting makes sense in general on the client side (to allow the server to remove remote objects), but it's unclear why it exists on the server side. A server remote object is freed for garbage collection either when the lease is up or when the client explicitly cleans it. The explicit garbage collection has no impact on this, which is why I recommend setting this property as high as possible for the server.
I also recommend that RMI be restricted to stateless service interfaces. Since there would exist only one instance of such a server interface and it would never need to be garbage-collected (or at least as long as the application is running), we do not need remote garbage collection to remove it. If we restrict RMI in this way, we can also set the client-side interval very high and effectively remove the distributed garbage collector from our equation by negating its impact on application performance.
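The lease bookkeeping described in this section can be sketched with a toy model. It is not the RMI implementation: the class `LeaseTable` is hypothetical, it tracks a single lease per object rather than one per client, and time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Toy model of server-side lease tracking; all names are hypothetical.
final class LeaseTable {
    private final long leaseMillis;
    private final Map<String, Long> expiryByObject = new HashMap<>();

    LeaseTable(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    // Called when a stub is handed out and on every renewal ("dirty" call).
    void dirty(String objectId, long nowMillis) {
        expiryByObject.put(objectId, nowMillis + leaseMillis);
    }

    // Periodic server-side check: forget every object whose lease has expired.
    // Returns how many objects became eligible for normal garbage collection.
    int expire(long nowMillis) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = expiryByObject.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() <= nowMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    boolean isReferenced(String objectId) {
        return expiryByObject.containsKey(objectId);
    }

    public static void main(String[] args) {
        LeaseTable leases = new LeaseTable(1000);
        leases.dirty("orderService", 0);
        leases.dirty("sessionObject", 0);
        leases.dirty("orderService", 800); // the client renews one lease
        System.out.println("expired: " + leases.expire(1000));
        System.out.println("orderService referenced: " + leases.isReferenced("orderService"));
    }
}
```

The sketch also shows why inactive clients are a problem: as long as a stub keeps renewing its lease, the server-side entry never expires, whether or not the client still uses the object.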
Real Time Garbage Collectors
Real-time systems guarantee nearly instantaneous (in the single-digit-milliseconds range) execution speed for every request being processed. However, garbage-collection suspensions can pose problems, especially since the frequency and duration of GC execution are inherently unpredictable. We can optimize for low pause time, but we can't guarantee a maximum pause time. Thankfully, there are multiple solutions to these problems.
Sun originally specified the Java Real-Time System (Java RTS; see JSR-1 and JSR-282) with a specific real-time garbage collector called Henriksson GC that attempts to ensure strict thread scheduling. The algorithm is intended to make sure garbage collection does not occur while critical threads (defined by priority) are executing tasks. However, it is a best-effort algorithm, and there is no way to guarantee that no critical threads are suspended.
In addition, the Java RTS specification includes scoped and immortal memory areas. A scope is defined by marking a specific method as the start of a scoped memory area. All objects allocated during the execution of that method are considered to be part of the scoped memory area. Once the method execution has finished, and thus the scoped memory area is left, all objects allocated within it are simply considered deleted. No actual garbage collection occurs: objects allocated in a scoped memory area are freed, and all used memory is reclaimed, immediately after the defined scope has been exited.
Immortal objects, objects allocated via the immortal memory area, are never garbage collected, an enormous advantage. As such, they must never reference scoped objects, which would lead to inconsistencies because the scoped object will be removed without checking for references.
These two capabilities give us a level of memory control that is otherwise not possible in Java, which allows us to minimize the unpredictable impact of the GC on our response time. The disadvantage is that this is not part of the standard JDK, so it requires a small degree of code change and an intrinsic understanding of the application at hand.
The IBM WebSphere and Oracle JRockit JVMs both provide real-time garbage collectors. IBM promotes its real-time garbage collector by guaranteeing ≤1 ms pause time. Oracle JRockit provides a deterministic garbage collector, in which the maximum GC pause can be configured. Other JVMs, such as Zing, from Azul Systems, try to solve this issue by completely removing the stop-the-world event from the garbage collector. (There are a number of Real Time Java implementations available.)