This time I am taking a slightly unconventional approach to defining performance management. The idea for this post came out of a number of customer engagements in which the same question came up over and over again: “How do we start with Application Performance Management and what should we do?” Over time I developed a simple model which I call the performance management pyramid.
The basic idea is to assign performance management activities to several levels. Each level serves a specific need and builds on the ones below it. Just like a building, it starts with the foundation and moves up until we reach the roof. Without a solid foundation even the best roof is worth nothing, and nobody starts building a house from the roof.
In a recent conversation Michael – a member of my team – brought up the idea of combining activities with system properties like health or availability. While I very much liked the idea, the pieces of the puzzle did not yet fit together. Then I remembered a model I learned in school: Maslow’s hierarchy of needs. Thinking it over, I found it a good analogy and a sound basis for the model I wanted to define.
The Maslow Model
For those who don’t know or don’t remember Maslow’s model, I will briefly explain the idea behind it. The model defines five layers of needs as shown below. The theory is that people strive to satisfy lower layers before higher ones. Physiological needs are the lowest level. If these needs are not met, the human body stops functioning. The next level is about safety and predictability: the feeling that you are “in control” of your life. Once this need is met, people strive for social acceptance. People sometimes treat this need as more important than health or the feeling of being in control of their life. The level after that is esteem: the need for your work and activities to be valued by others. People want to be respected for what they do. Safety, social acceptance and esteem are, together with the physiological needs, called deficiency needs. Unlike physiological needs, however, the body shows no physical sign when they are unmet, although people feel a deficit in their quality of life. Finally, the ultimate level is self-actualization, meaning being able to reach one’s full potential.
Applying Maslow’s Model to Applications
So what does this have to do with application performance? Actually, there are astonishingly many similarities. The body in our case is the physical infrastructure. If the infrastructure is not working, our application is simply not available. So the lowest level in performance management is to ensure availability. The next level is to ensure a healthy system. Healthy means a system that works within defined parameters and is considered stable. While this level ensures that our systems are sized properly, it does not tell us whether the service is accepted by our end users. We might have proper connection pool sizes, reasonable garbage collection times and so on, and users may still not want to use the application because it is slow.
Ensuring service to end users is the next level, which I call serviceability: the ability to provide service to end users. At this level we have an interesting similarity with the original Maslow model, where people neglect the lower levels to achieve social acceptance. In an application context this means that we naturally (and logically) try to ensure service even if the underlying infrastructure is not stable. Typical real-world examples are application servers that are continuously restarted because of unresolved memory leaks, or attempts to maintain service by adding more and more hardware although we know that we need to fix the real problems in the application.
The next level is about providing the service that people really want to get. In the application world we talk about SLA compliance here: we ensure that all our service levels are met and that people value the service they receive.
Finally, we strive to optimize our application so that it works as well as possible. This means optimizing suboptimal parts and delivering the best possible service with minimal resource consumption. I refer to this level as optimizing. Below you see a graphical representation of this model.
The Implementation Pyramid
Taking this model as a basis, we will now define our strategy for implementing performance management. At the lowest level we ensure that our infrastructure and application are available. This involves monitoring basic system resources like CPU, memory and disk space, as well as availability monitoring using pings at system and application level to check whether our application is online.
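To make the lowest level concrete, here is a minimal sketch of two such checks in Java: an application-level "ping" implemented as a TCP connect, and a free disk space ratio. The class and method names are hypothetical and chosen for illustration only; a real setup would normally rely on a monitoring tool rather than hand-written checks.

```java
import java.io.File;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; not part of any monitoring product.
final class AvailabilityCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout or unknown host all count as "not available".
            return false;
        }
    }

    /** Returns the usable fraction (0.0 to 1.0) of the partition that holds the given path. */
    static double freeDiskRatio(File path) {
        long total = path.getTotalSpace(); // 0 if the path does not name a partition
        return total == 0 ? 0.0 : (double) path.getUsableSpace() / total;
    }
}
```

A scheduler could call `AvailabilityCheck.isReachable("app.example.com", 8080, 2000)` once a minute (host and port are placeholders) and raise an alert after several consecutive failures. A TCP connect only shows that something is listening, so an application-level check would typically also request a known page or status URL.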
At the next level we ensure that our application environment is working properly. The respective metrics can be retrieved via management interfaces. For Java there are JMX and JSR-77, the latter defining a standard set of metrics that provide information about the health of your application environment. The .NET equivalent is PerfMon counters, which provide comparable information. By monitoring these metrics we verify whether our application environment is working as intended. However, we cannot yet judge our service from a user perspective.
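As a rough illustration of health metrics read through JMX, the following sketch queries the JVM's own platform MXBeans for heap usage and garbage collection time and compares them against limits. It does not cover the application server metrics defined by JSR-77. The class name, method names and the idea of a simple two-threshold check are assumptions made for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper for illustration, built on the standard platform MXBeans.
final class JvmHealth {

    /** Heap usage as a fraction of the maximum heap, or -1.0 if no maximum is defined. */
    static double heapUsedRatio() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getMax() <= 0 ? -1.0 : (double) heap.getUsed() / heap.getMax();
    }

    /** Accumulated garbage collection time in milliseconds since JVM start. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 if the collector does not report it
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    /** Accumulated GC time relative to JVM uptime. */
    static double gcTimeRatio() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime <= 0 ? 0.0 : (double) totalGcTimeMillis() / uptime;
    }

    /** "Healthy" here simply means both values stay within the given limits. */
    static boolean isHealthy(double heapRatio, double gcRatio,
                             double maxHeapRatio, double maxGcRatio) {
        // An undefined heap ratio (-1.0) never exceeds the limit and is treated as healthy.
        return heapRatio <= maxHeapRatio && gcRatio <= maxGcRatio;
    }
}
```

A call such as `JvmHealth.isHealthy(JvmHealth.heapUsedRatio(), JvmHealth.gcTimeRatio(), 0.9, 0.1)` would flag a JVM that uses more than 90% of its heap or spends more than 10% of its uptime in garbage collection. Both limits are placeholders that have to be derived from your own system.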
Application monitoring addresses this topic. It provides metrics at application level about the individual transactions performed by users. Implementing monitoring at this level gives us the response times of our application. Actual implementations differ in the level of detail they provide. Solutions like Dynatrace provide insight into each individual transaction, while other tools aggregate metrics for specific transaction types. While we can verify how our application is performing, we are not yet primarily dealing with optimizing it.
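The aggregating variant can be sketched in a few lines. The class below is hypothetical and only illustrates the principle of collecting response times per transaction type. It is not how any particular product works, and it is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.LongSummaryStatistics;
import java.util.Map;

// Hypothetical aggregator for illustration; single-threaded use only.
final class ResponseTimeMonitor {

    private final Map<String, LongSummaryStatistics> statsByType = new LinkedHashMap<>();

    /** Records one completed transaction of the given type. */
    void record(String transactionType, long responseMillis) {
        statsByType
            .computeIfAbsent(transactionType, type -> new LongSummaryStatistics())
            .accept(responseMillis);
    }

    /** Number of recorded transactions of this type. */
    long count(String transactionType) {
        LongSummaryStatistics stats = statsByType.get(transactionType);
        return stats == null ? 0 : stats.getCount();
    }

    /** Average response time in milliseconds, or 0.0 if nothing was recorded. */
    double averageMillis(String transactionType) {
        LongSummaryStatistics stats = statsByType.get(transactionType);
        return stats == null ? 0.0 : stats.getAverage();
    }

    /** Slowest response time in milliseconds, or 0 if nothing was recorded. */
    long maxMillis(String transactionType) {
        LongSummaryStatistics stats = statsByType.get(transactionType);
        return stats == null ? 0 : stats.getMax();
    }
}
```

The trade-off is visible in the data structure: once a measurement has been folded into the statistics for its type, the individual transaction is gone. An average of 200 ms for "login" can hide a single request that took several seconds, which is why the maximum is kept as well.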
SLA compliance management ensures that we achieve the performance required by end users. End-user monitoring also comes into play at this level, as we are interested in how the performance feels for our users. I discussed the area of end-user monitoring in one of my other posts in the Application Performance Almanac. I am deliberately talking about SLA management rather than SLA monitoring, as our goal is to ensure end-user satisfaction. “Classical” SLA monitoring would only monitor compliance and report SLA breaches. At this level, however, we also want to be able to take the actions required to achieve SLA compliance. This means that we additionally need diagnosis capabilities that help us resolve problems easily. That in turn requires gathering diagnostics data that provides more insight into application performance, including component-level information down to code level, such as database calls and web service calls. It also includes proactive measures like continuously monitoring application performance to detect potential problems early, before users are impacted by them.
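The difference between reporting breaches and detecting problems early can be shown with a small sketch. It assumes an SLA of the form "at least a given share of requests completes within a given time". The class name, the three-state result and the warning margin are assumptions made for this example.

```java
// Hypothetical SLA evaluation for illustration only.
final class SlaCheck {

    enum Status { COMPLIANT, AT_RISK, BREACHED }

    /** Fraction of response times at or below thresholdMillis (1.0 if there are no samples). */
    static double complianceRatio(long[] responseMillis, long thresholdMillis) {
        if (responseMillis.length == 0) {
            return 1.0;
        }
        int within = 0;
        for (long millis : responseMillis) {
            if (millis <= thresholdMillis) {
                within++;
            }
        }
        return (double) within / responseMillis.length;
    }

    /**
     * BREACHED if the ratio is below the target, AT_RISK if it is within
     * warningMargin above the target, COMPLIANT otherwise.
     */
    static Status evaluate(double ratio, double target, double warningMargin) {
        if (ratio < target) {
            return Status.BREACHED;
        }
        if (ratio < target + warningMargin) {
            return Status.AT_RISK;
        }
        return Status.COMPLIANT;
    }
}
```

Classical SLA monitoring corresponds to reporting `BREACHED` only. The `AT_RISK` state stands for the proactive part: with a target of 0.95 and a margin of 0.02, a compliance ratio of 0.96 already triggers a closer look before the SLA is actually violated.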
The highest level of application performance management is proactive optimization. At this level we really strive to achieve maximum performance with minimal resource usage. This can be done forever and in many different ways, so in order to know which way to go you have to be clear about your goals. We at Dynatrace use the model shown below, which defines goals in the dimensions of cost, quality and process (or time) optimization. This level is definitely the most strategic. The important point is not to work at this level until all other levels have been achieved. Often people try to optimize where they can rather than where they really need to.
Performance management should be – or better, must be – designed around your actual needs. It is important to ensure basic performance characteristics first. The more you understand about your system, the more obvious the measures for improving it further will become. I have tried to provide an easy-to-understand and easy-to-follow framework for implementing performance management. This is my approach to implementing performance management, and I am eager to hear your thoughts and ideas on the topic.