When things work great in testing it doesn’t necessarily mean that everything works great in production. There are many things that might be different – such as: much higher load than tested, different hardware, external services that couldn’t be tested, “Real Users” that execute transactions that weren’t tested.

This blog marks the start of a series of blog posts covering the Top Performance Mistakes when moving from Test to Production. All examples are taken from real life applications we have been working with. I will start the series with one that many application owners may not think about: Excessive Logging.

Logging is important – but handle with care

We all know that logging is important as it provides valuable information when it comes to troubleshooting. Logging Frameworks make it easy to generate log entries and, with the help of tools, its becoming easier to consume and analyze log files when it is time to troubleshoot a problem. But what if there is too much log information that nobody looks at anyway, that just fills up your file system and – the worst thing – seriously impacts your application performance and with that introduces a new problem to you application? The following two examples are taken from an online web application running on Tomcat. We can easily see how the current logging settings can become a real problem once this application gets deployed in testing.

Example #1: Excessive use of Exception objects to log Stack Trace

In this example the Tomcat Connection Pool was used with the setting “logAbandoned=true”. In case the Connection Pool runs into a problem it starts logging a lot of information including full stack traces of where the application tried to get a connection from the pool. In case of this application a typical transaction executed several hundred database queries. Most queries were executed on separate connections which results in about as many getConnection calls as executedQuery calls.

High Overhead when creating hundreds of exception objects to get full stack trace information
High Overhead when creating hundreds of exception objects to get full stack trace information

Looking just at the Exception objects gives us an even clearer picture on how much information gets collected that won’t provide a whole lot of useful information.

Several hundred exception objects are created just for a single web request
Several hundred exception objects are created just for a single web request

When looking at the exception details of these individual exception objects it is easy for us to see which class actually creates these objects in order to obtain the stack trace information for logging:

AbandonedTrace.setStackTrace is the method that creates these Exception objects
AbandonedTrace.setStackTrace is the method that creates these Exception objects

Advice for Production

In the test environment this might not be a big issue if only simulating a fraction of the load expected on the live production system. It must however be a warning to the one responsible for moving this application to production. More information on this particular problem can be found here: https://groups.google.com/forum/?fromgroups#!topic/mmbase-discuss/5x24EjBZMGA

Example #2: Too granular and excessive use of logging

Many applications have very detailed logging in order to make problem analysis easier in development and testing. When deploying an application in production it is often forgotten to turn off DEBUG, FINE, FINEST or INFO logs which would just flood the file system with useless log information and also cause additional runtime overhead. It also often happens that log levels are used incorrectly which leads to too many WARNING or ERROR logs which are not a problem at all. This will make it very hard when analyzing log files to find the “real” problems. The following example shows a log message writing multiple times in the same transaction using a wrong log level.

Excessive Logging and incorrect usage of severity level results in too much time spent analyzing log outputs
Excessive Logging and incorrect usage of severity level results in too much time spent analyzing log outputs

Advice for Production

  • Make sure you configure your log levels correctly when deploying to production.
  • Test these settings in your test environment and verify that there are no DEBUG, FINE, … log messages still logged out to the file system.
  • Show the generated log files to the people that will have to use these log files in case of a production problem and get their agreement that these log entries make sense and don’t contain duplicated or “spam” messages.

Additional Reading

If you are interested in Load Testing also check out the post on To Load Test or Not to Load Test: That is not the Question.