How do you pinpoint all the trouble spots in your application’s code? Failure rate analysis can easily become a very tedious task. This post explains how I did it when I was still a developer (and why I never quite got the feeling that I was doing it right). And—SPOILER ALERT—I’ll tell you how I would (and you should) investigate trouble spots today.
Know your log files
When I was a developer, I had to check log files to find out if the latest release triggered any errors in production. I checked the log files frequently following each new release and, if things went well, only infrequently thereafter. In hindsight, this was an inefficient approach, and possibly even a bit simple-minded. But, hey, even though an early version of Logstash may have been available, I was struggling with introducing Maven into my projects back then, so it wasn’t until later that I found time to dig into Logstash.
Logstash to the rescue!
Logstash (and the ELK stack) made things much easier in that I no longer had to keep my eyes on the log files at all times. Instead, I had a nice-looking dashboard that visualized all log errors. This was great because I then knew when to open a log file and analyze it. That was a huge improvement over having tail -f /var/log/yourstuff.log open all day. If you use Logstash, make sure you have it running on a decent machine, because the ELK stack can be a real resource hog.
Although Logstash seemed to be (and may still be!?) state of the art when it comes to log file analysis, and it was a real improvement over manual checking of log files, there was something about our process that didn’t feel quite right. Note to self: “Read James Turnbull’s Logstash book.” This book probably has answers to all my Logstash questions.
Logstash visually notifies me whenever ERROR-level messages are logged, but to really nail down the major pain points in an application, I need to know which exceptions occur more often than the others. Our application wasn’t microservices-based back then, so we didn’t have individual services (and consequently no individual log files) for each application domain. This made it even harder for us to identify which exceptions were affecting which parts of our code.
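To make "which exceptions occur more often than the others" concrete, here is a minimal sketch of counting exception types straight from log lines. The log format, the sample lines, and the names EXCEPTION_PATTERN and count_exceptions are assumptions made up for illustration; a real log layout would likely need a different pattern.

```python
import re
from collections import Counter

# Assumed (hypothetical) log format:
#   "<timestamp> ERROR <logger> - <ExceptionClass>: <message>"
# The pattern captures a dotted class name ending in "Exception" or "Error"
# that appears after the ERROR level marker.
EXCEPTION_PATTERN = re.compile(
    r"\bERROR\b.*?\b((?:\w+\.)*\w*(?:Exception|Error))\b"
)


def count_exceptions(lines):
    """Return a Counter mapping exception class names to occurrence counts."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines, not taken from a real application.
sample = [
    "2016-03-01 10:00:01 ERROR OrderService - java.lang.NullPointerException: order was null",
    "2016-03-01 10:00:05 INFO  OrderService - order 42 created",
    "2016-03-01 10:01:12 ERROR PaymentService - java.sql.SQLException: connection timed out",
    "2016-03-01 10:02:30 ERROR OrderService - java.lang.NullPointerException: customer was null",
]

# Print exception types, most frequent first.
for name, count in count_exceptions(sample).most_common():
    print(f"{count:3d}  {name}")
```

A script like this ranks exceptions by frequency, but it still only sees exceptions that someone remembered to log, and it cannot tell which part of the application they came from unless the log line says so.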
I wouldn’t want to give up the option of looking into my log files. And I definitely wouldn’t want to give up my Logstash setup, as it makes working with log files much more convenient. Nonetheless, I don’t think that log files are the best approach to application-wide failure rate analysis.
If not log files, then what?
Dynatrace offers a different approach to failure rate analysis. Rather than analyzing service log files, Dynatrace pulls exceptions directly from your application. With this approach, writing log statements at the right time is no longer a prerequisite for analyzing exceptions.
Plus, Dynatrace groups exceptions and errors according to where they appear. If your application is built on the principles of microservices, you’ll see all this information on each service-level failure rate page.
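The general idea of capturing exceptions at the source and grouping them by where they occur can be sketched in a few lines. This is only an illustration of the concept, not how Dynatrace is implemented; the decorator track_failures, the registry failure_counts, and the "checkout" service name are all hypothetical.

```python
import functools
from collections import Counter

# Hypothetical in-memory registry: (service name, exception type) -> count.
failure_counts = Counter()


def track_failures(service):
    """Decorator that records every exception raised by the wrapped function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                # Record the failure where it happens, grouped by service.
                failure_counts[(service, type(exc).__name__)] += 1
                raise  # re-raise so the caller's behavior is unchanged
        return wrapper
    return decorator


@track_failures("checkout")
def load_order(order_id):
    if order_id < 0:
        raise ValueError("order id must not be negative")
    return {"id": order_id}


for order_id in (1, -1, -2):
    try:
        load_order(order_id)
    except ValueError:
        pass  # no log statement is written here, yet the failure is counted

print(failure_counts)
```

The point of the sketch is that the failures are counted even though the calling code never writes a log statement.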
When performing failure analysis, Dynatrace isn’t shy about sharing all the details with you. First, Dynatrace shows you a list of all HTTP status codes. These may be client-side problems, like 404 errors, or server-side problems, like HTTP 500 errors.
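As a rough sketch of the arithmetic behind a status-code breakdown, the snippet below splits responses into client-side (4xx) and server-side (5xx) errors and computes a failure rate. Counting both 4xx and 5xx responses as failures is an assumption made for this example, and the names classify and failure_summary are hypothetical.

```python
from collections import Counter


def classify(status):
    """Map an HTTP status code to a coarse category."""
    if 400 <= status <= 499:
        return "client error"
    if 500 <= status <= 599:
        return "server error"
    return "ok"


def failure_summary(statuses):
    """Return (Counter of failing status codes, failure rate between 0 and 1)."""
    by_code = Counter(s for s in statuses if classify(s) != "ok")
    failed = sum(by_code.values())
    rate = failed / len(statuses) if statuses else 0.0
    return by_code, rate


# Made-up sample of response codes for eight requests.
sample_statuses = [200, 200, 404, 500, 200, 404, 503, 200]

by_code, rate = failure_summary(sample_statuses)
for code, count in by_code.most_common():
    print(f"{code} ({classify(code)}): {count}")
print(f"failure rate: {rate:.0%}")  # 4 of 8 requests failed -> 50%
```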
When you select an error type, Dynatrace lists all the requests that failed with that status code. This is pretty cool as it allows you to concentrate on one issue type at a time. My favorite feature is this: You can pick a request from the list and have Dynatrace display the associated exception, the message, and the stack trace of the problem. This is just like debugging an application from within your IDE, except you’re not in your IDE: you’re analyzing the problem in your browser!
Selecting a broken link conveniently lists all the referring pages, which makes fixing broken links a breeze.
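The broken-link view boils down to grouping 404 responses by requested path and collecting the pages that linked to them. Here is a minimal sketch of that grouping, assuming request records that already carry a path, a status code, and a referrer; the record layout and the function name referrers_for_broken_links are hypothetical.

```python
from collections import defaultdict


def referrers_for_broken_links(request_log):
    """Map each path that returned 404 to the sorted list of referring pages."""
    broken = defaultdict(set)
    for req in request_log:
        # Requests without a referrer (e.g. a typed URL) are skipped.
        if req["status"] == 404 and req.get("referrer"):
            broken[req["path"]].add(req["referrer"])
    return {path: sorted(refs) for path, refs in broken.items()}


# Made-up request records for illustration.
request_log = [
    {"path": "/old-pricing", "status": 404, "referrer": "/home"},
    {"path": "/old-pricing", "status": 404, "referrer": "/blog/launch"},
    {"path": "/pricing", "status": 200, "referrer": "/home"},
    {"path": "/old-pricing", "status": 404, "referrer": "/home"},
    {"path": "/typo", "status": 404, "referrer": None},
]

for path, referrers in referrers_for_broken_links(request_log).items():
    print(path, "<-", ", ".join(referrers))
```

With the referring pages listed per broken path, fixing a link means editing exactly those pages.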
Get to know your failures
As I mentioned earlier in this post, I wouldn’t want to lose the option of looking into my log files, and I love having Logstash display all its fancy graphs and allowing me to filter for certain log lines. Logstash really is a great complement to Dynatrace.
Log files are great when it comes to identifying user behavior and manually analyzing log statements. And Dynatrace provides a great approach to the overall analysis of application exceptions.
Try Dynatrace for free!
If you haven’t signed up for Dynatrace yet, do it today. We offer an extended free trial period, no credit card required. Then just sit back and watch your environment for a couple of days with Dynatrace; it may uncover some surprises.