In this series, we are reviewing the various approaches to enterprise application monitoring and weighing their costs and benefits. If you haven’t read the previous post, please do so here or start at the beginning. Otherwise, you may be confused by some seemingly irrelevant physics references and analogies.
If you are unraveling performance and outage issues in your application environments, you are likely using some form of log monitoring. There are some excellent options available, and in most cases logs are an easy way to gain valuable insight into your applications and infrastructure. Every application developer knows how to write to a log file and often does so for everything from error messages to telemetry. Operations teams are familiar with the format and can use the output for forensic analysis of outages and for troubleshooting. All major infrastructure providers offer mechanisms that support these use cases. However, logs by themselves are not answers. They are collections of structured and unstructured data that must be parsed and analyzed before they reveal answers.
Using logs for real-time troubleshooting introduces many challenges. Logs can be written practically anywhere, for each component of the stack you wish to monitor. In some cases they come from ephemeral workloads that cannot be inspected after the problem has occurred because the components no longer exist. Aggregating all of these logs creates a new administrative burden that is resource intensive and potentially costly when you are paying for a point solution by the gigabyte.
Assuming you solve the organizational challenge of aggregating all your logs, you are now faced with analyzing the data within them and making it actionable. Not all logs are in the same format. A number of products will treat your log stashes like databases, but then you have to write queries against those stashes. How do you know what to look for in the logs if you didn’t write the application? How do you ensure that there is no sensitive data in your logs, or protect it if there is? Returning to our airplane analogy, imagine a pilot who had to analyze log files to determine the current altitude, or to work out why a lot of bells and emergency warning lights have suddenly appeared on the console. You wouldn’t want to be the pilot in that cockpit, but if you are using logs as your sole source of APM analysis, you are in exactly that position.
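To make the format and sensitive-data problems concrete, here is a minimal sketch, assuming two hypothetical log formats (a plain-text line and a JSON line) written by two different components. Before you can ask even a simple question such as "which errors occurred?", each format needs its own parser, and masking sensitive values is yet another step you have to remember. The formats, field names, the e-mail masking rule, and the `parse_line`/`errors` helpers are illustrative assumptions, not any product's API.

```python
import json
import re
from typing import Optional

# Hypothetical plain-text format: "<timestamp> <LEVEL> <service> <message>"
TEXT_PATTERN = re.compile(r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<service>\S+) (?P<msg>.*)$")

# Naive e-mail matcher used to mask one kind of sensitive value (illustrative only).
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def parse_line(line: str) -> Optional[dict]:
    """Normalize a line from either hypothetical format into one dict shape."""
    line = line.strip()
    if line.startswith("{"):
        # Hypothetical JSON format emitted by a second component.
        try:
            raw = json.loads(line)
        except json.JSONDecodeError:
            return None
        record = {
            "ts": raw.get("time"),
            "level": str(raw.get("severity", "")).upper(),
            "service": raw.get("svc"),
            "msg": raw.get("message") or "",
        }
    else:
        match = TEXT_PATTERN.match(line)
        if match is None:
            return None  # unknown format: silently dropped, which is part of the problem
        record = match.groupdict()
    # Mask e-mail addresses before the record is stored or queried.
    record["msg"] = EMAIL_PATTERN.sub("<redacted>", record["msg"])
    return record


def errors(lines: list) -> list:
    """The 'query': keep only normalized records at ERROR level."""
    records = (parse_line(line) for line in lines)
    return [r for r in records if r is not None and r["level"] == "ERROR"]


sample = [
    "2024-05-01T12:00:00Z ERROR checkout Payment declined for jane@example.com",
    '{"time": "2024-05-01T12:00:01Z", "severity": "error", "svc": "cart", "message": "Timeout"}',
    "2024-05-01T12:00:02Z INFO checkout Order placed",
    "<<binary garbage>>",
]

for record in errors(sample):
    print(record["service"], record["msg"])
# prints:
# checkout Payment declined for <redacted>
# cart Timeout
```

Every new component, and every change to an existing log format, means another parser branch and another masking rule to maintain. That is the ongoing cost of treating logs as a database you have to query yourself.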
I am not suggesting that logs are unimportant. In many industries, they are even a regulatory requirement. They can often help identify problems that have occurred and pinpoint areas where future releases can be improved. Where they lose their value is in real-time troubleshooting. Logs can be important when reviewing a problem after it has been remediated, but relying on them in flight would be time consuming, potentially disastrous, and likely futile. There are excellent point solutions that help make sense of your logs, but some now suggest that you can forgo other monitoring options and instead publish all of your structured and unstructured data, including telemetry metrics, errors, and trace data, to logs for real-time analysis. Relying on logs alone for management and maintenance has served us well over the years, but when applied this way to complex and ephemeral micro-service architectures, the approach is difficult at best, the data is often stale, and it is ultimately doomed to fail spectacularly at scale. Without context, log data is just data.
Dynatrace Log Analytics enables the capture and analysis of logs. However, when emergencies arise, the logs are presented in the context of the issues discovered by the artificial intelligence that monitors all of your application data. This is a far better use of logs as part of your larger APM strategy than writing queries to parse them.
Logs represent only one facet of your environment and should be bound together in context with other monitoring perspectives to provide the best picture of your complex application environments. Just as you wouldn’t want to be a pilot who only has logs to troubleshoot your cockpit warnings, you shouldn’t rely solely on this one perspective of your production environment. With Dynatrace as your co-pilot, you will have all of the information, including logs, in the context of the emergency at hand, so you can focus on flying high in your enterprise cloud while your monitoring is on auto-pilot.
Dynatrace introduces a new approach to this need with Log Analytics. When you use the full-stack monitoring of the Dynatrace OneAgent in your legacy, micro-service, or container architectures, your application and infrastructure logs are discovered automatically. Because the logs are identified alongside the monitored components that produce them, the context of a problem includes not just a time-stamped log entry but also the full-stack metrics, Smartscape relationships, and root cause analysis identified by Dynatrace’s AI engine.
Read my next post in this series, on Agents, or, if you are ready to make the leap, skip the rest of the series altogether and try Dynatrace for yourself to discover how much more there is to know about monitoring your application stack.