Some evolutionary changes are actually revolutions. This is the case for DC RUM’s new Universal Decode. It is not simply a new option for decoding a packet stream, part of a set of new features, but a real innovation!
My client, a pioneer in the art of measuring, has been waiting for this day for a long time. As a financing subsidiary of a large automotive group, they were one of the early adopters to start monitoring their applications. Twelve years ago, I started to work with their network operations team to build a performance management group, adopting tools including Network Vantage and Application Vantage. Today, this Dynatrace-equipped Digital Performance Management (DPM) team plays an essential company-wide role in ensuring the quality of application service.
The application and project managers have become fond of the DC RUM measurement “Operation breakdown”, which helps them understand the operation time and where that time is spent (server, network, client). To understand the condition of their applications, they rely fully on their DMI reports showing operation response times (HTTP operations, MQ messages, Oracle DB queries, and more). These factual measurements of end-user experience and transaction performance have become indispensable in their analyses and decisions.
In full “downsizing” mode, many of this group’s applications are migrating from mainframe to Linux. This is one of the company’s major projects, and progress is reviewed daily by the Director of Information Systems. Project teams are under increasing pressure, as some users are complaining of degraded performance in every application of the first migration batch. It quickly became clear that we needed to extend DC RUM’s packaged transaction decodes to include custom protocols such as EntireX, SAU, and Tibco. This would help us answer critical questions: How can we measure transaction response time for these protocols, to anticipate and prevent possible delays? How can we break down and isolate operation time to report to the IS Director? In short, how can we clearly pinpoint and resolve current performance issues?
It was therefore natural that the DPM manager asked me to create custom transaction decodes using the newly released DC RUM Universal Decode. Priority was placed on decoding the EntireX middleware protocol, used in a particularly critical financial management application. This application is highly visible, and since its migration to Linux, problems have continued to be escalated to the management team.
A first analysis of the protocol helped us pin down exactly what we wanted to extract from it.
The next step was to create the script, using the Lua language. For this we used the Lua Development Tools component in the Eclipse Mars environment. Lua (which means moon in Portuguese) was created in 1993 by a research team in Brazil. This scripting language is designed to be embedded in other applications to extend them. Here are some examples of products that use Lua: Adobe Photoshop Lightroom, VLC, Wireshark, Nginx, Celestia, Wikipedia, World of Warcraft, the Sims… and now Dynatrace DC RUM!
Not having written a script for over ten years, I hesitated a little before tackling such a project and immersing myself in a new language. But it was with great pleasure that I discovered this compact scripting language to be very flexible and easy to use – and reputed to be extremely fast.
I had no difficulty in creating a first simple script to retrieve each query in the CAS reporting server. Obviously, this first version did not provide a 24/7 consolidated view of all EntireX traffic. But it did quickly demonstrate the simplicity and power of the Universal Decode. From simple (and less actionable) TCP metrics, we quickly advanced to application performance measurements – for almost any type of proprietary protocol.
Here is the first script – as simple as it is effective:
The first operations monitored on the EntireX flow are displayed below. Obviously, the results here have been made anonymous, but you should get the idea that such a simple script can easily extract operations.
I was quickly able to provide a complete version of the script, retrieving Operation Name, Function, and Module. I even added validation within the script to ensure that the request and response have the same Conversation ID, as well as a check of the returned error codes.
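The pairing logic described above can be illustrated outside DC RUM. The following is a minimal offline sketch in Python, not the Lua decode itself and not the Universal Decode API: the `Message` and `Operation` types, their field names, and `pair_operations` are all hypothetical, and the input is assumed to be a list of already-parsed messages.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Message:
    """One already-parsed message. All field names are hypothetical."""
    kind: str              # "request" or "response"
    conversation_id: str   # the Conversation ID carried by both directions
    timestamp: float       # capture time, in seconds
    operation: str = ""    # operation name, taken from the request
    error_code: int = 0    # taken from the response; 0 assumed to mean "no error"


@dataclass
class Operation:
    conversation_id: str
    name: str
    duration: float        # response time minus request time, in seconds
    error_code: int


def pair_operations(
    messages: Iterable[Message],
) -> Tuple[List[Operation], List[Message]]:
    """Pair each response with the pending request that has the same
    Conversation ID. Messages that cannot be paired are returned separately."""
    pending: Dict[str, Message] = {}
    operations: List[Operation] = []
    unmatched: List[Message] = []

    for msg in messages:
        if msg.kind == "request":
            pending[msg.conversation_id] = msg
        elif msg.kind == "response":
            request = pending.pop(msg.conversation_id, None)
            if request is None:
                # Response without a matching request: fails validation.
                unmatched.append(msg)
            else:
                operations.append(
                    Operation(
                        conversation_id=request.conversation_id,
                        name=request.operation,
                        duration=msg.timestamp - request.timestamp,
                        error_code=msg.error_code,
                    )
                )

    # Requests that never received a response.
    unmatched.extend(pending.values())
    return operations, unmatched
```

Keeping the unmatched messages instead of silently dropping them makes it easy to see when the validation rejects traffic, which is useful while a decode is still being tested.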
After many tests in the test environment, we were ready to import the EntireX.lua decode to the production AMD probe, and to use the new software service – “Simple Parser” – configured for the occasion.
A few minutes later, all EntireX operations were visible in the CAS and ADS:
We could now start our performance analysis. We used the ADS to make the first detailed report on transactions between EntireX and the Adabas tier.
The results show a number of operations taking over 15 seconds; some can even exceed 30 seconds! Operation time is spent entirely on the last tier.
The results also showed that operations taking longer than 30 seconds time out, causing the loss of application context for the user, who receives an HTTP 500 error from the Web Service.
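To make the two thresholds concrete, here is a small Python sketch that buckets operation times the same way the report reads them. It is only an illustration: the function names and the sample durations are invented, and only the 15-second and 30-second limits come from the findings above.

```python
from collections import Counter
from typing import Iterable

SLOW_SECONDS = 15.0     # operations above this were flagged in the report
TIMEOUT_SECONDS = 30.0  # operations above this time out (HTTP 500 for the user)


def classify(duration: float) -> str:
    """Return "timeout", "slow", or "ok" for one operation time in seconds."""
    if duration > TIMEOUT_SECONDS:
        return "timeout"
    if duration > SLOW_SECONDS:
        return "slow"
    return "ok"


def summarize(durations: Iterable[float]) -> Counter:
    """Count how many operations fall into each bucket."""
    return Counter(classify(d) for d in durations)


if __name__ == "__main__":
    # Invented sample values, not measurements from the customer.
    sample = [1.2, 16.0, 31.5, 30.0, 15.0]
    print(summarize(sample))
```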
Operations responsible for performance degradation are now clearly identified, and the fault domain is isolated to a specific tier/server. The project manager can now provide the development teams with the evidence and context needed to make the appropriate application changes.
Buoyed by this success, I developed another decode, this time for the SAU protocol; for this project, we focused on the analysis of errors. The client wanted to be informed proactively of errors of type “2033” returned by the MQ tier, as these errors have a significant business impact.
I developed the SAU Lua script to track all errors. In the particular case of error “2033”, an email is sent directly to the application manager, allowing him to restore service and minimize the business impact.
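The alerting rule itself is simple to express. The sketch below, in Python, filters a list of decoded errors and prepares one email per “2033” occurrence. It is an assumption-laden illustration: how the email is actually triggered in the customer’s setup is not described here, the `build_alerts` function and the addresses are hypothetical, and the messages are only built, never sent.

```python
from email.message import EmailMessage
from typing import Iterable, List, Tuple

ALERT_CODE = 2033  # the error code the client wants to be alerted on


def build_alerts(
    errors: Iterable[Tuple[str, int]],
    recipient: str,
    sender: str = "dcrum-alerts@example.com",  # hypothetical sender address
) -> List[EmailMessage]:
    """Build one email per (operation, error_code) pair whose code is 2033.

    Other error codes are still expected to appear in reports; they simply
    do not trigger an email in this sketch.
    """
    alerts: List[EmailMessage] = []
    for operation, code in errors:
        if code != ALERT_CODE:
            continue
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = recipient
        msg["Subject"] = f"SAU error {code} on operation {operation}"
        msg.set_content(
            f"Error {code} was returned for operation {operation}.\n"
            "Please check the MQ tier."
        )
        alerts.append(msg)
    return alerts
```

Each returned message could then be handed to `smtplib.SMTP.send_message` or to whatever notification channel is already in place.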
Here is a sample error report, simulated in the customer’s test environment and monitored by the new SAU decode.
Other decode projects are underway; Tibco is next on the list!
We now can’t imagine DC RUM without the Universal Decode. And this is just the beginning! All of these results were obtained with the Universal Decode version 12.3, which was a pre-release or Early Access Program version. In the upcoming 12.4 release, the Universal Decode will support queuing architectures, multiple IP sessions, and encrypted streams, and will include its own SDK.