Monitoring Windows Workstations with Dynatrace: An IT helpdesk case study

During one of my latest Performance Clinics, I started chatting with Chad Turner from NYCM Insurance. He wanted to get some information on the Dynatrace UFO, but then the conversation changed towards how they leverage Dynatrace. I told him: If you have stories and best practices you want to tell just let me know and I will convert it into a blog. Well – here we are! 😊

Issue: Crashing websites on Windows Desktops

NYCM’s end users in their different office locations use Windows Desktops for their day-to-day work. Some of their users started to complain about crashes when accessing their internal web portals to process their jobs. This web portal also accesses a local home-grown application called PACE, which is installed on every desktop. NYCM’s help desk has worked on these crash reports but didn’t come up with a solution as they happened sporadically and couldn’t be reproduced!

Solution: Dynatrace OneAgent on Windows Desktops

Chad heard about these crashes and proposed installing Dynatrace OneAgent’s on these windows desktops. In order to better organize the monitored data with the rest of the Dynatrace monitoring landscape, they decided to put these OneAgent’s into a special host group called “End-User-PCs” for all these desktop hosts. Host groups give them the flexibility to filter, build specific dashboards but also set custom maintenance windows and setup specific alerting, e.g: you don’t want an alert in case an employee turns off his desktop at the end of the day.

They also invested in automated host tagging so that every host was automatically tagged with information such as Organizational Department, Vlan, ESXi Datashare …

Here is the first screenshot of the Dynatrace host details for one of their hosts – showing the meta data and tags as well as key performance metrics that get monitored:

OneAgent picks up additional metadata about location, user, department … - this data can also be used for tags
OneAgent picks up additional metadata about location, user, department … – this data can also be used for tags

The OneAgent also monitors all processes running on this machine including all network connections, log files and is also keeping an eye on key events on that machine, e.g: deployment changes, process restarts or crashes. The following screenshot shows the second part of the host details dialog:

OneAgent shows us when the machine was up&running (availability), the running processes and key events (crashes, restarts, deployment changes)
OneAgent shows us when the machine was up&running (availability), the running processes and key events (crashes, restarts, deployment changes)

Quick observations: When looking at the data above we can see when this machine was used during business hours and shut down during weekends and nights. We also see a high volume of process crashes which are automatically detected. Lets dig deeper in this!

Analysis: Crashing custom application leads to user experience problem

As we said in the beginning – the bad user experience explanation given to the Helpdesk team was never detailed enough to pinpoint the problem to a specific area. With the data that the OneAgent collected and analyzed, Chad & team could immediately point to the problem (the crashing home-grown application) and the root cause (exhausted process memory).

Step 1: Explore the process crashes

Their home-grown app – PaceExplorer.exe – kept crashing 9 times within one hour. This is automatically detected by Dynatrace
Their home-grown app – PaceExplorer.exe – kept crashing 9 times within one hour. This is automatically detected by Dynatrace

Step 2: Explore the crash dump

In the screenshot above we can see the “Process crash details” button which is a convenient feature of Dynatrace as it provides crash details – fully automatic. Here is the first part of the crash dump – including information about the exception that caused the process to crash:

Crash details get automatically captured and provide insights developers can use to fix the crash
Crash details get automatically captured and provide insights developers can use to fix the crash

The crash information alone is already great to pass on to developers. Additionally, Chad and team looked into traffic patterns of the Pace Application at the time of the crash. Dynatrace is also recording this type of data. In their instance, it showed them that the problem was not a result of high or unusual traffic or transactions.

Step 3: Pass the information to development

Here is the information they could pass on to the application development team to address the problem

  1. Actual user who experienced the issue: user is captured as Dynatrace Tag
  2. The traffic behavior of the app when it crashed: basic metrics OneAgent captures for every process
  3. Full Crash dump of all crashes that occurred

Conclusion: OneAgent’s to support IT helpdesks

In most of our use cases and stories, we write about monitoring microservice applications in hybrid-cloud environments and how Dynatrace DAVIS – the deterministic AI – detects problems and root cause in these very complex environments. Thanks to Chad – who was willing to share his story – we get to learn about these other use cases that are very critical to any enterprise organization.

If you want to give this a try yourself simply sign up for the Dynatrace Trial (either SaaS or On-Premises), install the OneAgents on your workstations, and dig into the data it provides in case your end users experience any problems. Remember leveraging meta data, tags & host groups to better organize your monitoring data as well as to streamline your alerting!

If you happen to have your own stories just drop me a note – happy to share it!

Stay updated