In my role I am responsible for our Community and our Community Portal. For the portal to be accepted by our users, I need to ensure that they find the content they are interested in. In a recent upgrade we added lots of new multimedia content that makes it easier for our community members to learn about Best Practices, First Steps, …

Error in Production: 3rd Party Plugin prevents users from accessing content

Here is what happened today when I figured out that some of our users actually had a problem accessing some of the new content. I was able to directly contact these individual users before they reported the issue. We identified the root cause of the problem and are currently working on a permanent fix preventing these problems for other users. Let me walk you through my steps.

Step 1: Verify and Ensure Functional Health

One dashboard I look at to check whether there are any errors on our Community Portal is the Functional Health Dashboard. dynaTrace comes with several Out-of-the-Box Error Detection Rules. These rules check, for example, for HTTP 500s, Exceptions thrown between Application Tiers (e.g. from our Authentication Web Service back to our Frontend System), Severe Log Messages, or Exceptions when accessing the Database.
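To make the idea of rule-based error detection concrete, here is a minimal sketch in Python. It is not how dynaTrace implements its rules; the `Transaction` record, the rule names, and the field names are all assumptions made for illustration. It only shows the principle of checking every transaction against a list of error rules.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Hypothetical, simplified record of one monitored request."""
    url: str
    http_status: int
    exceptions: list = field(default_factory=list)  # exception class names seen
    log_levels: list = field(default_factory=list)  # log severities emitted


# Each rule is a (name, predicate) pair. The rule names are illustrative only.
ERROR_RULES = [
    ("HTTP 5xx response", lambda t: 500 <= t.http_status <= 599),
    ("Exception thrown", lambda t: len(t.exceptions) > 0),
    ("Severe log message", lambda t: "SEVERE" in t.log_levels),
]


def violated_rules(transaction):
    """Return the names of all rules the transaction violates."""
    return [name for name, predicate in ERROR_RULES if predicate(transaction)]


def failed_transactions(transactions):
    """Return (transaction, violated rule names) for every failed transaction."""
    failed = []
    for t in transactions:
        rules = violated_rules(t)
        if rules:
            failed.append((t, rules))
    return failed
```

Keeping the rules as data rather than hard-coded checks means a new rule is one more entry in the list, which is roughly the appeal of configurable error rules.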

The following screenshot shows the Functional Health Dashboard. As we monitor more than just our Community Portal with dynaTrace, I filter on this application. I see that we had 14 failed transactions in the last hour. Seems we also had several unhandled exceptions and several HTTP 400s between Transaction Tiers:

dynaTrace automates error detection by analyzing every transaction against error rules. In my case, I had 14 failed transactions in the last hour on our Community Portal

This first step tells me that some of our users are experiencing a problem.

Step 2: Analyze Errors

A click on the Error on the bottom right brings me to the error details allowing me to analyze what these errors are. The following screenshot shows the Error Dashboard with an overview of all detected Errors based on the configured Error Rules. A click on one Error Rule shows me the actual errors on the bottom. Seems we have a problem with some of our new PowerPoint slides we make available on our community portal:

The 14 errors are related to the PowerPoint Slide Integration we recently added to our Community Portal as well as some internal Confluence Problems

Now I know what these errors are. Next is to identify the impacted users.

Step 3: Identify Impacted Users

A drill into our Business Transactions tells me which users were impacted by this problem. Turns out that we had 5 internal users (those with the short user names) and 2 actual customers having problems.

Knowing which users are impacted by this problem allows me to proactively contact them before they contact me
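The grouping behind such a view can be sketched in a few lines of Python. This is only an illustration under assumptions: the dictionary shape with a `user` key and the explicit `internal_users` set are hypothetical, and I pass in a known list of internal accounts rather than guessing from the length of the user name.

```python
from collections import Counter


def impacted_users(failed_transactions, internal_users=frozenset()):
    """Count failures per user and split internal users from customers.

    failed_transactions: iterable of dicts with a "user" key (assumed shape).
    internal_users: set of user names known to be internal (assumed input).
    """
    counts = Counter(t["user"] for t in failed_transactions)
    internal = {u: n for u, n in counts.items() if u in internal_users}
    customers = {u: n for u, n in counts.items() if u not in internal_users}
    return internal, customers
```

The `customers` dictionary is the contact list: these are the people to reach out to before they have to report the problem themselves.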

What is also interesting for me is to understand what these users were doing on our Community Portal. dynaTrace gives me the information about every Visit including all Page Actions with detailed Performance and Context Information. The following shows the activities of one of the users that experienced the problem. I can see how they got to the problematic page and whether they continued browsing for other material or whether they stopped because of this frustrating experience:

Analyzing the visit shows me where the error happened. Fortunately the user continued browsing to other material
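The question "did the user keep browsing after the error?" can be expressed as a small check over the ordered page actions of a visit. The sketch below assumes a visit is available as a list of `(action_name, failed)` tuples, which is a hypothetical shape chosen for illustration, not an export format of the tool.

```python
def continued_after_error(page_actions):
    """Check whether a visit went on after its first failed page action.

    page_actions: ordered list of (action_name, failed) tuples (assumed shape).
    Returns None if the visit had no failed action, True if at least one
    action followed the first failure, and False if the visit ended there.
    """
    for index, (_name, failed) in enumerate(page_actions):
        if failed:
            return index < len(page_actions) - 1
    return None
```

A visit that ends on the failed action is the more worrying case, since the error may have been the reason the user left.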

I now know exactly which users were impacted by the errors. I also know that even though they had a frustrating experience, these users continued browsing other content. Just to be safe, I contacted them to let them know we are working on the problem and also sent them the content they couldn’t retrieve through the portal.

Step 4: Identify Root Cause and Fix Problem

My last step is to identify the actual root cause of these errors because I want them fixed as soon as possible to prevent more users from being impacted. A drill into our PurePaths shows me that the error is caused by a NullPointerException thrown by the Confluence Plugin we use to display PowerPoint files embedded in a page.

Having a PurePath for every single request (failed or not) available makes it easy to identify problems. In this case we have a NullPointerException being thrown all the way to the WebServer leading to an HTTP 500
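The pattern of an exception travelling "all the way" up to an HTTP 500 is easy to reproduce in miniature. The following Python sketch is an analogy, not the plugin's code: `render_embedded_file` and `handle_request` are hypothetical names, and Python's `AttributeError` on `None` stands in for a Java NullPointerException.

```python
def render_embedded_file(document):
    """Hypothetical stand-in for a rendering plugin."""
    preview = document.get("preview")  # None when the key is missing
    # Calling a method on None raises AttributeError, the Python
    # counterpart of dereferencing null in Java.
    return "<div>" + preview.strip() + "</div>"


def handle_request(document):
    """Outermost layer: any uncaught exception turns into an HTTP 500."""
    try:
        return 200, render_embedded_file(document)
    except Exception as exc:
        # Nothing below handled the error, so the user gets a 500.
        return 500, f"{type(exc).__name__}: {exc}"
```

Because no layer between the plugin and the outer handler deals with the missing value, the only place the failure becomes visible is the response status, which is why a trace of the full request path is so helpful for finding where it started.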

dynaTrace also captures the actual exception, including the stack trace, giving me just the information I was looking for.

Exception Details reveal more information about the actual problem

Conclusion

Automatic Error Detection helped me to proactively work on problems and to contact my users before they reported the problem. In this particular case we identified a problem with the viewfile Confluence Plugin. If you use it, make sure you do not have path-based animations in your slides. This seems to be the root cause of the NullPointerException.
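If you want to check your own slides before uploading them, a rough scan is possible with the Python standard library. This sketch rests on two assumptions: the slides are saved in the XML-based .pptx format (a ZIP archive), and "path-based animations" correspond to the motion-path element `animMotion` in the slide XML. It uses a plain substring check, so treat a hit as a hint to look at the slide, not as proof.

```python
import re
import zipfile

# Slide parts inside a .pptx archive are named ppt/slides/slide<N>.xml.
SLIDE_PART = re.compile(r"^ppt/slides/slide\d+\.xml$")


def slides_with_motion_paths(pptx_file):
    """Return the slide part names that mention a motion-path animation.

    pptx_file: path to a .pptx file or a binary file-like object.
    """
    hits = []
    with zipfile.ZipFile(pptx_file) as archive:
        for name in sorted(archive.namelist()):
            if SLIDE_PART.match(name) and b"animMotion" in archive.read(name):
                hits.append(name)
    return hits
```

Calling `slides_with_motion_paths("deck.pptx")` (the file name is a placeholder) returns a list of slide parts to review. Older binary .ppt files are not ZIP archives and are not covered by this check.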

For our dynaTrace Users: If you are interested in more details on how to use dynaTrace, Best Practices or Self-Guided Walkthroughs then check out our updated dynaLearn Section on our Community Portal.

For those who want more information on how to become more proactive in application performance management, check out What’s New in dynaTrace 4.