End-to-end tracing for additional message queues with OneAgent SDK

Message queues have become an important building block of modern architectures as they enable asynchronous communication between distributed applications. Producers can add requests to the queue without waiting for them to be processed. Consumers process messages only when they are available. Queues also help increase reliability as they make the data persistent, reducing errors that occur when different parts of your system go offline.

The good news is that Dynatrace already supports the most popular messaging systems out of the box, including ActiveMQ, HornetQ, IBM WebSphere MQ, IBM MQ, JMS, and RabbitMQ, and we continuously improve and extend this support.

What if your messaging system isn’t yet automatically supported? Read on—you can now use the OneAgent SDK.

Queues and message tracing in Dynatrace

Dynatrace OneAgent allows you to track each request from end to end, including individual messages across queues. This enables Davis, the Dynatrace AI causation engine, to automatically identify the root causes of detected problems, and it lets you analyze transactions with features such as service flow and the service-level backtrace. For the supported messaging services, Dynatrace automatically detects and monitors queues and messages in your applications.

We’re committed to providing out-of-the-box support for the most popular messaging systems. However, we sometimes receive requests for messaging systems that aren’t widely used and therefore aren’t yet supported. To meet this demand and enable end-to-end tracing of any queue and messaging system, we’ve now extended the OneAgent SDK for Java, .NET, and C/C++. Support for this feature in the OneAgent SDKs for Node.js and Python will follow.

Request tracing for any messaging system, end to end

At Dynatrace we use SuperDump, an open source project created internally, to automatically analyze crash dumps generated in our test systems. SuperDump speeds up the first assessment of a crash dump by preparing the analysis automatically. Our developers can then quickly determine whether the issue is already known or whether they need to act on it. This makes it an important tool for our R&D, and one that needs to be monitored accordingly.

SuperDump, our service for automated crash-dump analysis

SuperDump uses Hangfire to schedule tasks such as downloading, analyzing, and processing the crash dumps. We’ve been monitoring SuperDump for a long time with OneAgent. Christoph Neumüller, the Dynatrace engineer maintaining this project, decided to use the OneAgent SDK to get visibility into Hangfire queues. Here’s the short snippet he added to his .NET code to monitor the download queue end to end.

Sending the message to the queue:

// initialization of the OneAgent SDK
private static IOneAgentSdk dynatraceSdk = 
   OneAgentSdkFactory.CreateInstance();
private readonly IMessagingSystemInfo messagingSystemInfo = 
   dynatraceSdk.CreateMessagingSystemInfo("Hangfire", "download", MessageDestinationType.QUEUE, ChannelType.IN_PROCESS, null);

...

// schedule download, tagging the outgoing message with OneAgent SDK
var outgoingMessageTracer = dynatraceSdk.TraceOutgoingMessage(messagingSystemInfo);
outgoingMessageTracer.Trace(() => {
   // enqueue the job once, inside the trace, passing the Dynatrace tag along with the job arguments
   string jobId = Hangfire.BackgroundJob.Enqueue<SuperDumpRepository>(repo => DownloadAndScheduleProcessFile(bundleId, url, filename, outgoingMessageTracer.GetDynatraceByteTag()));
   // report the Hangfire job ID as the vendor message ID
   outgoingMessageTracer.SetVendorMessageId(jobId);
});

On the processing side:

// initialization of the OneAgent SDK
private static IOneAgentSdk dynatraceSdk = 
   OneAgentSdkFactory.CreateInstance(); 
private readonly IMessagingSystemInfo messagingSystemInfo = 
   dynatraceSdk.CreateMessagingSystemInfo("Hangfire", "download", MessageDestinationType.QUEUE, ChannelType.IN_PROCESS, null);

...
// starting the download, tagging the incoming message with OneAgent SDK
public void DownloadAndScheduleProcessFile(string bundleId, string url, string filename, byte[] dynatraceTag = null) {
   var processTracer = dynatraceSdk.TraceIncomingMessageProcess(messagingSystemInfo);
   processTracer.SetDynatraceByteTag(dynatraceTag);
   processTracer.Trace(() => 
      AsyncHelper.RunSync(() => DownloadAndScheduleProcessFileAsync(bundleId, url, filename))
   );
}
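
Both snippets rely on the same hand-off: the sender obtains a tag, ships it alongside the message payload, and the receiver passes that tag to its own tracer so the two sides can be linked into one trace. The sketch below imitates only that hand-off, without the OneAgent SDK, so it can run anywhere. `Message`, `TagPropagationDemo`, the in-memory queue, and the GUID-based tag are hypothetical stand-ins chosen for illustration; they are not SDK types and say nothing about how OneAgent builds its tags.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Text;

// Hypothetical message envelope: the payload plus an opaque correlation tag.
public sealed class Message
{
    public string Payload { get; }
    public byte[] Tag { get; }

    public Message(string payload, byte[] tag)
    {
        Payload = payload;
        Tag = tag;
    }
}

public static class TagPropagationDemo
{
    // Stand-in for an in-process queue such as the Hangfire "download" queue.
    private static readonly ConcurrentQueue<Message> Queue = new ConcurrentQueue<Message>();

    // Sender side: create a tag and enqueue it together with the payload.
    public static byte[] Send(string payload)
    {
        // Assumption: a real tracer would supply this tag; a GUID is used here for illustration.
        byte[] tag = Encoding.UTF8.GetBytes(Guid.NewGuid().ToString("N"));
        Queue.Enqueue(new Message(payload, tag));
        return tag;
    }

    // Receiver side: dequeue the message and return the tag that links it to the sender.
    public static byte[] Process()
    {
        Message message;
        if (!Queue.TryDequeue(out message))
        {
            return null; // nothing to process
        }
        Console.WriteLine("Processing '" + message.Payload + "' with tag " + Encoding.UTF8.GetString(message.Tag));
        return message.Tag;
    }

    public static void Main()
    {
        byte[] sent = Send("bundle-1");
        byte[] received = Process();
        // The receiver sees exactly the bytes the sender attached.
        Console.WriteLine(sent.SequenceEqual(received) ? "linked" : "not linked");
    }
}
```

In the SuperDump code the same role is played by the extra `dynatraceTag` argument on `DownloadAndScheduleProcessFile`: because Hangfire serializes job arguments, adding the tag as an argument is enough to carry it from the enqueuing side to the processing side.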

Once Christoph redeployed his code, the download queue instantly appeared in Smartscape, our environment topology visualization tool. Smartscape places this information in the context of your overall environment: it builds an interactive map showing how everything is interconnected, so you can clearly see which services send messages to the queue and which ones process them.

Hangfire download queue in Smartscape

Of course, the download queue also appears in the service flow. You can see below that 564 requests were sent to that queue and that the average queue time is 27.8 ms.

Hangfire download queue in the service flow

Best of all, the data captured by OneAgent is automatically taken into account in every analysis, including AI-based root-cause analysis. For example, when you analyze the response time on any service page, you’ll see all important queue interactions and how much time they contribute to the overall response time.

SuperDump service response time analysis on service page

Requirements

What’s next?

  • Stay tuned for the upcoming availability of this feature in OneAgent SDKs for Node.js and Python.
  • The general availability (GA) of the OneAgent SDK will be announced later this year.

We’d love to hear your feedback! The OneAgent SDK is available on GitHub. All user contributions (from additional language bindings for the C/C++ SDK, to the reporting of minor defects, issues, or typos) are welcome. The best way for you to do this is via the GitHub repository issue tracker. You can also comment on our roadmap thread in AnswerHub.
