Optimizing Python code during development

Here at Dynatrace we are constantly expanding the platform. As a member of the Platform Extensions practice, I am one of the subject matter experts responsible for all services related to expanding the visibility of Dynatrace into technologies that aren’t available out of the box. We take on custom work to create plugins, API and notification middleware, SDKs and OpenKit implementations. We are also responsible for what we call “turn-key” extensions, which can be reused by many customers. Our work over the past year has resulted in the monitoring of DataPower, f5, IBM MQ, Juniper, iSeries, Citrix NetScaler, WMI and SAP ABAP.

For this blog post I want to focus on how you can leverage Dynatrace to get a lot of insight into your plugin code. While working on the integration with Juniper I managed to resolve several bugs and achieve a major performance improvement.

Part 1 – The code as it stood

The Juniper plugin is an ActiveGate Plugin written in Python. It consists of a script that connects to a Juniper Networks device and collects some facts and metrics about it. As the plugin needs to run in less than a minute, even in very large environments, I have to monitor the execution time of my code. Without monitoring the code with Dynatrace, it was starting to look like this:
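The original listing is not reproduced here. As a rough illustration only, hand-rolled timing through logging tends to look like the sketch below; the logger name and `collect_interface_metrics` are hypothetical stand-ins, not the plugin's real code.

```python
import logging
import time

logger = logging.getLogger("juniper_plugin_sketch")  # hypothetical logger name


def collect_interface_metrics():
    """Hypothetical stand-in for one of the plugin's collection steps."""
    time.sleep(0.01)  # pretend to talk to the device
    return {"interfaces": 2}


def query():
    """Time one step by hand and write the duration to the log."""
    start = time.perf_counter()
    metrics = collect_interface_metrics()
    elapsed = time.perf_counter() - start
    logger.info("collect_interface_metrics took %.3f s", elapsed)
    return metrics, elapsed
```

Every additional step needs the same start/stop/log boilerplate, and the log only shows durations, not how the steps overlap.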

Dynatrace supports many languages out of the box by simply installing the OneAgent (such as Java, .NET, Node.js, …). Support for Python is available through the OneAgent SDK, and requires minimal manual code instrumentation. I decided to instrument my plugin code to see service flows in Dynatrace, and stop relying on logging. I was purely interested in connecting my PurePaths with my custom devices that were created by the ActiveGate Plugin. I was satisfied with the response times, or so I thought.

Part 2 – Instrumenting the code

Instrumenting the code with the OneAgent SDK could not have gone smoother. The API that Dynatrace provides is very easy to use, and it was really refreshing after having worked with other tracing frameworks in the past.

The first step is to install the OneAgent SDK from PyPI. All you need to do is execute “pip install oneagent-sdk”.

The next step is to instrument your code. This is what my query method looked like after it was instrumented:

As you can see, I only needed to add four lines of code:

  • Two to initialize the SDK and get an instance of it.
  • One to start a new PurePath (trace_incoming_remote_call)
  • One to track my asynchronous operations (create_in_process_link)

To instrument my asynchronous calls, I used the link created before. This is what the code looked like before the instrumentation:

This is what the code looked like after it was instrumented:

Dynatrace provides these tracers as Python context managers, which you use when you want to start a new PurePath or make a remote call to some other service. It’s a very intuitive and powerful API.
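Since the original listings are not shown here, the following is a minimal sketch of the pattern described above, not the plugin's actual code. It uses `oneagent.initialize`, `oneagent.get_sdk`, `trace_incoming_remote_call` and `create_in_process_link` as named in this post, plus the SDK's `trace_in_process_link` on the worker side. The method, service and endpoint strings, the `collect_metrics` function and the `_NoOpSdk` fallback (there only so the sketch runs where the SDK is not installed) are assumptions made for illustration.

```python
import threading
from contextlib import nullcontext

try:
    import oneagent  # installed with: pip install oneagent-sdk
except ImportError:  # keep the sketch runnable without the SDK
    oneagent = None


class _NoOpSdk:
    """Hypothetical stand-in used only when the SDK is missing."""

    def trace_incoming_remote_call(self, method, name, endpoint):
        return nullcontext()

    def create_in_process_link(self):
        return None

    def trace_in_process_link(self, link):
        return nullcontext()


if oneagent is not None:
    oneagent.initialize()     # line 1: initialize the SDK
    sdk = oneagent.get_sdk()  # line 2: get an SDK instance
else:
    sdk = _NoOpSdk()


def collect_metrics(link, results):
    # Worker side: attach this thread's work to the trace that created the link.
    with sdk.trace_in_process_link(link):
        results.append("interface metrics")  # placeholder for real collection


def query():
    results = []
    # line 3: start a new PurePath (the three names are made up for this sketch)
    with sdk.trace_incoming_remote_call("query", "JuniperPlugin", "plugin://juniper"):
        # line 4: create a link so work on other threads joins the same PurePath
        link = sdk.create_in_process_link()
        worker = threading.Thread(target=collect_metrics, args=(link, results))
        worker.start()
        worker.join()
    return results
```

Calling `query()` runs the worker inside the traced block, so the worker's span shows up as a child of the incoming call when an agent is present.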

Part 3 – The realization

After a couple of minutes instrumenting the code, I ran the plugin and alt-tabbed to my Dynatrace environment to see the results. Without any additional changes I had a new Service as well as PurePaths!

When I drilled down to one of the PurePaths, I got a surprising realization:

My metric gathering methods, which I wrote to run asynchronously, were executing one after the other. After I call device.open, every service with “metrics” in its name should be executing at the same time, but they were not.

I then realized Dynatrace had found a bug in my code that all my logging and confidence could not. After looking at the code that spawned my threads, this is what I had:

This is a mistake many developers have made before. I had a pool of five threads, but instead of passing my method to the “ThreadPoolExecutor” to be called, I was calling the method myself. The fix was quick and easy:
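The original before/after listings are not shown here, so this is a hedged reconstruction of the general mistake rather than the plugin's code. `collect` is a hypothetical collector whose sleep stands in for a device round trip. The first function calls it inline and hands the result to `submit`; the second hands the function itself to the pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def collect(name, delay=0.1):
    """Hypothetical metric collector; the sleep simulates a device round trip."""
    time.sleep(delay)
    return name


def run_sequential_by_mistake(names):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=5) as executor:
        for name in names:
            # Bug: collect(name) is evaluated here, in the calling thread,
            # before submit() ever sees it, so the calls run one by one.
            executor.submit(collect(name))
    return time.perf_counter() - start


def run_concurrently(names):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=5) as executor:
        # Fix: pass the function and its arguments; the pool makes the call.
        futures = [executor.submit(collect, name) for name in names]
        results = [future.result() for future in futures]
    return results, time.perf_counter() - start
```

The buggy version raises nothing visible: the pool receives a non-callable, the resulting error is stored in a future nobody inspects, and the only symptom is that the work takes as long as a plain loop.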

Part 4 – Results and even more improvements

After uploading the plugin with the fix, I went back to Dynatrace to see the results. With this single fix I was able to cut around 20% off the response times.

This is a localhost trial device. For production devices that have dozens of physical interfaces and hundreds of logical ones, this improvement is amazing.

I looked at a PurePath to confirm that the change was effective:

Dynatrace found the bug: the wrong pattern was easy to identify in the PurePath, and I was able to fix it quickly. The main bottleneck was now the “Plugin facts” service. It was taking a long time to execute, and my other tasks were waiting for it.

Looking at the PurePath I started to realize that it was silly to wait for the properties (facts) to be collected and sent before starting to collect the metrics, so I made another change:
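The changed code is not shown here. One way to express the idea, sketched with hypothetical `collect_facts` and `collect_metrics` functions, is to submit the facts collection to the same pool as the metric collectors instead of waiting for it first:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def collect_facts():
    """Hypothetical: gather device properties (facts)."""
    time.sleep(0.1)
    return {"model": "example"}


def collect_metrics(group):
    """Hypothetical: gather one group of metrics."""
    time.sleep(0.1)
    return group, 1.0


def query(groups):
    with ThreadPoolExecutor(max_workers=5) as executor:
        # Facts no longer block the metrics: everything is submitted up front.
        facts_future = executor.submit(collect_facts)
        metric_futures = [executor.submit(collect_metrics, g) for g in groups]
        facts = facts_future.result()
        metrics = dict(f.result() for f in metric_futures)
    return facts, metrics
```

This only works when the metric collectors do not need the facts as input; if they do, only the independent part can be overlapped.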

The PurePaths were now in their best shape ever. The extra change shaved another half a second off the response time.

Without Dynatrace I would still be stuck with a response time of 5.4 seconds. I could also have faced issues in the future with devices that have tons of metrics and properties to report on.

This is the response time and requests chart during the whole exercise, where we can see where I made each change:

I am now incorporating Dynatrace into every plugin project, even when the code works and is fast. Dynatrace can help you find bugs and get more speed out of Python, and the API is so seamless that it’s very easy to integrate without big changes to your code.

The Juniper Networks plugin lets you monitor your Juniper Networks devices from Dynatrace. It is an ActiveGate plugin that brings in metrics like memory utilization, network interface throughput and errors, firewall security flow statistics, connection statistics and more. We are excited to launch this extension, which is currently in EAP.

The OneAgent SDK for Python has recently graduated from early access to beta status. Feedback is welcome on the roadmap thread in AnswerHub.