How to automate Canary Release decisions with Dynatrace

Progressive delivery lets teams ship faster while managing the risk of software deployments and configuration changes. One aspect of progressive delivery is the use of zero-downtime deployment strategies such as canary, blue-green, or feature flags. These strategies allow development teams to decouple deployment (rolling out a new binary to production) from release (making it accessible to your end users). Monitoring and observability provide insight into how end users react to the new version. Based on that data, as shown in the following Dynatrace Canary Tracker dashboard, teams can make better-informed decisions on whether to keep the new version or roll back to the last known good version so that no Service Level Objectives (SLOs) are violated.

Using Dynatrace results in better progressive delivery decisions based on automated version detection, SLO monitoring, and anomaly detection

Use Dynatrace to automate your Canary Release decisions

While progressive delivery deployment concepts have been around for a while, it’s taken time for development teams to adopt these ways of working. At Dynatrace we have been doing progressive delivery at cloud-scale for a while, and recently we observed an influx in queries from our community wanting to learn how Dynatrace can:

  • Keep oversight in releases: which version(s) are deployed where and are any of them having problems?
  • Compare releases and look out for architectural regressions by having transactional level version awareness
  • Ensure your SLOs are met while rolling out a Canary
  • Detect problematic Canary deployments quickly and trigger automated remediation

To learn how Dynatrace helps with the points above, and how it drives better automated progressive delivery decisions, I reached out to Kristof Renders, Autonomous Cloud Practice Manager at Dynatrace. In the video below, Kristof explains how canary deployments work and how Dynatrace identifies canary versions and release events. He then walks us through a full end-to-end demo including:

  • The deployment and gradual rollout of a new version as a canary
  • The automatic monitoring of canary specific SLOs in Dynatrace
  • Dynatrace triggering automatic rollback in case of a problem BEFORE the SLO is impacted

Watch the video here, open it on YouTube or watch it on Dynatrace University (including access to slides):

If you want to try this yourself, just follow Kristof's guidance. To make it easier, let me walk you through what I learned from this video so you can bring Dynatrace Cloud Automation to your progressive delivery process.

Let Dynatrace help you with your Canary Deployments

As mentioned above, I encourage you to watch Kristof’s video closely and also download the slides from Dynatrace University. I learned a lot while watching him, so I want to highlight the things I took away that will allow you to enable Dynatrace to automatically track your canary release decisions.

Step 1: Release and Build Version Metadata

The new release awareness and version tracking feature of the Dynatrace Cloud Automation solution is based on specific metadata that is pulled from the PGI (Process Group Instance) that Dynatrace monitors. Whether your processes run in a container on k8s, on a VM, or even on the mainframe, you can follow the guidelines in our documentation on Version Detection. At a minimum, pass the release version, the application name, and the environment name. Optionally, you can also pass the build version and send a deployment event from your deployment automation tool. All of this feeds the release inventory screen, as shown here:

Release metadata from k8s labels or defined as environment variables feeds the real-time release inventory view
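
To make Step 1 a bit more tangible, here is a minimal Python sketch of what a deployment script could prepare: the release metadata as environment variables for the process, plus the body of a deployment event. The environment variable names (`DT_RELEASE_VERSION` and friends), the `CUSTOM_DEPLOYMENT` event type, and the `dt.event.deployment.*` property keys are written down as I remember them from the Version Detection and events documentation, so treat them as assumptions and verify them against the docs for your Dynatrace version. The function names, the entity selector, and the sample values are purely illustrative.

```python
import json


def release_env(version, product, stage, build_version=None):
    """Return environment variables that carry release metadata for a process.

    The variable names are assumptions based on the Version Detection docs;
    verify them before relying on this.
    """
    env = {
        "DT_RELEASE_VERSION": version,   # release version, e.g. "1.2.0"
        "DT_RELEASE_PRODUCT": product,   # application name
        "DT_RELEASE_STAGE": stage,       # environment name
    }
    if build_version is not None:
        env["DT_RELEASE_BUILD_VERSION"] = build_version  # optional build version
    return env


def deployment_event(version, product, stage, entity_selector):
    """Build the body of a deployment event as a plain dict.

    Event type and property keys are assumptions; the entity selector
    is a hypothetical example supplied by the caller.
    """
    return {
        "eventType": "CUSTOM_DEPLOYMENT",
        "title": f"Deploy {product} {version} to {stage}",
        "entitySelector": entity_selector,
        "properties": {
            "dt.event.deployment.version": version,
            "dt.event.deployment.release_product": product,
            "dt.event.deployment.release_stage": stage,
        },
    }


if __name__ == "__main__":
    env = release_env("1.2.0", "shop", "production", build_version="1.2.0-42")
    event = deployment_event(
        "1.2.0", "shop", "production",
        entity_selector="type(PROCESS_GROUP_INSTANCE),tag(shop)",  # hypothetical
    )
    print(json.dumps(env, indent=2))
    print(json.dumps(event, indent=2))
```

Your deployment tool would then inject the environment variables into the container or process and send the event body to the events endpoint of your Dynatrace environment. I left the HTTP call out on purpose, since the URL and token handling depend on your setup.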

Step 2: Version based distributed trace analysis

The metadata not only feeds the real-time release inventory screen but also adds context to each captured PurePath (= distributed trace). Those of you who are familiar with Dynatrace know that PurePath is the basis of many diagnostics and analysis use cases. You can analyze hotspots for a particular transaction from a specific canary, and you can create calculated metrics split by release version and use those metrics for dashboards or alerting:

Version metadata is automatically available on each PurePath – enabling version-specific diagnostics, analytics, and alerting use cases
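
Conceptually, a metric split by release version is just a metric where every data point is grouped by the version found on the trace. The following Python sketch shows that idea on a handful of made-up trace records. It is not how Dynatrace computes calculated metrics internally, and the record fields (`version`, `failed`) are hypothetical names I chose for the illustration.

```python
from collections import defaultdict


def error_rate_by_version(traces):
    """Group trace records by release version and return the error rate per version."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for trace in traces:
        version = trace["version"]
        totals[version] += 1
        if trace["failed"]:
            errors[version] += 1
    # Every version in totals has at least one trace, so no division by zero.
    return {version: errors[version] / totals[version] for version in totals}


if __name__ == "__main__":
    # Made-up sample data: two traces from the stable version, two from the canary.
    sample = [
        {"version": "1.0.0", "failed": False},
        {"version": "1.0.0", "failed": False},
        {"version": "1.1.0", "failed": True},
        {"version": "1.1.0", "failed": False},
    ]
    print(error_rate_by_version(sample))
```

With a split like this in place, an error rate for canary 1.1.0 can be charted or alerted on separately from the stable version.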

Step 3: SLOs

Dynatrace has been supporting SLOs for a while now. Having metrics with version information, e.g., Error Rate for Canary A and Error Rate for Canary B, allows you to define SLOs for your service overall but also for individual canaries. Once defined, you can put them on a dashboard, use them for reporting, and also get alerted in case a detected anomaly might put your SLO at risk:

Service Level Objectives for each canary give you extra confidence when making deployment decisions
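
If you have not worked with SLOs before, the arithmetic behind a per-canary SLO is simple. The sketch below uses the common "good requests divided by total requests" definition and expresses the remaining error budget as a fraction. This is a generic illustration under that assumption, not the exact formula Dynatrace applies, and the function names and sample numbers are my own.

```python
def slo_status(good, total):
    """Return the SLO status in percent, or None when there is no traffic yet."""
    if total == 0:
        return None
    return 100.0 * good / total


def error_budget_remaining(status, target):
    """Return the fraction of the error budget that is left.

    1.0 means the budget is untouched, 0.0 means it is used up,
    and a negative value means the SLO target is violated.
    """
    budget = 100.0 - target
    if budget <= 0:
        raise ValueError("target must be below 100 percent")
    return (status - target) / budget


if __name__ == "__main__":
    # Made-up request counts per canary, checked against a 99 percent target.
    canaries = {"canary-a": (995, 1000), "canary-b": (980, 1000)}
    for name, (good, total) in canaries.items():
        status = slo_status(good, total)
        print(name, status, error_budget_remaining(status, target=99.0))
```

In this made-up example canary-a still has half of its error budget left, while canary-b is already below the target, which is exactly the kind of signal you want before widening the rollout.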

Step 4: Auto-remediation based on Davis Anomaly Detection

The cherry on top is that Dynatrace Davis, our deterministic AI, automatically detects anomalies on specific canaries. Those problems are reported back to your SLOs and act as an early warning signal. Best of all, you can trigger your auto-remediation to revert or roll back your canary deployment before it impacts your SLOs, fully automatically:

Dynatrace Davis detects problems on your canaries enabling auto-remediation to fix problems before impacting your SLOs
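
How the remediation is wired up depends on your tooling, but the decision itself can be very small. Here is a minimal Python sketch of a handler that receives a problem notification and rolls back the canary if the problem is open and affects the canary version. The payload fields (`state`, `affected_versions`) and the `roll_back` callback are hypothetical. A real problem notification will look different, so map the fields to whatever your notification integration actually sends.

```python
def should_roll_back(problem, canary_version):
    """Return True if the problem is open and affects the given canary version."""
    return (
        problem.get("state") == "OPEN"
        and canary_version in problem.get("affected_versions", [])
    )


def handle_problem(problem, canary_version, roll_back):
    """Call the roll_back callback when the canary is affected; report what happened."""
    if should_roll_back(problem, canary_version):
        roll_back(canary_version)
        return True
    return False


if __name__ == "__main__":
    def print_rollback(version):
        # Stand-in for a call to your deployment tool.
        print(f"rolling back canary {version}")

    # Hypothetical notification payload for an open problem on version 1.1.0.
    notification = {"state": "OPEN", "affected_versions": ["1.1.0"]}
    handle_problem(notification, "1.1.0", print_rollback)
```

Passing the rollback action in as a callback keeps the decision logic independent of the deployment tool, so the same check works whether the rollback is a pipeline trigger, a traffic-weight change, or a feature flag flip.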

What holds you back from giving this a try?

Whether you already do blue/green, canary, or feature flagging, or you are just deploying regular monolithic apps, passing metadata to your deployments enables many exciting use cases for all applications and services. The worst thing you can do is nothing at all. So go ahead! Add the version metadata, push the deployment events, create those calculated metrics, define your SLOs, build some nice dashboards, and see how this makes your daily decisions so much easier.

And, for more details and extended observability use cases, I recommend watching my Progressive Delivery Conf talk Better Progressive Delivery Decisions Through Observability.
