Keptn – The Autonomous Cloud control plane for Dynatrace explained

Autonomous Cloud Enablement (ACE) and Keptn – the Event-Driven Autonomous Cloud Control Plane – are helping our Dynatrace customers to automate their delivery and operations processes.

Besides the success stories of early adopters such as avodaq AG and amasol, we have seen lots of great innovation from Christian Heckelmann at eResearchTechnology (ERT), who integrated Keptn Quality Gates into their GitLab pipelines and now automates up to 90% of previously manual build approval tasks. Gone are the days when Christian had to manually look at dashboards and metrics after a new build was deployed into a testing or acceptance environment:

Integrating Keptn into your existing DevOps tools such as GitLab is just a matter of an API call.

There’s more from Christian and the rest of the Keptn and Autonomous Cloud community that we can all benefit from. But before we go into detail, let me first explain how we ended up with our ACE practices, how they relate to Dynatrace, and what role Keptn plays for our Dynatrace customers.

Autonomous Cloud was driven by our customers’ need for better automation

Our engineering and delivery teams at Dynatrace have invested a lot of time in building automation into the Dynatrace Software Intelligence Platform. Our recently launched Dynatrace API v2 focuses on a consistent developer experience, which enables our customers and partners to implement new use cases and to manage Dynatrace at scale through the API.

The Dynatrace teams also built automation on top of Dynatrace to automate many tasks of our Continuous Delivery & Feedback (CDF) and operational processes. Based on what we learned from integrating Dynatrace into our own DevOps toolchain, we advise our customers to follow our best practices: integrating delivery tools with Dynatrace, enforcing Dynatrace-based quality gates, implementing monitoring as code, and automating remediation based on Dynatrace problems.

Over the past two years we brought these practices to market under the names “Autonomous Cloud” and “Autonomous Cloud Enablement” (ACE): the practice of enabling your organization to become more autonomous when deploying and operating your apps & services on the new multi- and hybrid-cloud platforms. ACE is about automating delivery, quality, operations and feedback loops by integrating Dynatrace into your specific toolset:

Autonomous Cloud Enablement is about automating delivery, operations & feedback loops

Our survey at Perform 2019 showed that, although we have talked for many years about how to integrate Dynatrace into your processes, many organizations are still struggling to:

  1. Automate Delivery: improve Mean Time to Innovation (MTTI)
  2. Automate Operations: reduce Mean Time to Remediate (MTTR)
Our survey showed that many of our customers are not yet mature enough when it comes to delivery and operations

While these numbers have improved over the last year, we’re not yet close to where we want to be as an industry. We came up with a four-step Autonomous Cloud Maturity Plan that gives our customers guidance on how to integrate their DevOps tools with Dynatrace to improve MTTI & MTTR.

While this plan is a good start, we noticed that many customers spent a lot of time implementing the same use cases with similar tools, for example Quality Gates in Jenkins or Auto-Remediation Workflows with ServiceNow, which led to a lot of duplicated work. To avoid this duplication and make these higher-level use cases easier to implement, we released Keptn, the Autonomous Cloud Control Plane, which implements the key autonomous cloud use cases on top of the Dynatrace API.

To better understand how Keptn can boost your Autonomous Cloud adoption, let me first walk through the maturity plan and then show how you can benefit from Keptn:

4-Step Autonomous Cloud maturity plan

We have talked about these four steps in several blog posts, conference talks and YouTube tutorials, and we also provide an Autonomous Cloud Service Engagement to help you improve MTTI and MTTR.

The following animation walks you through these building blocks, and shows you which Dynatrace APIs to use to integrate your DevOps tools to automate key use cases such as:

  • Automate Operations aka NoOps as a Self-Service: watch Self-Healing with Dynatrace and Ansible

The Autonomous Cloud Maturity Path guides you through automating monitoring, performance, delivery and operations

The richness of the Dynatrace API allows you to integrate all your DevOps tools with Dynatrace, automatically create dashboards, set up alerts based on your SLIs (Service Level Indicators) and SLOs (Service Level Objectives), get automated feedback for performance tests or production deployments, and automate your remediation processes:

The Dynatrace API allows you to automate all relevant autonomous cloud use cases with your existing DevOps tools

While some of our customers have been really successful at integrating Dynatrace through these APIs to automate their delivery and operations processes, we also learned that we should provide higher-level APIs that make implementing something like an SLI/SLO-based Quality Gate as simple as a single API call, instead of many calls to different Dynatrace APIs. This is where Keptn, our event-driven Control Plane for Autonomous Cloud, comes into the picture!

Keptn – the event-driven Control Plane for Autonomous Cloud

Keptn is an open source project, and we are proud that, as of July 2020, it is a CNCF (Cloud Native Computing Foundation) sandbox project. You can find us on the CNCF as well as on the CDF (Continuous Delivery Foundation) landscape.

Thanks to its event-driven architecture, Keptn can also integrate with other monitoring and observability platforms. As this blog is targeted at Autonomous Cloud and Dynatrace, however, I will focus on the specific use cases that Keptn provides by integrating with Dynatrace. If you are interested in Keptn with Prometheus or other monitoring platforms, please check out our tutorials.

The following animation shows what Keptn does for Dynatrace users. It provides an additional layer of automation (through the Keptn API) and a user interface (we call it the Keptn Bridge). Keptn integrates with Dynatrace through the Dynatrace API, so all Keptn needs is your Dynatrace tenant’s API endpoint and an API token. From that point on, it gives you access to core use cases such as:

  • Performance Feedback as a Self Service
  • Monitoring Configuration as Code
  • Deployment Validation through SLI/SLO-based Quality Gates
  • Incident Notification and Auto-Remediation
Keptn acts as an additional automation layer on top of Dynatrace, enabling the core Autonomous Cloud use cases

Installation of Keptn

Keptn can be installed on a wide range of Kubernetes (k8s) platforms, including my favorite, the lightweight k3s distribution. While Keptn itself runs on k8s, it is not limited to enabling the ACE use cases for applications that run on k8s. Remember: Keptn is an automation layer on top of APIs such as the Dynatrace API. Many of our current users are using Keptn to automate use cases for their existing enterprise applications, e.g., Quality Gates for their Java- or .NET-based applications!

More tutorials on how to install Keptn can be found at https://tutorials.keptn.sh. We also plan to offer Keptn as part of your Dynatrace tenant in case you don’t want to operate Keptn yourself: Dynatrace Managed is planned first, SaaS at a later time. If you are interested, let us know!

Keptn automates 90% of manual build approvals through SLI/SLO-based Quality Gates

A popular use case of Keptn is the SLI/SLO-based Quality Gate. It automates the validation of a set of metrics (Service Level Indicators) against thresholds (Service Level Objectives). Keptn lets you define not only static thresholds but also thresholds relative to previous evaluations, such as previous builds, test runs or releases. This makes the use case very powerful, and it is easy to integrate into your existing CI/CD tools such as Jenkins, GitLab Pipelines, Azure DevOps or others. The results are also visualized in the Keptn Bridge, either as a heatmap or as a chart:

Keptn’s Quality Gate capability gives you automated feedback on builds, test runs or deployments that can easily be integrated into Jenkins, GitLab …
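To make the evaluation logic more concrete, here is a minimal Python sketch of an SLI/SLO check. It is not Keptn’s implementation: it only borrows the style of criteria used in Keptn’s slo.yaml files, where "<600" is a static threshold and "<=+10%" is relative to the previous result. The scoring rules (full weight for a pass, half weight for a warning, a 90% total score to pass and 75% for a warning), the handling of a missing previous result and the SLI names are assumptions for illustration.

```python
import operator
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Comparison operators allowed in a criterion (assumed syntax, modeled on slo.yaml).
OPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt, ">": operator.gt}
# "<600" is absolute; "<=+10%" or "<+50" is relative to the previous result.
CRITERION = re.compile(r"^(<=|>=|<|>)([+-])?(\d+(?:\.\d+)?)(%)?$")


def criterion_met(criterion: str, value: float, previous: Optional[float]) -> bool:
    match = CRITERION.match(criterion.replace(" ", ""))
    if match is None:
        raise ValueError(f"unsupported criterion: {criterion!r}")
    op, sign, number, percent = match.groups()
    threshold = float(number)
    if sign:  # relative: derive the threshold from the previous result
        if previous is None:
            return True  # assumption: nothing to compare against yet
        delta = previous * threshold / 100 if percent else threshold
        threshold = previous + delta if sign == "+" else previous - delta
    elif percent:
        raise ValueError(f"percentages need a +/- sign: {criterion!r}")
    return OPS[op](value, threshold)


@dataclass
class Objective:
    sli: str
    pass_criteria: List[str]  # all must hold for a pass
    warning_criteria: List[str] = field(default_factory=list)  # all must hold for a warning
    weight: float = 1.0


def evaluate(objectives: List[Objective], results: Dict[str, float],
             previous: Optional[Dict[str, float]] = None,
             pass_pct: float = 90.0, warning_pct: float = 75.0) -> Tuple[str, float]:
    """Return ("pass" | "warning" | "fail", total score in percent)."""
    previous = previous or {}
    achieved = 0.0
    for obj in objectives:
        value, prev = results[obj.sli], previous.get(obj.sli)
        if all(criterion_met(c, value, prev) for c in obj.pass_criteria):
            achieved += obj.weight
        elif obj.warning_criteria and all(
                criterion_met(c, value, prev) for c in obj.warning_criteria):
            achieved += obj.weight / 2  # assumption: a warning earns half the weight
    score = 100 * achieved / sum(obj.weight for obj in objectives)
    if score >= pass_pct:
        return "pass", score
    if score >= warning_pct:
        return "warning", score
    return "fail", score


# Hypothetical SLIs: the p95 response time may grow at most 10% over the previous build.
slos = [
    Objective("response_time_p95", ["<=+10%", "<600"], ["<=800"]),
    Objective("error_rate", ["<1"]),
]
print(evaluate(slos, {"response_time_p95": 550, "error_rate": 0.5}, {"response_time_p95": 520}))
# ('pass', 100.0)
print(evaluate(slos, {"response_time_p95": 650, "error_rate": 0.5}, {"response_time_p95": 520}))
# ('warning', 75.0)
```

Because the relative criterion only needs the previous result, the same SLO definition can compare builds, test runs or releases without changing any threshold by hand.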

Keptn’s integration with Dynatrace not only pulls metrics for the Quality Gate evaluation, it also sends an event to Dynatrace for every action Keptn executes. This gives you a full audit trail in Dynatrace of what happened as part of your delivery process:

Keptn automatically links its own events (quality gate, deploy, test …) with Dynatrace through the Dynatrace Events API
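For readers curious what such an event can look like, here is a minimal Python sketch that builds, but does not send, a deployment event for the Dynatrace Events API v1 (a POST to /api/v1/events, authenticated with an "Api-Token" authorization header). The keptn_project, keptn_stage and keptn_service tags used to attach the event to the right services, the source string and all URLs are assumptions; the real Dynatrace Keptn Service may build its payload differently.

```python
import json
import urllib.request
from typing import Any, Dict


def build_deployment_event(project: str, stage: str, service: str,
                           version: str, ci_back_link: str) -> Dict[str, Any]:
    """Payload for a CUSTOM_DEPLOYMENT event (Dynatrace Events API v1)."""
    # Assumption: services are tagged with keptn_project/keptn_stage/keptn_service.
    tags = [
        {"context": "CONTEXTLESS", "key": "keptn_project", "value": project},
        {"context": "CONTEXTLESS", "key": "keptn_stage", "value": stage},
        {"context": "CONTEXTLESS", "key": "keptn_service", "value": service},
    ]
    return {
        "eventType": "CUSTOM_DEPLOYMENT",
        "source": "Keptn",  # assumption: free-text source name
        "deploymentName": f"Deploy {service} {version} to {stage}",
        "deploymentVersion": version,
        "ciBackLink": ci_back_link,  # link back to the pipeline run, e.g. in GitLab
        "attachRules": {"tagRule": [{"meTypes": ["SERVICE"], "tags": tags}]},
        "customProperties": {"Project": project, "Stage": stage, "Service": service},
    }


def build_events_request(tenant_url: str, api_token: str,
                         event: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) the POST to the tenant's Events API."""
    return urllib.request.Request(
        tenant_url.rstrip("/") + "/api/v1/events",
        data=json.dumps(event).encode("utf-8"),
        headers={"Authorization": f"Api-Token {api_token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder tenant, token and pipeline URL.
event = build_deployment_event("sockshop", "staging", "carts", "1.2.0",
                               "https://gitlab.example.com/sockshop/carts/-/pipelines/1")
req = build_events_request("https://abc12345.live.dynatrace.com", "<DT_API_TOKEN>", event)
# urllib.request.urlopen(req) would send the event to the tenant.
```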

If you want to learn more, watch my Performance Clinic on Shift-Left Performance into your Jenkins Pipeline with Keptn and Dynatrace. Whatever CI/CD tooling you integrate Keptn Quality Gates with, here are the benefits for you:

  • Ease of integration into your existing tools
  • Powerful configuration as code approach with SLIs & SLOs
  • Comparison across builds
  • Heatmap and chart visualization in the Keptn Bridge

If you have questions, join our Slack channel and let us help you. While the Keptn API is easy to call from any of your CI/CD tools, we can also point you to projects such as the Jenkins Shared Library, the Azure DevOps Extension or the GitLab Plugin.
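To show how small that call can be, here is a minimal Python sketch that prepares a request asking Keptn to evaluate the last ten minutes of a service. It assumes a Keptn 0.6/0.7-style API: a POST to the /v1/event path of your Keptn API URL, an x-token header with the Keptn API token, and a CloudEvent of type sh.keptn.event.start-evaluation. Event names and paths have changed between Keptn releases, so check the API reference of the version you run; the project, stage and service names and the URL are hypothetical.

```python
import json
import urllib.request
import uuid
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.000Z"  # UTC timestamps


def build_evaluation_event(project: str, stage: str, service: str,
                           end: datetime, duration: timedelta,
                           source: str = "gitlab-pipeline") -> Dict[str, Any]:
    """CloudEvent asking Keptn to evaluate SLIs/SLOs for [end - duration, end].

    `end` must be timezone-aware.
    """
    start = end - duration
    return {
        "type": "sh.keptn.event.start-evaluation",  # assumption: Keptn 0.6/0.7 event type
        "specversion": "0.2",                       # assumption: CloudEvents version used then
        "source": source,                           # hypothetical name of the calling tool
        "id": str(uuid.uuid4()),
        "contenttype": "application/json",
        "data": {
            "project": project,
            "stage": stage,
            "service": service,
            "teststrategy": "manual",
            "start": start.astimezone(timezone.utc).strftime(TIME_FORMAT),
            "end": end.astimezone(timezone.utc).strftime(TIME_FORMAT),
        },
    }


def build_keptn_request(api_url: str, api_token: str,
                        event: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) the POST to the Keptn API."""
    return urllib.request.Request(
        api_url.rstrip("/") + "/v1/event",  # assumption: endpoint path
        data=json.dumps(event).encode("utf-8"),
        headers={"x-token": api_token, "Content-Type": "application/json"},
        method="POST",
    )


event = build_evaluation_event("sockshop", "staging", "carts",
                               end=datetime(2020, 7, 1, 12, 0, tzinfo=timezone.utc),
                               duration=timedelta(minutes=10))
request = build_keptn_request("https://keptn.example.com/api", "<KEPTN_API_TOKEN>", event)
# urllib.request.urlopen(request) would send it from your pipeline.
```

In a pipeline, `end` would typically be "now" and `duration` the length of the test that just ran.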

Keptn detects 90% of performance and scalability issues through Performance as a Self-Service

Another very popular use case is Performance as a Self-Service. It targets organizations that want to establish a performance culture, where engineers can get performance feedback at any time by executing load against a deployed service or application and receiving automated feedback on its performance & scalability behavior.

The animation below shows how you can notify Keptn about a new deployment, e.g., the URL endpoint of a service an engineer just deployed. In addition, you can tell Keptn which type of test you want to execute (referred to as the test strategy) and which SLIs/SLOs you want Keptn to evaluate once the test is done. Keptn then orchestrates the complete workflow for you by first executing your tests and then evaluating your SLIs/SLOs. Thanks to its event-driven architecture, ANY testing tool can be triggered:

Keptn can orchestrate test execution and result analysis. This enables true Performance as a Self-Service!
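Stripped of the event plumbing, the workflow boils down to three steps: pick the test tool registered for the requested test strategy, record when the test started and ended, and hand exactly that time window to the SLI/SLO evaluation. The Python sketch below models this with plain function calls instead of events; the DeploymentFinished type, the strategy names and the runner and evaluator callables are hypothetical and not part of Keptn.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple

TestRunner = Callable[[str], None]                    # runs a test against a URL
Evaluator = Callable[[str, datetime, datetime], str]  # returns "pass", "warning" or "fail"


@dataclass
class DeploymentFinished:
    service: str
    url: str            # endpoint the engineer just deployed
    test_strategy: str  # hypothetical names, e.g. "functional" or "performance"


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


def performance_self_service(event: DeploymentFinished,
                             runners: Dict[str, TestRunner],
                             evaluate: Evaluator,
                             clock: Callable[[], datetime] = utc_now) -> str:
    runner = runners.get(event.test_strategy)
    if runner is None:
        raise ValueError(f"no test tool registered for {event.test_strategy!r}")
    start = clock()
    runner(event.url)  # e.g. a JMeter or NeoLoad run wrapped in a function
    end = clock()
    # Evaluate the SLIs/SLOs for exactly the window in which the test ran.
    return evaluate(event.service, start, end)


# Usage with a fake test tool and a fake clock so the example runs offline.
tested_urls: List[str] = []
windows: List[Tuple[str, float]] = []


def fake_evaluate(service: str, start: datetime, end: datetime) -> str:
    windows.append((service, (end - start).total_seconds()))
    return "pass"


ticks = iter([datetime(2020, 7, 1, 12, 0, tzinfo=timezone.utc),
              datetime(2020, 7, 1, 12, 15, tzinfo=timezone.utc)])
result = performance_self_service(
    DeploymentFinished("carts", "http://carts.staging.example.com", "performance"),
    runners={"performance": tested_urls.append},
    evaluate=fake_evaluate,
    clock=lambda: next(ticks),
)
print(result, windows)  # pass [('carts', 900.0)]
```

Passing the clock and the tools in as parameters keeps the orchestration independent of any particular testing tool, which mirrors why the event-driven design lets any tool be plugged in.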

JMeter is a popular open source tool, and Keptn can execute your JMeter scripts right in the k8s cluster where you have installed Keptn, but Keptn can also trigger other testing tools. Neotys, for example, has implemented a NeoLoad integration in which a test gets triggered through their NeoLoad Web SaaS offering. Any other testing tool that provides an automation API can easily be integrated with Keptn. If you have questions, let us know through our Slack channel.

Keptn reduces MTTR by automating your production remediation

Automated remediation of production issues and building self-healing applications are the next big thing, but they are not easy. While many of our customers have tools that allow them to execute workflows when Dynatrace detects a problem, many have asked us why Dynatrace can’t just do the same thing. With Keptn we provide that capability.

Keptn can be triggered through the Dynatrace Problem Notification integration and then execute a remediation workflow defined through “Auto Remediation as Code”. Keptn executes remediation actions, checks back with Dynatrace whether the system is back in a healthy state, and continues its workflow by calling the next workflow step, stopping the workflow, or escalating if auto-remediation didn’t work.

Keptn implements a remediation as code approach. It executes actions and validates the impact before moving on
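The control flow described above (run an action, validate its impact, then continue, stop or escalate) fits in a few lines of Python. The action names, the health check and the escalation callback below are hypothetical stand-ins: in the setup described in this post, the actions would come from your remediation-as-code definition and the health check would be answered by Dynatrace.

```python
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, Callable[[str], None]]  # (name, function executing the action)


def remediate(problem: str, actions: List[Action],
              is_healthy: Callable[[str], bool],
              escalate: Callable[[str, List[str]], None]) -> Tuple[str, List[str]]:
    executed: List[str] = []
    for name, action in actions:
        action(problem)
        executed.append(name)
        if is_healthy(problem):  # validate the impact before moving on
            return "resolved", executed
    escalate(problem, executed)  # no action restored a healthy state
    return "escalated", executed


# Usage with fake actions: the first one does not help, the second one does.
state: Dict[str, bool] = {"healthy": False}
escalations: List[str] = []


def toggle_feature_flag(problem: str) -> None:
    pass  # pretend this action had no effect


def scale_up(problem: str) -> None:
    state["healthy"] = True  # pretend scaling fixed the problem


outcome = remediate("High failure rate on carts",
                    [("toggle-feature-flag", toggle_feature_flag), ("scale-up", scale_up)],
                    is_healthy=lambda problem: state["healthy"],
                    escalate=lambda problem, done: escalations.append(problem))
print(outcome)  # ('resolved', ['toggle-feature-flag', 'scale-up'])
```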

Keptn 0.7.0, which is scheduled to be released in the coming weeks, will let you define custom remediation actions. If you’re interested or have questions about tool integration support, please let us know through our Slack channel.

Keptn can modernize your delivery pipeline end-to-end through Progressive Continuous Delivery

If you have watched some of our Keptn presentations or walked through the tutorials, you have probably seen us demo the end-to-end delivery use case, where Keptn orchestrates the complete delivery pipeline: deploy, test, evaluate and promote. The great thing about this is that Keptn can leverage any existing tool or pipeline for delivery and promotion, e.g., Keptn can call your existing Jenkins pipelines.

The benefit of having Keptn orchestrate the delivery pipeline is that you can modernize your pipeline step by step by enabling new tool integrations along the way, e.g., integrating testing, security checks or canary deployments without having to build them into your existing pipelines. Keptn’s integration with Dynatrace also makes sure that Dynatrace is aware of every single action, by sending Dynatrace events such as Deployment, Start/Stop Test or Quality Gate Results. The following animation shows how Keptn Progressive Continuous Delivery works:

Keptn can orchestrate your delivery process by leveraging your existing assets such as pipelines for deploying.
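Stripped of tooling, the orchestration is a loop over stages: deploy, test, evaluate, and only then promote. The Python sketch below uses hypothetical stage names and callables standing in for whatever tools you plug in (for example existing Jenkins pipelines). Treating a "warning" result as requiring a manual approval is an assumption made for illustration.

```python
from typing import Callable, List, Tuple

Step = Callable[[str, str], None]      # (stage, version) -> None
Gate = Callable[[str, str], str]       # (stage, version) -> "pass" | "warning" | "fail"
Approval = Callable[[str, str], bool]  # (stage, version) -> promote anyway?


def deliver(version: str, stages: List[str], deploy: Step, test: Step,
            evaluate: Gate, approve: Approval) -> List[Tuple[str, str]]:
    """Walk a version through the stages; stop as soon as promotion is blocked."""
    history: List[Tuple[str, str]] = []
    for stage in stages:
        deploy(stage, version)
        test(stage, version)
        result = evaluate(stage, version)
        history.append((stage, result))
        if result == "fail":
            break  # never promote a failing build
        if result == "warning" and not approve(stage, version):
            break  # assumption: a warning needs a human to approve promotion
    return history


# Usage with fake tools: staging only reaches "warning" and nobody approves it.
log: List[str] = []
gate_results = {"dev": "pass", "staging": "warning", "production": "pass"}
history = deliver("1.2.0", ["dev", "staging", "production"],
                  deploy=lambda s, v: log.append(f"deploy {v} to {s}"),
                  test=lambda s, v: log.append(f"test {v} in {s}"),
                  evaluate=lambda s, v: gate_results[s],
                  approve=lambda s, v: False)
print(history)  # [('dev', 'pass'), ('staging', 'warning')]
```

Swapping one of the callables (say, adding a security check inside `test`) changes what happens in each stage without touching the loop, which is the step-by-step modernization described above.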

With the upcoming release, we are introducing a new Delivery Assistant, requested by our early adopters, that gives you more control over which versions of your services get promoted into the next stage. Read more in the blog Advanced production support with Keptn 0.7, or ping us on our Slack channel if you have more questions.

Keptn can be extended with new use cases through its event-driven architecture

Keptn internally simply orchestrates processes through events. This allows Keptn to decouple process definition from actual execution. The benefit of this architecture is that anyone can add new tools to the processes by simply subscribing to events such as “Deployment Finished”, “Evaluation Done” or “Problem Detected”.
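The publish/subscribe idea can be illustrated with a tiny in-process event bus in Python. It is only an analogy: Keptn passes events between separately deployed services rather than calling Python functions, and the event type strings and handlers below are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class EventBus:
    """Routes each published event to every handler subscribed to its type."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: Dict[str, Any]) -> int:
        """Deliver the payload; return how many handlers received it."""
        handlers = list(self._handlers.get(event_type, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


# Two independent "services" react to the same deployment event.
bus = EventBus()
audit_trail: List[Dict[str, Any]] = []
synthetic_checks: List[str] = []
bus.subscribe("deployment-finished", audit_trail.append)
bus.subscribe("deployment-finished", lambda e: synthetic_checks.append(e["url"]))
bus.subscribe("evaluation-done", audit_trail.append)
delivered = bus.publish("deployment-finished",
                        {"service": "carts", "version": "1.2.0", "url": "http://carts.example.com"})
print(delivered, synthetic_checks)  # 2 ['http://carts.example.com']
```

Adding a tool means adding a subscriber; whoever publishes "deployment-finished" does not change, which is the decoupling of process definition from execution described above.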

At ERT, Christian Heckelmann already leverages this capability. Besides using the Dynatrace Keptn Service, which pushes deployment, test, quality gate and auto-remediation events to Dynatrace, he has developed his own Dynatrace Synthetic Keptn Service, which automates SLA monitoring of deployed services by creating a new Dynatrace Synthetic test for every newly deployed version of a piece of software.

So, whenever his GitLab pipeline deploys a new version, Keptn sends the deployment information to Dynatrace (including links back to GitLab) and creates a Synthetic check to ensure the newly deployed version is accessible to end users and meets its SLAs:

Keptn’s event-driven architecture enables adding new use cases such as automating SLA checks.

The Keptn community has already contributed several Keptn Services that extend Keptn with new use cases. For an overview, check out the Keptn-Contrib and Keptn-Sandbox GitHub repositories. If you have questions about any service, or about how to write your own, let us know through the Slack channel.

Summary: Keptn accelerates your Autonomous Cloud Journey

I hope this explanation of how Keptn can accelerate your Autonomous Cloud Journey has answered many of the questions you may have had when you first heard about Keptn.

If you want to embark on that journey, I highly recommend using Keptn. Most users I’ve worked with start with a single use case, like Quality Gates. This gives you immediate benefit as it automates a tedious manual process. Next might be Performance as a Self-Service or your first steps into automated remediation.

Step by step, Keptn helps you modernize and automate delivery and operations. If you need help, feel free to reach out to the Keptn team or to our Autonomous Cloud Services team, who are also happy to help you implement one use case at a time.
