Dynatrace is the only full-stack monitoring solution available in the market that enables out-of-the-box whitebox monitoring for application workloads running in containers. Dynatrace OneAgent is container-aware and comes with built-in monitoring support for vanilla Kubernetes and Red Hat OpenShift.
In this post, we’ll outline the preferred way of monitoring Kubernetes and OpenShift clusters, which includes two high-level options:
- Deploy OneAgent for full-stack visibility on all your cluster nodes
- Monitor application workloads in locked-down Kubernetes and OpenShift Online environments
Full-stack visibility into Kubernetes and OpenShift clusters
You can make full use of the strength and power of Dynatrace if OneAgent is deployed to the underlying cluster nodes. With OneAgent on every cluster node, Dynatrace automatically detects newly started containers in the pods and begins deep monitoring of them. It also monitors the cluster nodes themselves, including their resource utilization.
Monitoring needs to be part of your cluster. This is why we recommend that you roll out Dynatrace OneAgent as early as possible, together with the rollout of the cluster itself. Add a OneAgent deployment step to your automated cluster provisioning scripts, for example, Ansible (see the OneAgent role in Ansible Galaxy), Terraform templates, or BOSH.
You can, of course, also deploy Dynatrace OneAgent to your existing clusters in a more Kubernetes-native way, for example, with the Dynatrace OneAgent Operator or a DaemonSet for Kubernetes and OpenShift. Either approach rolls out OneAgent to the cluster nodes, so you need permission to run privileged containers. Dynatrace OneAgent Operator also allows you to automatically update the OneAgents on your nodes whenever an update becomes available in your Dynatrace environment. For details, see the Dynatrace OneAgent Operator GitHub page.
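As a rough illustration of the DaemonSet option, the following Python sketch builds a minimal DaemonSet manifest that schedules one privileged agent pod on every node. The image reference, the names, the namespace, and the use of the host network and PID namespaces are placeholder assumptions, not the official Dynatrace manifest. Take the real values from the Dynatrace documentation or the OneAgent Operator.

```python
import json


def oneagent_daemonset(image: str, namespace: str = "dynatrace") -> dict:
    """Build a minimal DaemonSet manifest that runs one agent pod per node.

    All names and settings here are illustrative assumptions.
    """
    labels = {"app": "oneagent"}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "oneagent", "namespace": namespace},
        "spec": {
            # The selector must match the labels of the pod template below.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Assumption: host namespaces give the agent a node-level view.
                    "hostNetwork": True,
                    "hostPID": True,
                    "containers": [
                        {
                            "name": "oneagent",
                            "image": image,
                            # Privileged mode is why elevated permissions are needed.
                            "securityContext": {"privileged": True},
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder image reference; replace it with the one for your environment.
    manifest = oneagent_daemonset("registry.example.com/oneagent:latest")
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be piped into `kubectl apply -f -` on a cluster where you are allowed to run privileged containers.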
Case study from one of our internal testbeds
Full-stack visibility allows you to monitor the health of worker nodes. In the following situation, one worker node became unresponsive because it was overloaded with workloads. The workloads scheduled to this node were not deployed with memory limits or CPU shares.
The cluster had been freshly set up with the default configuration. A shortage of available memory caused heavy swapping to disk, and a memory-hungry JVM started garbage collection, which in turn caused CPU saturation at the worker node level.
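A situation like this can often be caught before deployment. The sketch below, written under the assumption that pod specifications are available as plain dictionaries (for example, parsed from `kubectl get pods -o json`), lists the containers that declare neither a memory limit nor a CPU request. In Kubernetes, the CPU request is the setting that translates into CPU shares. The function name and the example pod are hypothetical.

```python
def containers_without_limits(pod: dict) -> list[str]:
    """Return the names of containers lacking a memory limit or a CPU request."""
    missing = []
    for container in pod.get("spec", {}).get("containers", []):
        resources = container.get("resources") or {}
        has_memory_limit = "memory" in (resources.get("limits") or {})
        has_cpu_request = "cpu" in (resources.get("requests") or {})
        if not (has_memory_limit and has_cpu_request):
            missing.append(container["name"])
    return missing


if __name__ == "__main__":
    # Hypothetical pod: one unconstrained container, one constrained container.
    pod = {
        "spec": {
            "containers": [
                {"name": "frontend"},
                {
                    "name": "backend",
                    "resources": {
                        "limits": {"memory": "512Mi"},
                        "requests": {"cpu": "250m"},
                    },
                },
            ]
        }
    }
    print(containers_without_limits(pod))  # prints ['frontend']
```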
The worker node also hosts other pods, which is why the failing et-frontend pod also affected other pods on the node that serve several backend services and a database. Please have a look at the screenshot showing all the events and impacts that are related to the problem. Any further pods or services are automatically added to this list of affected services and pods if they happen to share the same root cause (failing
Monitoring application workloads in pods for managed Kubernetes and OpenShift Online
Even if you have no access to the underlying cluster, you can still gain visibility into the applications that are deployed in pods by adding Dynatrace OneAgent to your application deployments in locked-down Kubernetes and OpenShift Online.
Integrating OneAgent code modules into your applications is straightforward. It's basically a two-liner in your Dockerfile: one line adds the OneAgent code modules and the other sets the loader for OneAgent. Once this is done, OneAgent takes care of everything else and automatically starts monitoring once the process starts in a pod. For details, see How do I monitor Kubernetes applications only?
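To make the two-liner concrete, here is a minimal Python sketch that appends two such instructions to the text of an existing Dockerfile. It assumes that the code modules are copied from a separate image with `COPY --from` and that the loader is set through the standard Linux `LD_PRELOAD` variable. The image reference and the library path in the example are placeholders, so take the exact values from the Dynatrace documentation for your environment.

```python
def add_oneagent_lines(dockerfile: str, codemodules_image: str, loader_path: str) -> str:
    """Append the two OneAgent-related instructions to a Dockerfile's text.

    Both arguments are placeholders supplied by the caller; nothing here
    contacts a registry or validates the resulting Dockerfile.
    """
    lines = [
        # 1) Copy the code modules from another image into this image.
        f"COPY --from={codemodules_image} / /",
        # 2) Preload the loader library into processes started in the container.
        f"ENV LD_PRELOAD={loader_path}",
    ]
    return dockerfile.rstrip("\n") + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    original = 'FROM eclipse-temurin:17\nCMD ["java", "-jar", "/app.jar"]\n'
    print(
        add_oneagent_lines(
            original,
            "registry.example.com/oneagent-codemodules:java",  # placeholder image
            "/path/to/oneagent/loader.so",  # placeholder loader path
        )
    )
```

Appending at the end places both instructions in the final build stage, which is the stage that becomes the running container image.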
Monitoring workloads on a per-container basis gives you visibility into technology-related metrics such as JVM suspension time, requests per second, and CPU and memory consumption, plus all the code-level insights, service-level metrics, and traces that you would also get with our recommended full-stack monitoring approach. With the app-only monitoring approach, however, you miss out on visibility at the node level. For details, see the support matrix for cloud platform deployments.
The backlog for Kubernetes and OpenShift is full of enhancements to cover even more use cases related to monitoring containerized environments. This includes a plugin for monitoring the Kubernetes control plane by leveraging Prometheus metrics endpoints. Stay tuned!