SLOs for Kubernetes clusters: Optimize resource utilization of Kubernetes clusters with service level objectives

Published November 9, 2023 | Updated July 1, 2024 | 8 min read

Gerhard Kleemaier


Establishing SLOs for Kubernetes clusters can help organizations optimize resource utilization. A good Kubernetes SLO strategy helps teams manage containerized workloads and run them more efficiently.

Kubernetes is a widely used open source system for container orchestration. It allows for seamless running of containerized workloads across different environments. However, managing Kubernetes optimally can be a daunting task due to its complex architecture. Properly monitoring a Kubernetes cluster or any related environment can be difficult.

Effective resource provisioning and management is a critical aspect of a Kubernetes cluster. It involves a coordinated effort among various stakeholders, including cluster owners, application teams, and business owners. The primary goal is to allocate sufficient resources to keep the applications running smoothly without overprovisioning and incurring unnecessary costs. This delicate balance requires a collaborative approach and transparent communication among all parties involved. Service-level objectives (SLOs) play a vital role in ensuring that all stakeholders have visibility into the resources being used and the performance of the applications.

SLOs for Kubernetes clusters

SLOs are often used to monitor business-critical services and applications for customers. However, they can also be used to monitor optimization processes effectively. Essentially, SLOs track a selected service-level indicator (SLI) and continuously evaluate its behavior over a given timeframe against a fixed threshold. This feature is valuable for platform owners who want to monitor and optimize their Kubernetes environment. By considering historical behavior, SLOs provide an excellent way to track and evaluate optimization tasks. Users can continuously evaluate the system’s performance against predefined quality criteria, making SLOs for Kubernetes clusters a good option for monitoring and improving the system’s overall performance.
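The evaluation described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular SLO product computes its status: it assumes the SLI is reported as a percentage, that the SLO status is the mean SLI over the timeframe, and that the SLO is met when that mean reaches the target. The names `evaluate_slo` and `SloStatus` and the sample values are hypothetical.

```python
from statistics import fmean
from typing import NamedTuple, Sequence


class SloStatus(NamedTuple):
    value: float   # mean SLI over the evaluation window, in percent
    target: float  # fixed threshold the SLI is compared against
    met: bool      # True when the mean SLI reaches the target


def evaluate_slo(sli_samples: Sequence[float], target: float) -> SloStatus:
    """Compare the mean SLI over a timeframe against a fixed target."""
    if not sli_samples:
        raise ValueError("no SLI samples in the evaluation window")
    value = fmean(sli_samples)
    return SloStatus(value=value, target=target, met=value >= target)


# Hypothetical SLI readings (percent), one per day over a week.
week = [62.0, 71.0, 68.0, 75.0, 80.0, 78.0, 77.0]
status = evaluate_slo(week, target=70.0)
print(f"SLO status {status.value:.1f}% (target {status.target}%): met={status.met}")
```

Because the status is computed over the whole window, a single weak day does not break the SLO, which is what makes this kind of evaluation useful for tracking gradual optimization work.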

Efficient coordination among resource usage, requests, and allocation is critical. To optimize resource utilization in a Kubernetes cluster, all users must work together seamlessly. A Kubernetes SLO can serve as a transparent and trackable collaboration tool for different teams and collaborators.

Figure: Kubernetes stakeholders

Optimize memory utilization of your Kubernetes Namespaces

Monitoring your Kubernetes cluster allows for proactively identifying and resolving resource constraints, failures, and crashes before they impact the end-user experience and your business.

Proper Kubernetes monitoring includes using observability data to optimize your environment. By gaining insight into how your Kubernetes workloads use CPU and memory resources, you can make informed decisions about how to size and plan your infrastructure, which reduces costs. A Kubernetes SLO that continuously evaluates the available CPU and memory capacity and compares it to the resources that Kubernetes workloads request and actually use makes potential waste visible and reveals opportunities for countermeasures.

When it comes to resource utilization in a Kubernetes environment, there are two main perspectives to consider based on the primary stakeholders involved. One perspective focuses on the potential for optimization at the interface between the team responsible for managing the Kubernetes cluster and the teams responsible for developing and deploying applications. The other focuses on the cluster owners, who must allocate cloud resources to meet the resource requests of those teams.

When setting up SLOs for Kubernetes clusters, it is important to choose the right metrics to track based on your objective. If your team is responsible for setting up the Kubernetes cluster, you might want to monitor and optimize workload performance. If you are part of an application team, the more relevant question is how far the resources your workloads actually use differ from the resources reserved for them.

Requests represent the amount of resources reserved, or blocked, for a container. Tracking the ratio between request and usage can provide valuable insights into optimization potential. When every container defines requests for CPU and memory, these indicators are well suited for efficiency monitoring.
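To make the notion of requests concrete, here is a hedged sketch that sums container memory requests per namespace. It assumes a simplified, hypothetical pod structure (a `namespace` plus a list of containers with optional `requests`), not the full Kubernetes API object, and `parse_memory` handles only the common quantity suffixes (`Ki`, `Mi`, `Gi`, `Ti`, `k`, `M`, `G`, `T`) rather than the complete Kubernetes quantity grammar.

```python
from decimal import Decimal

# Common binary and decimal suffixes used in Kubernetes memory quantities.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' or '1G' to bytes."""
    # Check two-letter suffixes first so 'Mi' is not mistaken for another unit.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * _SUFFIXES[suffix])
    return int(Decimal(quantity))  # plain number of bytes


def requested_memory_by_namespace(pods: list[dict]) -> dict[str, int]:
    """Sum the memory requests of all containers, grouped by namespace."""
    totals: dict[str, int] = {}
    for pod in pods:
        for container in pod["containers"]:
            request = container.get("requests", {}).get("memory")
            if request is None:
                continue  # container without a memory request reserves nothing
            totals[pod["namespace"]] = totals.get(pod["namespace"], 0) + parse_memory(request)
    return totals


# Hypothetical pods; the second container in "batch" defines no request.
pods = [
    {"namespace": "shop", "containers": [
        {"requests": {"memory": "256Mi"}},
        {"requests": {"memory": "0.5Gi"}},
    ]},
    {"namespace": "batch", "containers": [{"requests": {"memory": "1G"}}, {}]},
]
print(requested_memory_by_namespace(pods))
```

Containers without a request are skipped here, which is one reason the request-to-usage ratio is only meaningful when requests are defined consistently.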

One option is to continuously track the memory utilization efficiency of existing Kubernetes objects, such as namespaces or workloads. Since teams typically run multiple workloads in a namespace, this level of abstraction is a suitable choice for a Kubernetes SLO. If you require more granular information, you can track resource utilization at a lower level, such as individual workloads.

SLI (Memory request efficiency)

Figure: SLOs for Kubernetes clusters: Measuring SLIs

This Kubernetes SLO measures the ratio between the requested memory and the memory used for an entire namespace. It provides insights into how efficiently the blocked resources are being utilized. Since resources that have been requested cannot be used elsewhere, the objective is to keep the difference between requested and used resources as small as possible. Utilizing such SLOs for Kubernetes clusters makes it possible to track efficiency transparently over time. The teams responsible for the cluster and the application teams that run their containers on the cluster can agree on the intended ratio between used and requested memory.
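As a rough illustration of this SLI, the sketch below computes used memory as a percentage of requested memory for each namespace. The direction of the ratio (used divided by requested), the field names, and the sample workloads are assumptions made for this example; the actual metric expression used in the original setup is not shown in the text.

```python
from collections import defaultdict

MIB = 2**20


def namespace_memory_request_efficiency(workloads: list[dict]) -> dict[str, float]:
    """Return used memory as a percentage of requested memory, per namespace."""
    used: dict[str, float] = defaultdict(float)
    requested: dict[str, float] = defaultdict(float)
    for workload in workloads:
        used[workload["namespace"]] += workload["used_bytes"]
        requested[workload["namespace"]] += workload["requested_bytes"]
    # Namespaces without any memory request are skipped to avoid dividing by zero.
    return {
        namespace: 100.0 * used[namespace] / requested[namespace]
        for namespace in requested
        if requested[namespace] > 0
    }


# Hypothetical workloads in two namespaces.
workloads = [
    {"namespace": "shop", "used_bytes": 300 * MIB, "requested_bytes": 512 * MIB},
    {"namespace": "shop", "used_bytes": 100 * MIB, "requested_bytes": 288 * MIB},
    {"namespace": "batch", "used_bytes": 900 * MIB, "requested_bytes": 1000 * MIB},
]
for namespace, efficiency in namespace_memory_request_efficiency(workloads).items():
    print(f"{namespace}: {efficiency:.0f}% of requested memory is used")
```

A low percentage means that a large share of the blocked memory sits idle; a value close to or above 100% means the workloads use at least what they requested, so the agreed target usually sits somewhere below 100% to leave headroom.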

Optimize Kubernetes cluster’s resource allocation

Tracking and adjusting the ratio between requested and used memory is one aspect of managing cloud resources. The other lies with the cluster owners, who must allocate cloud resources to meet the application teams' resource requests.

For cluster owners, monitoring resource usage at the node level provides better insights and information for taking sound actions. By tracking resource usage at each node, teams can gain insights into how many resources the entire cluster uses and whether the nodes are working correctly and efficiently.

Teams should implement suitable SLOs to continuously monitor resource usage at the node level. These SLOs can cover indicators such as node memory utilization, the ratio of requested to allocated resources, or the ratio of desired to running pods per node.

For instance, if a node's memory or CPU utilization is high, it can lead to the undesired eviction of pods. This, in turn, can disrupt an application or service and incur additional costs for potential cluster upscaling.

A possible Kubernetes SLO for monitoring and evaluating the ratio of requested versus allocated memory can be expressed as follows:

SLI (Cluster memory efficiency)

Figure: Cluster memory efficiency details in Dynatrace (screenshot)

In an ideal environment, the requested resources would almost match the allocated ones, and hence, cloud resources would be optimally used, reducing costs of invoiced but unused resources.
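The ratio of requested to allocated memory can be sketched as follows. This example assumes that "allocated" corresponds to the memory each node makes available to pods, and it uses hypothetical node names and field names; it is an illustration of the ratio, not the metric expression from the original setup.

```python
GIB = 2**30


def memory_allocation_efficiency(nodes: list[dict]) -> tuple[dict[str, float], float]:
    """Return requested memory as a percentage of allocated memory.

    The first element maps each node name to its own ratio; the second
    element is the ratio for the cluster as a whole.
    """
    per_node: dict[str, float] = {}
    total_requested = 0
    total_allocated = 0
    for node in nodes:
        per_node[node["name"]] = 100.0 * node["requested_bytes"] / node["allocated_bytes"]
        total_requested += node["requested_bytes"]
        total_allocated += node["allocated_bytes"]
    if total_allocated == 0:
        raise ValueError("cluster has no allocated memory")
    return per_node, 100.0 * total_requested / total_allocated


# Hypothetical two-node cluster with 8 GiB allocated per node.
nodes = [
    {"name": "node-a", "requested_bytes": 6 * GIB, "allocated_bytes": 8 * GIB},
    {"name": "node-b", "requested_bytes": 2 * GIB, "allocated_bytes": 8 * GIB},
]
per_node, cluster = memory_allocation_efficiency(nodes)
print(per_node, f"cluster: {cluster:.0f}%")
```

Looking at both the per-node values and the cluster total helps distinguish an unevenly packed cluster from one that is simply oversized.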

Ensure overall Kubernetes resource utilization efficiency

The two SLOs mentioned above provide valuable insights into the overall utilization of resources in a Kubernetes cluster and are a good starting point for improving efficiency. Although both interfaces (usage vs. request and request vs. allocation) must be analyzed together to obtain a holistic view, the split is extremely helpful in ensuring accountability. While the application teams have complete control over the usage/request SLO, the cluster owner can influence the request vs. allocation SLO.
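A small worked example shows why the two ratios need to be read together. When both refer to the same scope (for example, the whole cluster), the share of allocated memory that is actually used is the product of the two, since usage/allocation = (usage/request) × (request/allocation). The function name and the numbers below are hypothetical.

```python
def overall_memory_utilization(usage_vs_request_pct: float,
                               request_vs_allocation_pct: float) -> float:
    """Combine both ratios into used memory as a percentage of allocated memory."""
    return usage_vs_request_pct * request_vs_allocation_pct / 100.0


# Application teams use 60% of what they request,
# and requests cover 50% of what the cluster owner allocated.
overall = overall_memory_utilization(60.0, 50.0)
print(f"{overall:.0f}% of the allocated memory is actually used")
```

Each ratio on its own can look acceptable while the combined utilization is low, which is why the split is useful for accountability but not sufficient as a single view.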

Setting up proper SLOs to make these ratios visible and transparent does not reduce the need for collaboration and alignment, but it provides a solid foundation for optimizing resource utilization efficiency.

The outlined SLOs for Kubernetes clusters guide you in implementing SRE best practices when monitoring your Kubernetes environment. By acting on the insights they provide, you can optimize processes and improve overall efficiency. This makes them an excellent tool for collaboration between different contributors while also holding the respective parties accountable.

Figure: Service level objectives in Dynatrace (screenshot)

Figure: SLO Memory Utilization Efficiency dashboard tile in Dynatrace (screenshot)

What’s next with SLOs for Kubernetes clusters?

Increase your resource utilization efficiency with Dynatrace using the SLOs described in this blog post. If you want to learn how to set up SLOs with Dynatrace, have a look here:

Or check the following Kubernetes blogs if you’d like to learn more about Dynatrace capabilities to increase efficiency and reliability in your Kubernetes environment:

The 3 biggest Kubernetes deployment mistakes you can make

Accelerate and empower Site Reliability Engineering with Dynatrace observability

Unified services: Easily keep track of your pipelines
