
SLOs for Kubernetes clusters: Optimize resource utilization of Kubernetes clusters with SLOs

Establishing SLOs for Kubernetes clusters can help organizations optimize the resource utilization of their clusters. A good Kubernetes SLO strategy helps teams manage containerized workloads and run them more efficiently.

Kubernetes is a widely used open source system for container orchestration that lets you run containerized workloads seamlessly across different environments. However, its complex architecture can make managing Kubernetes optimally a daunting task, and properly monitoring a Kubernetes cluster or any related environment can be difficult.

Effective resource provisioning and management is a critical aspect of a Kubernetes cluster. It involves a coordinated effort among various stakeholders, including cluster owners, application teams, and business owners. The primary goal is to allocate sufficient resources to keep the applications running smoothly without overprovisioning and incurring unnecessary costs. This delicate balance requires a collaborative approach and transparent communication among all parties involved. Service-level objectives (SLOs) play a vital role in ensuring that all stakeholders have visibility into the resources being used and the performance of the applications.

SLOs for Kubernetes clusters

SLOs are often used to monitor business-critical services and customer-facing applications. However, they can also be used effectively to monitor optimization efforts. Essentially, an SLO tracks a selected service-level indicator (SLI) and continuously evaluates its behavior over a given timeframe against a fixed threshold. This makes SLOs valuable for platform owners who want to monitor and optimize their Kubernetes environment. Because they take historical behavior into account, SLOs are an excellent way to track and evaluate optimization tasks. Users can continuously evaluate the system’s performance against predefined quality criteria, which makes SLOs a good option for monitoring and improving the overall performance of Kubernetes clusters.
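To make the evaluation idea concrete, here is a minimal Python sketch. It assumes the SLI is sampled as a percentage and that the SLO value is simply the mean of the samples in the evaluation timeframe. The function name `evaluate_slo` and the `target`/`warning` thresholds are hypothetical illustrations, not a Dynatrace API.

```python
from statistics import mean
from typing import Sequence


def evaluate_slo(sli_samples: Sequence[float], target: float, warning: float) -> tuple[float, str]:
    """Evaluate SLI samples (percent) from one timeframe against fixed thresholds.

    Hypothetical scheme: the SLO value is the mean of the samples. It is a
    "success" at or above `warning`, a "warning" between `target` and
    `warning`, and a "failure" below `target`.
    """
    if not sli_samples:
        raise ValueError("no SLI samples in the evaluation timeframe")
    if warning < target:
        raise ValueError("warning threshold must not be below the target")
    value = mean(sli_samples)
    if value >= warning:
        return value, "success"
    if value >= target:
        return value, "warning"
    return value, "failure"


# Example: efficiency samples collected over one evaluation window.
print(evaluate_slo([80.0, 90.0, 100.0], target=80.0, warning=95.0))  # → (90.0, 'warning')
```

Averaging is only one way to aggregate a window; a real SLO definition may aggregate differently, but the comparison against fixed thresholds works the same way.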

Resource usage, requests, and allocation must be closely coordinated. To optimize resource utilization in a Kubernetes cluster, all teams involved must work together seamlessly. A Kubernetes SLO can serve as a transparent, trackable collaboration tool for the different teams and collaborators.

Kubernetes stakeholders

Optimize memory utilization of your Kubernetes namespaces

Monitoring your Kubernetes cluster allows for proactive identification and resolution of resource constraints, failures, and crashes before they impact the end-user experience and your business.

Proper monitoring of Kubernetes includes using observability information to optimize your environment. By gaining insights into how your Kubernetes workloads use compute and memory resources, you can make informed decisions about how to size and plan your infrastructure, leading to reduced costs. A Kubernetes SLO that continuously evaluates CPU and memory usage and capacity, and compares the available resources with the resources your workloads request and actually use, makes potential waste visible and reveals opportunities for countermeasures.

When it comes to resource utilization in a Kubernetes environment, there are two main perspectives, each tied to its primary stakeholders. The first focuses on the interface between the team that manages the Kubernetes cluster and the teams that develop and deploy applications: how much of the requested resources the workloads actually use. The second focuses on how the cluster owners allocate infrastructure to meet those requests.

When setting up SLOs for Kubernetes clusters, it’s important to choose the right metrics to track based on your objective. If your team is responsible for operating the Kubernetes cluster, you’ll likely want to monitor how efficiently node capacity is allocated to workloads. If you’re part of an application team, you’ll want to track how much of your reserved resources your workloads actually use, because actual usage can differ significantly from the resources that have been blocked for them.

Requests represent the amount of a resource that is reserved, or blocked, for a container. Tracking the ratio between requests and actual usage can reveal valuable optimization potential. Because containers typically define requests for CPU and memory, these indicators are well suited for efficiency monitoring.
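Memory requests are written as Kubernetes quantities such as `512Mi`, so computing the request-to-usage ratio starts with converting them to bytes. The Python sketch below is a simplified illustration: it handles only plain numbers with binary (`Ki` … `Ei`) or decimal (`k` … `E`) suffixes, and the helper names `parse_memory` and `usage_to_request_percent` are hypothetical. The usage figure is assumed to come from your monitoring source.

```python
import re

# Simplified parser for Kubernetes memory quantities such as "512Mi" or "1G".
# Assumption: only plain decimal numbers with an optional binary (Ki..Ei) or
# decimal (k..E) suffix are handled; exponent forms like "1e3" and the milli
# suffix "m" are not.
_MULTIPLIERS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
}
_QUANTITY = re.compile(r"^([0-9]+(?:\.[0-9]+)?)(Ki|Mi|Gi|Ti|Pi|Ei|k|M|G|T|P|E)?$")


def parse_memory(quantity: str) -> float:
    """Convert a memory quantity string into bytes."""
    match = _QUANTITY.match(quantity.strip())
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    # suffix is None for a plain byte count, which maps to a multiplier of 1.
    return float(number) * _MULTIPLIERS.get(suffix, 1)


def usage_to_request_percent(requested: str, used_bytes: float) -> float:
    """Percentage of a container's memory request that it actually uses."""
    requested_bytes = parse_memory(requested)
    if requested_bytes == 0:
        raise ValueError("container has a zero memory request")
    return 100.0 * used_bytes / requested_bytes


# A container requests 512Mi but uses only 128 MiB: 75% of the request sits idle.
print(usage_to_request_percent("512Mi", 128 * 2**20))  # → 25.0
```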

One option is to continuously track the memory utilization efficiency of existing Kubernetes objects, such as namespaces or workloads. Because teams typically run multiple workloads in a namespace, this level of abstraction is a suitable choice for a Kubernetes SLO. If more granular information is required, the same kind of SLO can be defined at a finer level, such as individual workloads.

SLI (Memory request efficiency)

SLOs for Kubernetes clusters: Measuring SLIs

This Kubernetes SLO measures the ratio between the memory requested and the memory actually used for an entire namespace. It provides insights into how efficiently the blocked resources are being utilized. Since resources that have been requested cannot be used elsewhere, the objective is to keep the difference between requested and used resources as small as possible. By utilizing such SLOs for Kubernetes clusters, it becomes possible to track efficiency transparently over time. The teams responsible for the cluster and the application teams that run their containers on the cluster can agree on the intended ratio between used and requested memory.
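As a sketch of how such a namespace-level SLI could be computed, the Python below sums requested and used memory across all containers of each namespace and reports used memory as a percentage of requested memory. The `ContainerMemory` record and `namespace_memory_efficiency` function are hypothetical; in practice the inputs would come from your monitoring data.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class ContainerMemory(NamedTuple):
    namespace: str
    requested_bytes: float  # memory request from the container spec
    used_bytes: float       # observed usage from your monitoring source


def namespace_memory_efficiency(containers: Iterable[ContainerMemory]) -> dict[str, float]:
    """Used memory as a percentage of requested memory, aggregated per namespace."""
    requested = defaultdict(float)
    used = defaultdict(float)
    for c in containers:
        requested[c.namespace] += c.requested_bytes
        used[c.namespace] += c.used_bytes
    # Namespaces without any memory requests have no meaningful ratio.
    return {ns: 100.0 * used[ns] / req for ns, req in requested.items() if req > 0}


GiB = 2**30
pods = [
    ContainerMemory("shop", 2 * GiB, 1 * GiB),
    ContainerMemory("shop", 2 * GiB, 1 * GiB),
    ContainerMemory("batch", 4 * GiB, 3 * GiB),
]
print(namespace_memory_efficiency(pods))  # → {'shop': 50.0, 'batch': 75.0}
```

Values above 100% are possible, because a memory request is not a limit: a container can use more than it requested, up to its limit or the memory available on its node.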

Optimize your Kubernetes cluster’s resource allocation

One aspect of managing cloud resources is tracking and adjusting the ratio between requested and used memory. The other aspect lies with the cluster owners, who must allocate cloud resources to meet the resource requests of the application teams.

For cluster owners, monitoring resource usage at the node level provides the insight needed to take sound actions. By tracking resource usage on each node, teams can see how many resources the cluster as a whole consumes and whether the nodes are working correctly and efficiently.

To continuously monitor resource usage at the node level, put suitable SLOs for Kubernetes cluster nodes in place. These SLOs should cover metrics such as node memory utilization, measured as the ratio of requested to allocated resources, and the ratio of desired to running pods per node.

For instance, if a node runs low on memory, pods can be evicted from it, and high CPU utilization can slow workloads down. Either can disrupt an application or service and cause additional costs if the cluster has to be scaled up.

A possible Kubernetes SLO for monitoring and evaluating the ratio of requested versus allocated memory can be expressed as follows:

SLI (Cluster memory efficiency)

Cluster Memory efficiency details in Dynatrace screenshot

In an ideal environment, the requested resources would nearly match the allocated ones, so cloud resources would be used optimally and the cost of resources that are paid for but never used would shrink.
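A minimal Python sketch of this cluster-level SLI, assuming "allocated" memory is read as each node’s allocatable memory and "requested" as the sum of memory requests of the pods scheduled on it. The `NodeMemory` record and both functions are hypothetical names; the per-node helper shows how the same ratio points to individual nodes with unused capacity.

```python
from typing import NamedTuple


class NodeMemory(NamedTuple):
    name: str
    allocatable_bytes: float  # memory the node offers to pods (assumed to mean "allocated")
    requested_bytes: float    # sum of memory requests of the pods scheduled on the node


def cluster_memory_efficiency(nodes: list[NodeMemory]) -> float:
    """Requested memory as a percentage of allocatable memory across all nodes."""
    total_allocatable = sum(n.allocatable_bytes for n in nodes)
    if total_allocatable == 0:
        raise ValueError("cluster reports no allocatable memory")
    return 100.0 * sum(n.requested_bytes for n in nodes) / total_allocatable


def underused_nodes(nodes: list[NodeMemory], threshold_percent: float) -> list[str]:
    """Names of nodes whose request-to-allocatable ratio is below the threshold."""
    return [
        n.name
        for n in nodes
        if n.allocatable_bytes > 0
        and 100.0 * n.requested_bytes / n.allocatable_bytes < threshold_percent
    ]


GiB = 2**30
nodes = [NodeMemory("node-a", 16 * GiB, 12 * GiB), NodeMemory("node-b", 16 * GiB, 4 * GiB)]
print(cluster_memory_efficiency(nodes))  # → 50.0
print(underused_nodes(nodes, threshold_percent=50.0))  # → ['node-b']
```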

Ensure overall Kubernetes resource utilization efficiency

The two SLOs mentioned above provide valuable insights into the overall utilization of resources in a Kubernetes cluster and are a good starting point for improving efficiency. Although both interfaces (usage vs. request and request vs. allocation) must be analyzed together to obtain a holistic view, the split is extremely helpful in ensuring accountability. While the application teams have complete control over the usage/request SLO, the cluster owner can influence the request vs. allocation SLO.
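One reason to analyze both interfaces together is that the two ratios multiply: (used / requested) × (requested / allocated) = used / allocated. The Python sketch below illustrates this, assuming both SLIs are percentages measured over the same cluster and timeframe; the function name is hypothetical.

```python
def overall_memory_efficiency(usage_vs_request: float, request_vs_allocation: float) -> float:
    """Combine the two SLI percentages into used memory as a share of allocated memory.

    (used / requested) * (requested / allocated) = used / allocated
    Assumes both inputs are percentages measured over the same cluster and timeframe.
    """
    return usage_vs_request * request_vs_allocation / 100.0


# Application teams use 60% of what they request; cluster owners
# have 90% of allocated memory requested.
print(overall_memory_efficiency(60.0, 90.0))  # → 54.0
```

Because the overall value is a product, a low result can be traced back to whichever factor is low, and therefore to the team that is accountable for it.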

Setting up proper SLOs to make these ratios visible and transparent does not reduce the need for collaboration and alignment, but it provides a solid foundation for optimizing resource utilization efficiency.

The SLOs for Kubernetes clusters outlined here serve as a guide for applying SRE best practices to monitoring your Kubernetes environment. Acting on the insights they provide lets you optimize processes and improve overall efficiency. This makes them an excellent tool for collaboration between different contributors, while also holding each party accountable.

Service level objectives in Dynatrace screenshot

SLO Memory Utilization Efficiency dashboard tile in Dynatrace screenshot

What’s next with SLOs for Kubernetes clusters?

Increase your resource utilization efficiency with Dynatrace using the SLOs described in this blog post. If you want to learn how to set up SLOs with Dynatrace, have a look here:

Or check the following Kubernetes blogs if you’d like to learn more about Dynatrace capabilities to increase efficiency and reliability in your Kubernetes environment: