Amazon Elastic Kubernetes Service

Dynatrace ingests metrics for multiple preselected namespaces, including Amazon Elastic Kubernetes Service (EKS). You can view graphs per service instance, broken down by a set of dimensions, and create custom graphs that you can pin to your dashboards.

Prerequisites

To enable monitoring for this service, you need

Add the service to monitoring

To view the service metrics, you must add the service to monitoring in your Dynatrace environment.

Configure service metrics

Once you add a service, Dynatrace automatically starts collecting a suite of metrics for that service. These are the recommended metrics.

Recommended metrics:

  • Are enabled by default
  • Can't be disabled
  • Can have recommended dimensions (enabled by default, can't be disabled)
  • Can have optional dimensions (disabled by default, can be enabled)

In addition to the recommended metrics, most services let you enable optional metrics.

Optional metrics:

  • Can be added and configured manually
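The rules above for recommended and optional items can be sketched as a small model. This is only an illustration: the `Setting` class and its methods are hypothetical names made up for this example and are not part of any Dynatrace API. The metric names are taken from the table under "Available metrics".

```python
from dataclasses import dataclass


@dataclass
class Setting:
    """A metric or a dimension. Hypothetical model, not a Dynatrace type."""

    name: str
    recommended: bool
    enabled: bool = False

    def __post_init__(self) -> None:
        # Recommended items are enabled by default.
        if self.recommended:
            self.enabled = True

    def enable(self) -> None:
        # Optional items are disabled by default and can be enabled manually.
        self.enabled = True

    def disable(self) -> None:
        # Recommended items can't be disabled.
        if self.recommended:
            raise ValueError(f"{self.name} is recommended and can't be disabled")
        self.enabled = False


recommended_metric = Setting("cluster_node_count", recommended=True)
optional_metric = Setting("pod_cpu_reserved_capacity", recommended=False)
```

Here `recommended_metric.enabled` starts as `True` and calling `recommended_metric.disable()` raises `ValueError`, while `optional_metric` starts disabled and can be switched on and off freely.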

Import preset dashboards

Dynatrace provides preset AWS dashboards that you can import from GitHub to your environment's dashboard page. Once you download a preset dashboard locally, there are two ways to import it.
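As a rough sketch, a downloaded dashboard JSON could be uploaded with a POST request to the Dynatrace Configuration API dashboards endpoint (`/api/config/v1/dashboards`). The text above does not say which two import methods it means, so treat this as one possible approach rather than the documented procedure. The function name, the environment URL, and the token value are placeholders chosen for this example, and the token is assumed to have permission to write configuration.

```python
import json
import urllib.request


def build_dashboard_import_request(environment_url, api_token, dashboard):
    """Build (but do not send) a POST request that uploads a dashboard.

    environment_url: base URL of the environment (placeholder in this sketch).
    api_token: API token, assumed to allow writing configuration.
    dashboard: the dashboard definition as a dict, e.g. from json.load().
    """
    url = environment_url.rstrip("/") + "/api/config/v1/dashboards"
    body = json.dumps(dashboard).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Api-Token {api_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


# Placeholder values; in practice the dict would be loaded from the
# downloaded preset dashboard file.
request = build_dashboard_import_request(
    "https://example.invalid/",
    "PLACEHOLDER_TOKEN",
    {"dashboardMetadata": {"name": "EKS preset"}, "tiles": []},
)
```

Passing `request` to `urllib.request.urlopen` would send it. The sketch stops short of that so it can be inspected without network access or real credentials.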


Available metrics

| Name | Description | Unit | Statistics | Dimensions | Recommended |
| --- | --- | --- | --- | --- | --- |
| cluster_failed_node_count | The number of failed worker nodes in the cluster | Count | Average | ClusterName | ✔️ |
| cluster_node_count | The total number of worker nodes in the cluster | Count | Average | ClusterName | ✔️ |
| namespace_number_of_running_pods | The number of pods running per namespace in the resource that is specified by the dimensions that you're using | Count | Average | ClusterName, Namespace | ✔️ |
| node_cpu_limit | The maximum number of CPU units that can be assigned to a single node in this cluster | None | Multi | ClusterName | ✔️ |
| node_cpu_reserved_capacity | The percentage of CPU units that are reserved for node components, such as kubelet, kube-proxy, and Docker | Percent | Multi | ClusterName, InstanceId, NodeName | |
| node_cpu_reserved_capacity | | Percent | Multi | ClusterName | ✔️ |
| node_cpu_usage_total | The number of CPU units being used on the nodes in the cluster | None | Multi | ClusterName | ✔️ |
| node_cpu_utilization | The total percentage of CPU units being used on the nodes in the cluster | Percent | Multi | ClusterName, InstanceId, NodeName | ✔️ |
| node_cpu_utilization | | Percent | Multi | ClusterName | |
| node_filesystem_utilization | The total percentage of file system capacity being used on nodes in the cluster | Percent | Multi | ClusterName, InstanceId, NodeName | ✔️ |
| node_filesystem_utilization | | Percent | Multi | ClusterName | |
| node_memory_limit | The maximum amount of memory, in bytes, that can be assigned to a single node in this cluster | Bytes | Multi | ClusterName | ✔️ |
| node_memory_reserved_capacity | The percentage of memory currently being used on the nodes in the cluster | Percent | Multi | ClusterName, InstanceId, NodeName | |
| node_memory_reserved_capacity | | Percent | Multi | ClusterName | ✔️ |
| node_memory_utilization | The percentage of memory currently being used by the node or nodes | Percent | Multi | ClusterName, InstanceId, NodeName | ✔️ |
| node_memory_utilization | | Percent | Multi | ClusterName | |
| node_memory_working_set | The amount of memory, in bytes, being used in the working set of the nodes in the cluster | Bytes | Multi | ClusterName | ✔️ |
| node_network_total_bytes | The total number of bytes per second transmitted and received over the network per node in a cluster | Bytes/Second | Multi | ClusterName, InstanceId, NodeName | ✔️ |
| node_network_total_bytes | | Bytes/Second | Multi | ClusterName | ✔️ |
| node_number_of_running_containers | The number of running containers per node in a cluster | Count | Average | ClusterName, InstanceId, NodeName | ✔️ |
| node_number_of_running_containers | | Count | Average | ClusterName | |
| node_number_of_running_pods | The number of running pods per node in a cluster | Count | Average | ClusterName, InstanceId, NodeName | ✔️ |
| node_number_of_running_pods | | Count | Average | ClusterName | |
| pod_cpu_reserved_capacity | The CPU capacity that is reserved per pod in a cluster | Percent | Multi | ClusterName, Namespace, PodName | |
| pod_cpu_reserved_capacity | | Percent | Multi | ClusterName | |
| pod_cpu_utilization | The percentage of CPU units being used by pods | Percent | Multi | ClusterName, Namespace | ✔️ |
| pod_cpu_utilization | | Percent | Multi | ClusterName, Namespace, PodName | |
| pod_cpu_utilization | | Percent | Multi | ClusterName | |
| pod_cpu_utilization_over_pod_limit | The percentage of CPU units being used by pods that is over the pod limit | Percent | Multi | ClusterName, Namespace | |
| pod_cpu_utilization_over_pod_limit | | Percent | Multi | ClusterName, Namespace, PodName | ✔️ |
| pod_cpu_utilization_over_pod_limit | | Percent | Multi | ClusterName | |
| pod_memory_reserved_capacity | The percentage of memory that is reserved for pods | Percent | Multi | ClusterName, Namespace, PodName | |
| pod_memory_reserved_capacity | | Percent | Multi | ClusterName | |
| pod_memory_utilization | The percentage of memory currently being used by the pod or pods | Percent | Multi | ClusterName, Namespace | ✔️ |
| pod_memory_utilization | | Percent | Multi | ClusterName, Namespace, PodName | |
| pod_memory_utilization | | Percent | Multi | ClusterName | |
| pod_memory_utilization_over_pod_limit | The percentage of memory that is being used by pods that is over the pod limit | Percent | Multi | ClusterName, Namespace | |
| pod_memory_utilization_over_pod_limit | | Percent | Multi | ClusterName, Namespace, PodName | ✔️ |
| pod_memory_utilization_over_pod_limit | | Percent | Multi | ClusterName | |
| pod_network_rx_bytes | The number of bytes per second being received over the network by the pod | Bytes/Second | Multi | ClusterName, Namespace | |
| pod_network_rx_bytes | | Bytes/Second | Multi | ClusterName, Namespace, PodName | ✔️ |
| pod_network_rx_bytes | | Bytes/Second | Multi | ClusterName | |
| pod_network_tx_bytes | The number of bytes per second being transmitted over the network by the pod | Bytes/Second | Multi | ClusterName, Namespace | |
| pod_network_tx_bytes | | Bytes/Second | Multi | ClusterName, Namespace, PodName | ✔️ |
| pod_network_tx_bytes | | Bytes/Second | Multi | ClusterName | |
| pod_number_of_container_restarts | The total number of container restarts in a pod | Count | Sum | ClusterName, Namespace, PodName | ✔️ |
| service_number_of_running_pods | The number of pods running the service or services in the cluster | Count | Average | ClusterName, Namespace, Service | |
| service_number_of_running_pods | | Count | Average | ClusterName | |
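
Several metrics in the table appear more than once, each time with a different dimension set, and only some of those variants are recommended. The sketch below encodes a handful of rows from the table and looks up which dimension sets are recommended for a given metric. The `MetricRow` type and the `recommended_variants` helper are hypothetical names used only for this illustration. Only a subset of rows is included.

```python
from typing import NamedTuple


class MetricRow(NamedTuple):
    """One row of the metrics table (hypothetical helper type)."""

    name: str
    unit: str
    statistics: str
    dimensions: tuple
    recommended: bool


# A few rows copied from the table; not the complete list.
ROWS = [
    MetricRow("cluster_node_count", "Count", "Average", ("ClusterName",), True),
    MetricRow("pod_cpu_utilization", "Percent", "Multi",
              ("ClusterName", "Namespace"), True),
    MetricRow("pod_cpu_utilization", "Percent", "Multi",
              ("ClusterName", "Namespace", "PodName"), False),
    MetricRow("pod_cpu_utilization", "Percent", "Multi", ("ClusterName",), False),
    MetricRow("pod_number_of_container_restarts", "Count", "Sum",
              ("ClusterName", "Namespace", "PodName"), True),
]


def recommended_variants(rows, name):
    """Return the dimension sets that are recommended for the named metric."""
    return [row.dimensions for row in rows if row.name == name and row.recommended]


pod_cpu_variants = recommended_variants(ROWS, "pod_cpu_utilization")
```

With the rows above, `pod_cpu_variants` holds a single entry, the `ClusterName, Namespace` variant. The other two `pod_cpu_utilization` variants are optional.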