Amazon Elastic Kubernetes Service (EKS) monitoring

Dynatrace ingests metrics for multiple preselected namespaces, including Amazon Elastic Kubernetes Service (EKS). You can view graphs per service instance, with a set of dimensions, and create custom graphs that you can pin to your dashboards.

Prerequisites

To enable monitoring for this service, you need:

  • An Environment or Cluster ActiveGate version 1.197+
  • Dynatrace version 1.199+
  • An updated AWS monitoring policy to include the additional AWS services.
    To update the AWS IAM policy, use the JSON below, containing the monitoring policy (permissions) for all supporting services.

If you don't want to add permissions for all services and prefer to select permissions for certain services only, consult the table below. The table contains the set of permissions that are required for all services (All monitored Amazon services) and, for each supporting service, a list of optional permissions specific to that service.

Example of a JSON policy for a single service.

In this example, from the complete list of permissions you need to select:

  • "apigateway:GET" for Amazon API Gateway
  • "cloudwatch:GetMetricData", "cloudwatch:GetMetricStatistics", "cloudwatch:ListMetrics", "sts:GetCallerIdentity", "tag:GetResources", "tag:GetTagKeys", and "ec2:DescribeAvailabilityZones" for All monitored Amazon services.
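As a rough illustration, the selection above could be assembled into a policy document as follows. This is a minimal sketch, not the official Dynatrace policy: the permission names come from the list above, while the helper name `build_policy`, the single `Allow` statement, and `"Resource": "*"` are assumptions made for the example.

```python
import json

# Permissions required for all monitored Amazon services (from the list above).
BASE_ACTIONS = [
    "cloudwatch:GetMetricData",
    "cloudwatch:GetMetricStatistics",
    "cloudwatch:ListMetrics",
    "sts:GetCallerIdentity",
    "tag:GetResources",
    "tag:GetTagKeys",
    "ec2:DescribeAvailabilityZones",
]

# Service-specific permission used in the example (Amazon API Gateway).
SERVICE_ACTIONS = ["apigateway:GET"]


def build_policy(actions):
    """Return an IAM policy document allowing the given actions.

    Hypothetical helper: the statement layout and "Resource": "*"
    are assumptions, not taken from the Dynatrace documentation.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(set(actions)),  # de-duplicated, stable order
                "Resource": "*",
            }
        ],
    }


policy = build_policy(SERVICE_ACTIONS + BASE_ACTIONS)
print(json.dumps(policy, indent=2))
```

The printed JSON can be compared against the policy attached to your monitoring role. Narrow the `Resource` value if your security guidelines require it.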

Enable monitoring

To enable monitoring for this service, you first need to integrate Dynatrace with Amazon Web Services.

Add the service to monitoring

To view the service metrics, you must add the service to monitoring in your Dynatrace environment.

Cloud-service monitoring consumption

Beginning in early 2021, all cloud services consume Davis data units (DDUs). The amount of DDU consumption per service instance depends on the number of monitored metrics and their dimensions (each metric dimension results in the ingestion of 1 data point; 1 data point consumes 0.001 DDUs). For DDU consumption estimates per service instance (recommended metrics only, predefined dimensions, and assumed dimension values), see DDU consumption estimates per cloud service instance.
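The arithmetic behind this rule can be sketched as follows. The rate of 0.001 DDUs per data point comes from the paragraph above. The function names, the number of metric dimensions, and the number of polls are hypothetical values chosen only to show the calculation; they are not actual EKS figures.

```python
DATA_POINTS_PER_DDU = 1000  # 1 data point consumes 0.001 DDUs


def data_points_ingested(metric_dimensions: int, polls: int) -> int:
    """Each metric dimension yields 1 data point every time it is polled."""
    return metric_dimensions * polls


def ddu_consumption(data_points: int) -> float:
    """Convert a number of ingested data points to DDUs."""
    return data_points / DATA_POINTS_PER_DDU


# Hypothetical service instance: 12 metric dimensions,
# polled 1,440 times (assumed: once per minute for one day).
points = data_points_ingested(metric_dimensions=12, polls=1440)
print(f"{points} data points consume {ddu_consumption(points):.2f} DDUs")
```

For real planning, rely on the published DDU consumption estimates rather than on assumed polling counts.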

Monitor resources based on tags

You can choose to monitor resources based on existing AWS tags, as Dynatrace automatically imports them from service instances. Nevertheless, the transition from AWS to Dynatrace tagging isn't supported for all AWS services. Expand the table below to see which supporting services are filtered by tagging.

To monitor resources based on tags

  1. Go to Settings > Cloud and virtualization > AWS and select the AWS instance.
  2. For Resource monitoring method, select Monitor resources based on tags.
  3. Enter the Key and Value.
  4. Select Save.
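Conceptually, the Key and Value entered in the steps above act as a filter over the tagged resources. The sketch below only illustrates that idea: the `Resource` class, the sample cluster names, and the exact-match rule are assumptions for this example and do not describe how Dynatrace implements tag filtering.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical stand-in for an AWS resource with its imported tags."""
    name: str
    tags: dict = field(default_factory=dict)


def select_monitored(resources, key, value):
    """Keep only resources whose tag `key` equals `value` (assumed exact match)."""
    return [r for r in resources if r.tags.get(key) == value]


# Hypothetical inventory of EKS clusters.
inventory = [
    Resource("cluster-a", {"environment": "production"}),
    Resource("cluster-b", {"environment": "staging"}),
    Resource("cluster-c", {}),
]

monitored = select_monitored(inventory, key="environment", value="production")
print([r.name for r in monitored])
```

In this sketch, resources without the tag are simply left out of monitoring.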

Configure service metrics

Once you add a service, Dynatrace automatically starts collecting a set of metrics for that service. These are the recommended metrics.

Recommended metrics:

  • Are enabled by default
  • Can't be disabled
  • Can have recommended dimensions (enabled by default, can't be disabled)
  • Can have optional dimensions (disabled by default, can be enabled)

In addition to the recommended metrics, most services let you enable optional metrics.

Optional metrics:

  • Can be added and configured manually

View service metrics

You can view the service metrics in your Dynatrace environment either on the custom device overview page or on your Dashboards page.

View metrics on the custom device overview page

To access the custom device overview page

  1. Go to Technologies on the Dynatrace navigation menu.
  2. Filter by service name and select the relevant custom device group.
  3. Once you select the custom device group, you're on the custom device group overview page.
  4. The custom device group overview page lists all instances (custom devices) belonging to the group. Select an instance to view the custom device overview page.

View metrics on your dashboard

After you add the service to monitoring, a preset dashboard containing all recommended metrics is automatically listed on your Dashboards page. To look for specific dashboards, filter by Preset and then by Name.
Note: For existing monitored services, you might need to resave your credentials for the preset dashboard to appear on the Dashboards page. To resave your credentials, go to Settings > Cloud and virtualization > AWS, select the desired AWS instance, and then select Save.

You can't make changes on a preset dashboard directly, but you can clone and edit it. To clone a dashboard, open the browse menu (...) and select Clone.
To remove a dashboard from the dashboards page, you can hide it. To hide a dashboard, open the browse menu (...) and select Hide.
Note: Hiding a dashboard doesn't affect other users.

To check the availability of preset dashboards for each AWS service, see the list below.

Available metrics

| Name | Description | Unit | Statistics | Dimensions | Recommended |
|---|---|---|---|---|---|
| cluster_failed_node_count | The number of failed worker nodes in the cluster | Count | Average | ClusterName | ✔️ |
| cluster_node_count | The total number of worker nodes in the cluster | Count | Average | ClusterName | ✔️ |
| namespace_number_of_running_pods | The number of pods running per namespace in the resource that is specified by the dimensions that you're using | Count | Average | ClusterName, Namespace | ✔️ |
| node_cpu_limit | The maximum number of CPU units that can be assigned to a single node in this cluster | None | Multi | ClusterName | ✔️ |
| node_cpu_reserved_capacity | The percentage of CPU units that are reserved for node components, such as kubelet, kube-proxy, and Docker | Percent | Multi | ClusterName, InstanceId, NodeName | |
| node_cpu_reserved_capacity | | Percent | Multi | ClusterName | ✔️ |
| node_cpu_usage_total | The number of CPU units being used on the nodes in the cluster | None | Multi | ClusterName | ✔️ |
| node_cpu_utilization | The total percentage of CPU units being used on the nodes in the cluster | Percent | Multi | ClusterName, InstanceId, NodeName | ✔️ |
| node_cpu_utilization | | Percent | Multi | ClusterName | |
| node_filesystem_utilization | The total percentage of file system capacity being used on nodes in the cluster | Percent | Multi | ClusterName, InstanceId, NodeName | ✔️ |
| node_filesystem_utilization | | Percent | Multi | ClusterName | |
| node_memory_limit | The maximum amount of memory, in bytes, that can be assigned to a single node in this cluster | Bytes | Multi | ClusterName | ✔️ |
| node_memory_reserved_capacity | The percentage of memory currently being used on the nodes in the cluster | Percent | Multi | ClusterName, InstanceId, NodeName | |
| node_memory_reserved_capacity | | Percent | Multi | ClusterName | ✔️ |
| node_memory_utilization | The percentage of memory currently being used by the node or nodes | Percent | Multi | ClusterName, InstanceId, NodeName | ✔️ |
| node_memory_utilization | | Percent | Multi | ClusterName | |
| node_memory_working_set | The amount of memory, in bytes, being used in the working set of the nodes in the cluster | Bytes | Multi | ClusterName | ✔️ |
| node_network_total_bytes | The total number of bytes per second transmitted and received over the network per node in a cluster | Bytes/Second | Multi | ClusterName, InstanceId, NodeName | ✔️ |
| node_network_total_bytes | | Bytes/Second | Multi | ClusterName | ✔️ |
| node_number_of_running_containers | The number of running containers per node in a cluster | Count | Average | ClusterName, InstanceId, NodeName | ✔️ |
| node_number_of_running_containers | | Count | Average | ClusterName | |
| node_number_of_running_pods | The number of running pods per node in a cluster | Count | Average | ClusterName, InstanceId, NodeName | ✔️ |
| node_number_of_running_pods | | Count | Average | ClusterName | |
| pod_cpu_reserved_capacity | The CPU capacity that is reserved per pod in a cluster | Percent | Multi | ClusterName, Namespace, PodName | |
| pod_cpu_reserved_capacity | | Percent | Multi | ClusterName | |
| pod_cpu_utilization | The percentage of CPU units being used by pods | Percent | Multi | ClusterName, Namespace | ✔️ |
| pod_cpu_utilization | | Percent | Multi | ClusterName, Namespace, PodName | |
| pod_cpu_utilization | | Percent | Multi | ClusterName | |
| pod_cpu_utilization_over_pod_limit | The percentage of CPU units being used by pods that is over the pod limit | Percent | Multi | ClusterName, Namespace | |
| pod_cpu_utilization_over_pod_limit | | Percent | Multi | ClusterName, Namespace, PodName | ✔️ |
| pod_cpu_utilization_over_pod_limit | | Percent | Multi | ClusterName | |
| pod_memory_reserved_capacity | The percentage of memory that is reserved for pods | Percent | Multi | ClusterName, Namespace, PodName | |
| pod_memory_reserved_capacity | | Percent | Multi | ClusterName | |
| pod_memory_utilization | The percentage of memory currently being used by the pod or pods | Percent | Multi | ClusterName, Namespace | ✔️ |
| pod_memory_utilization | | Percent | Multi | ClusterName, Namespace, PodName | |
| pod_memory_utilization | | Percent | Multi | ClusterName | |
| pod_memory_utilization_over_pod_limit | The percentage of memory that is being used by pods that is over the pod limit | Percent | Multi | ClusterName, Namespace | |
| pod_memory_utilization_over_pod_limit | | Percent | Multi | ClusterName, Namespace, PodName | ✔️ |
| pod_memory_utilization_over_pod_limit | | Percent | Multi | ClusterName | |
| pod_network_rx_bytes | The number of bytes per second being received over the network by the pod | Bytes/Second | Multi | ClusterName, Namespace | |
| pod_network_rx_bytes | | Bytes/Second | Multi | ClusterName, Namespace, PodName | ✔️ |
| pod_network_rx_bytes | | Bytes/Second | Multi | ClusterName | |
| pod_network_tx_bytes | The number of bytes per second being transmitted over the network by the pod | Bytes/Second | Multi | ClusterName, Namespace | |
| pod_network_tx_bytes | | Bytes/Second | Multi | ClusterName, Namespace, PodName | ✔️ |
| pod_network_tx_bytes | | Bytes/Second | Multi | ClusterName | |
| pod_number_of_container_restarts | The total number of container restarts in a pod | Count | Sum | ClusterName, Namespace, PodName | ✔️ |
| service_number_of_running_pods | The number of pods running the service or services in the cluster | Count | Average | ClusterName, Namespace, Service | |
| service_number_of_running_pods | | Count | Average | ClusterName | |