AWS Glue monitoring

Dynatrace ingests metrics for multiple preselected namespaces, including AWS Glue. You can view metrics for each service instance, split metrics into multiple dimensions, and create custom charts that you can pin to your dashboards.

Prerequisites

To enable monitoring for this service, you need:

  • An Environment or Cluster ActiveGate version 1.181+
    Note: For role-based access (whether in a SaaS or Managed deployment), you need an Environment ActiveGate installed on an AWS EC2 host.
  • Dynatrace version 1.182+
  • An updated AWS monitoring policy that includes the additional AWS services

To update the AWS IAM policy, use the JSON below, containing the monitoring policy (permissions) for all supporting services.

If you don't want to add permissions for all services and prefer to select permissions for certain services only, consult the table below. The table contains a set of permissions that are required for all services (All monitored Amazon services) and, for each supporting service, a list of optional permissions specific to that service.

Example of a JSON policy for a single service

In this example, from the complete list of permissions you need to select:

  • "apigateway:GET" for Amazon API Gateway
  • "cloudwatch:GetMetricData", "cloudwatch:GetMetricStatistics", "cloudwatch:ListMetrics", "sts:GetCallerIdentity", "tag:GetResources", "tag:GetTagKeys", and "ec2:DescribeAvailabilityZones" for All monitored Amazon services.
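The selection above can be sketched as a short Python script that assembles the policy document and prints it as JSON. This is a minimal sketch, not the official Dynatrace policy: the single `Allow` statement and the `"Resource": "*"` value are assumptions, and `build_policy` is a hypothetical helper name. Only the action names are taken from the list above.

```python
import json

# Permissions the text lists for "All monitored Amazon services".
BASE_ACTIONS = [
    "cloudwatch:GetMetricData",
    "cloudwatch:GetMetricStatistics",
    "cloudwatch:ListMetrics",
    "sts:GetCallerIdentity",
    "tag:GetResources",
    "tag:GetTagKeys",
    "ec2:DescribeAvailabilityZones",
]


def build_policy(service_actions):
    """Return an IAM policy document (as a dict) for one selected service.

    service_actions: the service-specific permissions, for example
    ["apigateway:GET"] for Amazon API Gateway.
    """
    return {
        "Version": "2012-10-17",  # current IAM policy language version
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(service_actions) + BASE_ACTIONS,
                "Resource": "*",  # assumption: not specified in the text
            }
        ],
    }


policy = build_policy(["apigateway:GET"])
print(json.dumps(policy, indent=2))
```

To cover a different service, pass that service's permissions from the table instead of `"apigateway:GET"`.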

Enable monitoring

To enable monitoring for this service, you first need to integrate Dynatrace with Amazon Web Services.

Add the service to monitoring

To view the service metrics, you must add the service to monitoring in your Dynatrace environment.

Note: Once AWS supporting services are added to monitoring, you might have to wait 15-20 minutes before the metric values are displayed.

Cloud-service monitoring consumption

As of 2021, all cloud services consume Davis data units (DDUs). The amount of DDU consumption per service instance depends on the number of monitored metrics and their dimensions (each metric dimension results in the ingestion of 1 data point; 1 data point consumes 0.001 DDUs). For DDU consumption estimates per service instance (recommended metrics only, predefined dimensions, and assumed dimension values), see DDU consumption estimates per cloud service instance.
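The arithmetic above can be illustrated with a rough estimator. This is a sketch under stated assumptions: it treats every metric dimension as producing one data point per collection interval, and both the number of intervals and the per-metric dimension counts in the example are hypothetical inputs, not figures from Dynatrace. Only the rate of 0.001 DDUs per data point comes from the text.

```python
DDU_PER_DATA_POINT = 0.001  # rate stated in the text


def estimate_ddus(dimension_counts, intervals):
    """Estimate data points and DDUs for one service instance.

    dimension_counts: maps a metric name to the number of dimension
        values it reports (assumption: one data point per dimension
        value per collection interval).
    intervals: number of collection intervals in the period (hypothetical).
    """
    data_points = sum(dimension_counts.values()) * intervals
    return data_points, data_points * DDU_PER_DATA_POINT


# The four recommended AWS Glue metrics, each assumed to report a single
# dimension value for one job run.
recommended = {
    "glue.driver.jvm.heap.usage": 1,
    "glue.ALL.jvm.heap.usage": 1,
    "glue.ALL.s3.filesystem.read_bytes": 1,
    "glue.ALL.s3.filesystem.write_bytes": 1,
}

points, ddus = estimate_ddus(recommended, intervals=60)
print(f"{points} data points, about {ddus:.3f} DDUs")
```

Enabling optional metrics or additional dimensions increases the counts passed in, and the estimate grows linearly with them.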

Monitor resources based on tags

You can choose to monitor resources based on existing AWS tags, as Dynatrace automatically imports them from service instances. Nevertheless, the transition from AWS to Dynatrace tagging isn't supported for all AWS services. Expand the table below to see which supporting services are filtered by tagging.

To monitor resources based on tags

  1. In the Dynatrace menu, go to Settings > Cloud and virtualization > AWS and select the AWS instance.
  2. For Resource monitoring method, select Monitor resources based on tags.
  3. Enter the Key and Value.
  4. Select Save.
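The effect of the Key and Value entered in the steps above can be pictured as a simple filter over the tags of each resource. The sketch below is only an illustration of that idea: exact, case-sensitive matching is an assumption, and the resource names, tag values, and the `select_monitored` helper are hypothetical rather than part of Dynatrace or AWS.

```python
def select_monitored(resources, key, value):
    """Return the names of resources whose tags contain key == value.

    resources: maps a resource name to its AWS tags (a dict of str to str).
    Assumption: a resource is selected only on an exact key/value match.
    """
    return [
        name
        for name, tags in resources.items()
        if tags.get(key) == value
    ]


# Hypothetical AWS Glue jobs and their tags.
glue_jobs = {
    "nightly-etl": {"environment": "production", "team": "data"},
    "adhoc-export": {"environment": "staging", "team": "data"},
    "untagged-job": {},
}

print(select_monitored(glue_jobs, "environment", "production"))
```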

Configure service metrics

Once you add a service, Dynatrace automatically starts collecting a suite of metrics for that service. These are the recommended metrics. In addition to the recommended metrics, most services let you enable optional metrics. You can remove or edit any of the existing metrics or, where multiple dimensions are available, any of their dimensions. Metrics with only one dimension can't be edited; they can only be removed or added.

Service-wide metrics are metrics for the whole service across all regions. Typically, these metrics include dimensions containing Region in their name. If selected, these metrics are displayed on a separate chart when viewing your AWS deployment in Dynatrace. Keep in mind that available dimensions differ among services.

To change a metric's statistics, you have to recreate that metric and choose different statistics. You can choose among the following statistics: Sum, Minimum, Maximum, Average, and Sample count. The combined Average + Minimum + Maximum statistic collects all three statistics as one metric instead of as three separate metrics with one statistic each. This can reduce your expenses for retrieving metrics from your AWS deployment.

To be able to save a newly added metric, you need to select at least one statistic and one dimension.
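The save rule above, together with the list of available statistics, can be expressed as a small validation check. This is a hypothetical sketch: `MetricConfig` and `validation_errors` are illustrative names, not part of any Dynatrace API, and the set of allowed statistics is copied from the text.

```python
from dataclasses import dataclass, field

# Statistics named in the text, including the combined option.
ALLOWED_STATISTICS = {
    "Sum",
    "Minimum",
    "Maximum",
    "Average",
    "Sample count",
    "Average + Minimum + Maximum",
}


@dataclass
class MetricConfig:
    """Hypothetical description of a metric about to be saved."""
    name: str
    statistics: list = field(default_factory=list)
    dimensions: list = field(default_factory=list)


def validation_errors(metric):
    """Return a list of reasons why the metric cannot be saved."""
    errors = []
    if not metric.statistics:
        errors.append("select at least one statistic")
    if not metric.dimensions:
        errors.append("select at least one dimension")
    unknown = [s for s in metric.statistics if s not in ALLOWED_STATISTICS]
    if unknown:
        errors.append("unknown statistics: " + ", ".join(unknown))
    return errors


ok = MetricConfig(
    name="glue.driver.jvm.heap.used",
    statistics=["Average"],
    dimensions=["JobName", "JobRunId", "Type"],
)
incomplete = MetricConfig(name="glue.driver.jvm.heap.used")

print(validation_errors(ok))
print(validation_errors(incomplete))
```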

Note: Once AWS supporting services are configured, you might have to wait 15-20 minutes before the metric values are displayed.

View service metrics

You can view the service metrics in your Dynatrace environment either on the custom device overview page or on your Dashboards page.

View metrics on the custom device overview page

To access the custom device overview page

  1. In the Dynatrace menu, go to Technologies.
  2. Filter by service name and select the relevant custom device group.
  3. Select the custom device group to open the custom device group overview page, which lists all instances (custom devices) belonging to the group.
  4. Select an instance to view the custom device overview page.

View metrics on your dashboard

You can also view metrics in the Dynatrace web UI on dashboards. There is no preset dashboard available for this service, but you can create your own dashboard.

To check the availability of preset dashboards for each AWS service, see the list below.

Available metrics

Name | Description | Statistics | Unit | Dimensions | Recommended
glue.driver.aggregate.bytesRead | The number of bytes read from all data sources by all completed Spark tasks running in all executors | Sum | Bytes | JobName, JobRunId, Type |
glue.driver.aggregate.elapsedTime | The ETL elapsed time in milliseconds (doesn't include the job bootstrap times) | Sum | Milliseconds | JobName, JobRunId, Type |
glue.driver.aggregate.numCompletedStages | The number of completed stages in a job | Sum | Count | JobName, JobRunId, Type |
glue.driver.aggregate.numCompletedTasks | The number of completed tasks in a job | Sum | Count | JobName, JobRunId, Type |
glue.driver.aggregate.numFailedTasks | The number of failed tasks in a job | Sum | Count | JobName, JobRunId, Type |
glue.driver.aggregate.numKilledTasks | The number of tasks killed in a job | Sum | Count | JobName, JobRunId, Type |
glue.driver.aggregate.recordsRead | The number of records read from all data sources by all completed Spark tasks running in all executors | Sum | Count | JobName, JobRunId, Type |
glue.driver.aggregate.shuffleBytesWritten | The number of bytes written by all executors to shuffle data between them since the previous report (aggregated by the AWS Glue metrics dashboard as the number of bytes written for this purpose during the previous minute) | Sum | Bytes | JobName, JobRunId, Type |
glue.driver.aggregate.shuffleLocalBytesRead | The number of bytes read by all executors to shuffle data between them since the previous report (aggregated by the AWS Glue metrics dashboard as the number of bytes read for this purpose during the previous minute) | Sum | Bytes | JobName, JobRunId, Type |
glue.driver.BlockManager.disk.diskSpaceUsed_MB | The number of megabytes of disk space used across all executors | Average | Megabytes | JobName, JobRunId, Type |
glue.driver.ExecutorAllocationManager.executors.numberAllExecutors | The number of actively running job executors | Average | Count | JobName, JobRunId, Type |
glue.driver.ExecutorAllocationManager.executors.numberMaxNeededExecutors | The number of maximum (actively running and pending) job executors needed to satisfy the current load | Maximum | Count | JobName, JobRunId, Type |
glue.driver.jvm.heap.usage | The fraction of memory used by the JVM heap (scale: 0-1) for this driver | Average | Percent | JobName, JobRunId, Type | ✔️
glue.ALL.jvm.heap.usage | The fraction of memory used by the JVM heap (scale: 0-1) for all executors | Average | Percent | JobName, JobRunId, Type | ✔️
glue.driver.jvm.heap.used | The number of memory bytes used by the JVM heap for the driver | Average | Bytes | JobName, JobRunId, Type |
glue.ALL.jvm.heap.used | The number of memory bytes used by the JVM heap for all executors | Average | Bytes | JobName, JobRunId, Type |
glue.driver.s3.filesystem.read_bytes | The number of bytes read from Amazon S3 by the driver since the previous report (aggregated by the AWS Glue metrics dashboard as the number of bytes read during the previous minute) | Sum | Bytes | JobName, JobRunId, Type |
glue.ALL.s3.filesystem.read_bytes | The number of bytes read from Amazon S3 by all executors since the previous report (aggregated by the AWS Glue metrics dashboard as the number of bytes read during the previous minute) | Sum | Bytes | JobName, JobRunId, Type | ✔️
glue.driver.s3.filesystem.write_bytes | The number of bytes written to Amazon S3 by the driver since the previous report (aggregated by the AWS Glue metrics dashboard as the number of bytes written during the previous minute) | Sum | Bytes | JobName, JobRunId, Type |
glue.ALL.s3.filesystem.write_bytes | The number of bytes written to Amazon S3 by all executors since the previous report (aggregated by the AWS Glue metrics dashboard as the number of bytes written during the previous minute) | Sum | Bytes | JobName, JobRunId, Type | ✔️
glue.driver.system.cpuSystemLoad | The fraction of CPU system load used (scale: 0-1) by the driver | Average | Percent | JobName, JobRunId, Type |
glue.ALL.system.cpuSystemLoad | The fraction of CPU system load used (scale: 0-1) by all executors | Average | Percent | JobName, JobRunId, Type |
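
Several metrics in this list (the jvm.heap.usage and system.cpuSystemLoad metrics) are described as fractions on a 0-1 scale, although their unit is given as Percent. When working with raw values, a small conversion helper avoids misreading 0.5 as half a percent. The sketch below assumes the raw value really is on the 0-1 scale described above; `fraction_to_percent` is a hypothetical helper name.

```python
def fraction_to_percent(value):
    """Convert a 0-1 fraction (as described for the heap and CPU metrics)
    into a percentage on a 0-100 scale."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"expected a fraction between 0 and 1, got {value}")
    return value * 100.0


heap_usage = 0.5  # example raw value for glue.driver.jvm.heap.usage
print(f"JVM heap usage: {fraction_to_percent(heap_usage):.1f}%")
```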