Amazon Elastic MapReduce (EMR)

Dynatrace ingests metrics for multiple preselected namespaces, including Amazon Elastic MapReduce (EMR). You can view metrics for each service instance, split metrics into multiple dimensions, and create custom charts that you can pin to your dashboards.

Prerequisites

To enable monitoring for this service, you need

  • An Environment ActiveGate or Cluster ActiveGate, version 1.181+
  • Dynatrace version 1.182+
  • An updated AWS monitoring policy that includes the additional AWS services.
    To update the AWS IAM policy, use the JSON below, which contains the monitoring policy (permissions) for all supporting services.

If you don't want to add permissions for all services, but only for certain ones, consult the table below. The table contains a set of permissions that are required for all services (All monitored Amazon services) and, for each supporting service, a list of optional permissions specific to that service.

Example of a JSON policy for a single service.

In this example, from the complete list of permissions, you need to select the following (see the policy sketch after this list):

  • "apigateway:GET" for Amazon API Gateway
  • "cloudwatch:GetMetricData", "cloudwatch:GetMetricStatistics", "cloudwatch:ListMetrics", "sts:GetCallerIdentity", "tag:GetResources", "tag:GetTagKeys", and "ec2:DescribeAvailabilityZones" for All monitored Amazon services.

Enable monitoring

To enable monitoring for this service, you first need to integrate Dynatrace with Amazon Web Services.

Add the service to monitoring

To view the service metrics, you must add the service to monitoring in your Dynatrace environment.

Note: Once AWS supporting services are added to monitoring, you might have to wait 15-20 minutes before the metric values are displayed.

Cloud-service monitoring consumption

Beginning in early 2021, all cloud services will consume Davis Data Units (DDUs). The amount of DDU consumption per service instance depends on the number of monitored metrics and their dimensions (each metric dimension results in the ingestion of 1 data point; 1 data point consumes 0.001 DDUs). For DDU consumption estimates per service instance (recommended metrics only, predefined dimensions, and assumed dimension values), see DDU consumption estimates per cloud service instance.
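
As a rough worked example (the metric counts here are hypothetical, not specific to EMR): a service instance reporting 10 metrics with 3 dimension values each ingests 10 × 3 = 30 data points per collection interval, which consumes 30 × 0.001 = 0.03 DDUs per interval.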

Monitor resources based on tags

You can choose to monitor resources based on existing AWS tags, which Dynatrace automatically imports from service instances. However, the transition from AWS to Dynatrace tagging isn't supported for all AWS services. Expand the table below to see which supporting services can be filtered by tagging.

To monitor resources based on tags

  1. Go to Settings > Cloud and virtualization > AWS and select the AWS instance.
  2. For Resource monitoring method, select Monitor resources based on tags.
  3. Enter the Key and Value.
  4. Select Save.
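
For example (the tag itself is hypothetical), entering stage as the Key and production as the Value would restrict monitoring to resources that carry the AWS tag stage:production.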


Configure service metrics

Once you add a service, Dynatrace automatically starts collecting a suite of recommended metrics for that particular service. In addition to the recommended metrics, most services also offer optional metrics that you can enable. You can remove or edit any of the existing metrics or any of their dimensions, where multiple dimensions are available. Metrics consisting of only one dimension can't be edited; they can only be added or removed.

Service-wide metrics are metrics for the whole service across all regions. Typically, these metrics include dimensions containing Region in their name. If selected, these metrics are displayed on a separate chart when viewing your AWS deployment in Dynatrace. Keep in mind that available dimensions differ among services.

To change a metric's statistics, you have to recreate that metric and choose different statistics. You can choose among the following statistics: Sum, Minimum, Maximum, Average, and Sample count. The Average + Minimum + Maximum option enables you to collect all three statistics as one metric instead of three separate metrics with one statistic each. This can reduce your expenses for retrieving metrics from your AWS deployment.
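
For example, instead of defining HDFSUtilization three times (once each for Average, Minimum, and Maximum), you can define it once with Average + Minimum + Maximum and retrieve all three values as a single metric.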

To be able to save a newly added metric, you need to select at least one statistic and one dimension.

Note: Once AWS supporting services are configured, you might have to wait 15-20 minutes before the metric values are displayed.

View service metrics

You can view the service metrics in your Dynatrace environment either on the custom device overview page or on your Dashboards page.

View metrics on the custom device overview page

To access the custom device overview page

  1. Go to Technologies on the Dynatrace navigation menu.
  2. Filter by service name and select the relevant custom device group.
  3. Once you select the custom device group, you're taken to the custom device group overview page.
  4. The custom device group overview page lists all instances (custom devices) belonging to the group. Select an instance to view the custom device overview page.

View metrics on your dashboard

Once you add a service to monitoring, a preset dashboard for the respective service containing all recommended metrics is automatically created on your Dashboards page. You can look for specific dashboards by filtering by Preset and then by Name.
Note: For existing monitored services, you might need to resave your credentials for the preset dashboard to appear on the Dashboards page. To resave your credentials, go to Settings > Cloud and virtualization > AWS, select the desired AWS instance, then select Save.

You can't make changes on a preset dashboard directly, but you can clone and edit it. To clone a dashboard, open the browse menu (...) and select Clone.
To remove a dashboard from the dashboards list, you can hide it. To hide a dashboard, open the browse menu (...) and select Hide.
Note: Hiding a dashboard doesn't affect other users.

Available metrics

| Name | Description | Unit | Statistics | Dimensions | Recommended |
|------|-------------|------|------------|------------|-------------|
| AppsCompleted | The number of applications submitted to YARN that have completed | Count | Sum | JobFlowId; JobFlowId, JobId | |
| AppsFailed | The number of applications submitted to YARN that have failed to complete | Count | Sum | JobFlowId; JobFlowId, JobId | |
| AppsKilled | The number of applications submitted to YARN that have been killed | Count | Sum | JobFlowId; JobFlowId, JobId | |
| AppsPending | The number of applications submitted to YARN that are in a Pending state | Count | Sum | JobFlowId; JobFlowId, JobId | |
| AppsRunning | The number of applications submitted to YARN that are running | Count | Sum | JobFlowId | ✔️ |
| AppsRunning | | Count | Sum | JobFlowId, JobId | ✔️ |
| AppsSubmitted | The number of applications submitted to YARN | Count | Sum | JobFlowId; JobFlowId, JobId | |
| BackupFailed | Shows if the last backup failed. Set to 0 by default and updated to 1 if the previous backup attempt failed. This metric is only reported for HBase clusters. | Count | Sum | JobFlowId; JobFlowId, JobId | |
| CapacityRemainingGB | The amount of remaining HDFS disk capacity | Count | Sum | JobFlowId; JobFlowId, JobId | |
| ContainerAllocated | The number of resource containers allocated by the resource manager | Count | Sum | JobFlowId; JobFlowId, JobId | |
| ContainerPending | The number of containers in the queue that have not yet been allocated | Count | Sum | JobFlowId; JobFlowId, JobId | |
| ContainerPendingRatio | The ratio (in numbers) of pending containers to containers allocated (ContainerPendingRatio = ContainerPending / ContainerAllocated). If ContainerAllocated = 0, then ContainerPendingRatio = ContainerPending. | Count | Sum | JobFlowId; JobFlowId, JobId | |
| ContainerReserved | The number of containers reserved | Count | Sum | JobFlowId; JobFlowId, JobId | |
| CoreNodesPending | The number of core nodes waiting to be assigned (pending requests) | Count | Sum | JobFlowId; JobFlowId, JobId | |
| CoreNodesRunning | The number of working core nodes | Count | Sum | JobFlowId; JobFlowId, JobId | |
| CorruptBlocks | The number of blocks that HDFS reports as corrupted | Count | Sum | JobFlowId; JobFlowId, JobId | |
| DfsPendingReplicationBlocks | The status of block replication: blocks being replicated, age of replication requests, and unsuccessful replication requests | Count | Sum | JobFlowId; JobFlowId, JobId | |
| HDFSBytesRead | The number of bytes read from HDFS | Count | Sum | JobFlowId; JobFlowId, JobId | |
| HDFSBytesWritten | The number of bytes written to HDFS | Count | Sum | JobFlowId; JobFlowId, JobId | |
| HDFSUtilization | The percentage of HDFS storage currently used | Percent | Average | JobFlowId | ✔️ |
| HDFSUtilization | | Percent | Average | JobFlowId, JobId | ✔️ |
| HbaseBackupFailed | Shows if the last backup failed. Set to 0 by default and updated to 1 if the previous backup attempt failed. This metric is only reported for HBase clusters. | Count | Minimum | JobFlowId; JobFlowId, JobId | |
| IsIdle | Indicates that a cluster is no longer performing work, but is still alive and accruing charges. Set to 1 if no tasks are running and no jobs are running, and to 0 otherwise. This value is checked at five-minute intervals and a value of 1 indicates only that the cluster was idle when checked, not that it was idle for the entire five minutes. | Count | Minimum | JobFlowId; JobFlowId, JobId | ✔️ |
| IsIdle | | Count | Minimum | JobFlowId, JobId | ✔️ |
| JobsFailed | The number of jobs in the cluster that have failed | Count | Sum | JobFlowId; JobFlowId, JobId | |
| JobsRunning | The number of jobs in the cluster that are currently running | Count | Sum | JobFlowId; JobFlowId, JobId | |
| LiveDataNodes | The percentage of data nodes that are receiving work from Hadoop | Count | Sum | JobFlowId; JobFlowId, JobId | |
| LiveTaskTrackers | The percentage of task trackers that are functional | Percent | Average | JobFlowId; JobFlowId, JobId | |
| MRActiveNodes | The number of nodes presently running MapReduce tasks or jobs. Equivalent to YARN metric mapred.resourcemanager.NoOfActiveNodes | Count | Sum | JobFlowId; JobFlowId, JobId | |
| MRDecommissionedNodes | The number of nodes allocated to MapReduce applications that have been marked in a Decommissioned state | Count | Sum | JobFlowId; JobFlowId, JobId | |
| MRLostNodes | The number of nodes allocated to MapReduce that have been marked in a Lost state | Count | Sum | JobFlowId; JobFlowId, JobId | |
| MRRebootedNodes | The number of nodes available to MapReduce that have been rebooted and marked in a Rebooted state | Count | Sum | JobFlowId; JobFlowId, JobId | |
| MRTotalNodes | The number of nodes presently available to MapReduce jobs | Count | Sum | JobFlowId; JobFlowId, JobId | |
| MRUnhealthyNodes | The number of nodes available to MapReduce jobs marked in an Unhealthy state | Count | Sum | JobFlowId; JobFlowId, JobId | |
| MapSlotsOpen | The unused map task capacity. This is calculated as the maximum number of map tasks for a given cluster, less the total number of map tasks currently running in that cluster. | Count | Sum | JobFlowId; JobFlowId, JobId | |
| MapTasksRemaining | The number of remaining map tasks for each job | Count | Sum | JobFlowId; JobFlowId, JobId | |
| MapTasksRunning | The number of running map tasks for each job | Count | Sum | JobFlowId | ✔️ |
| MapTasksRunning | | Count | Sum | JobFlowId, JobId | ✔️ |
| MemoryAllocatedMB | The amount of memory allocated to the cluster | Count | Sum | JobFlowId; JobFlowId, JobId | |
| MemoryAvailableMB | The amount of memory available for allocation | Count | Sum | JobFlowId; JobFlowId, JobId | |
| MemoryReservedMB | The amount of memory reserved for allocation | Count | Sum | JobFlowId; JobFlowId, JobId | |
| MemoryTotalMB | The total amount of memory in the cluster | Count | Sum | JobFlowId; JobFlowId, JobId | |
| MissingBlocks | The number of blocks in which HDFS has no replicas. These might be corrupt blocks. | Count | Sum | JobFlowId; JobFlowId, JobId | |
| MostRecentBackupDuration | The amount of time it took the previous backup to complete. This metric is set regardless of whether the last completed backup succeeded or failed. While the backup is ongoing, this metric returns the number of minutes after the backup started. This metric is only reported for HBase clusters. | Count | Sum | JobFlowId; JobFlowId, JobId | |
| PendingDeletionBlocks | The number of blocks marked for deletion | Count | Sum | JobFlowId; JobFlowId, JobId | |
| ReduceSlotsOpen | Unused reduce task capacity. This is calculated as the maximum reduce task capacity for a given cluster, less the number of reduce tasks currently running in that cluster. | Count | Sum | JobFlowId; JobFlowId, JobId | |
| ReduceTasksRemaining | The number of remaining reduce tasks for each job. If you have a scheduler installed and multiple jobs running, multiple graphs are generated. | Count | Sum | JobFlowId; JobFlowId, JobId | |
| ReduceTasksRunning | The number of running reduce tasks for each job. If you have a scheduler installed and multiple jobs running, multiple graphs are generated. | Count | Sum | JobFlowId; JobFlowId, JobId | |
| RemainingMapTasksPerSlot | The ratio of the total map tasks remaining to the total map slots available in the cluster | Percent | Average | JobFlowId; JobFlowId, JobId | |
| S3BytesRead | The number of bytes read from Amazon S3. This metric aggregates MapReduce jobs only, and does not apply for other workloads on EMR. | Count | Sum | JobFlowId; JobFlowId, JobId | |
| S3BytesWritten | The number of bytes written to Amazon S3. This metric aggregates MapReduce jobs only, and does not apply for other workloads on EMR. | Count | Sum | JobFlowId; JobFlowId, JobId | |
| TaskNodesPending | The number of task nodes waiting to be assigned (pending requests) | Count | Sum | JobFlowId; JobFlowId, JobId | |
| TaskNodesRunning | The number of working task nodes | Count | Sum | JobFlowId; JobFlowId, JobId | |
| TimeSinceLastSuccessfulBackup | The number of elapsed minutes after the last successful HBase backup started on your cluster. This metric is only reported for HBase clusters. | Count | Sum | JobFlowId; JobFlowId, JobId | |
| TotalLoad | The total number of concurrent data transfers | Count | Sum | JobFlowId; JobFlowId, JobId | |
| UnderReplicatedBlocks | The number of blocks that need to be replicated one or more times | Count | Sum | JobFlowId; JobFlowId, JobId | |
| YARNMemoryAvailablePercentage | The percentage of remaining memory available to YARN (YARNMemoryAvailablePercentage = MemoryAvailableMB / MemoryTotalMB) | Percent | Average | JobFlowId; JobFlowId, JobId | |