Amazon MSK (Kafka) monitoring

Dynatrace ingests metrics for multiple preselected namespaces, including Amazon MSK (Kafka). You can view metrics for each service instance, split metrics into multiple dimensions, and create custom charts that you can pin to your dashboards.

Prerequisites

To enable monitoring for this service, you need

  • An Environment or Cluster ActiveGate version 1.197+
    Note: For role-based access (whether in a SaaS or Managed deployment), you need an Environment ActiveGate installed on an AWS EC2 host.
  • Dynatrace version 1.203+
  • An updated AWS monitoring policy to include the additional AWS services.

To update the AWS IAM policy, use the JSON below, containing the monitoring policy (permissions) for all supporting services.

If you don't want to add permissions for all services and prefer to select permissions for certain services only, consult the table below. The table contains a set of permissions that are required for all services (All monitored Amazon services) and, for each supporting service, a list of optional permissions specific to that service.

Example of a JSON policy for a single service.

In this example, from the complete list of permissions you need to select

  • "apigateway:GET" for Amazon API Gateway
  • "cloudwatch:GetMetricData", "cloudwatch:GetMetricStatistics", "cloudwatch:ListMetrics", "sts:GetCallerIdentity", "tag:GetResources", "tag:GetTagKeys", and "ec2:DescribeAvailabilityZones" for All monitored Amazon services.
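
As a minimal sketch of how such a single-service policy could be assembled, the Python snippet below combines the permissions listed above into one IAM policy document and prints it as JSON. The helper name `build_policy` and the `"Resource": "*"` scope are assumptions made for illustration, not part of the original policy. The service-specific permission for Amazon MSK is not listed in this section, so take it from the permissions table rather than from this sketch.

```python
import json

# Permissions the text lists for "All monitored Amazon services".
COMMON_ACTIONS = [
    "cloudwatch:GetMetricData",
    "cloudwatch:GetMetricStatistics",
    "cloudwatch:ListMetrics",
    "sts:GetCallerIdentity",
    "tag:GetResources",
    "tag:GetTagKeys",
    "ec2:DescribeAvailabilityZones",
]


def build_policy(service_actions):
    """Return an IAM policy document (hypothetical helper).

    The document allows the service-specific actions followed by the
    common actions required for all monitored services.
    """
    return {
        "Version": "2012-10-17",  # current IAM policy language version
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(service_actions) + COMMON_ACTIONS,
                "Resource": "*",  # assumption: no resource-level scoping
            }
        ],
    }


# The example from the text: Amazon API Gateway plus the common permissions.
policy = build_policy(["apigateway:GET"])
print(json.dumps(policy, indent=2))
```

To adapt the sketch to another service, pass that service's permissions from the table to `build_policy` instead of `"apigateway:GET"`.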

Enable monitoring

To enable monitoring for this service, you first need to integrate Dynatrace with Amazon Web Services.

Add the service to monitoring

To view the service metrics, you must add the service to monitoring in your Dynatrace environment.

Cloud-service monitoring consumption

As of 2021, all cloud services consume Davis data units (DDUs). The amount of DDU consumption per service instance depends on the number of monitored metrics and their dimensions (each metric dimension results in the ingestion of 1 data point; 1 data point consumes 0.001 DDUs).
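
The arithmetic above can be sketched as a small estimate. This is only an illustration: the function name `estimate_ddus` is hypothetical, and the number of ingestions per day is an assumed input, because the text does not state how often data points are ingested.

```python
from decimal import Decimal

# Rate stated in the text: 1 data point consumes 0.001 DDUs.
DDU_PER_DATA_POINT = Decimal("0.001")


def estimate_ddus(metric_dimensions, ingestions):
    """Estimate DDU consumption for one service instance.

    metric_dimensions: number of monitored metric dimensions
                       (each one yields 1 data point per ingestion).
    ingestions:        how many times data points are ingested in the
                       period of interest (an assumption of this sketch).
    """
    data_points = metric_dimensions * ingestions
    return data_points * DDU_PER_DATA_POINT


# Hypothetical example: 50 metric dimensions, 1,440 ingestions
# (one per minute for a day, assumed for illustration only).
daily_ddus = estimate_ddus(metric_dimensions=50, ingestions=1440)
print(daily_ddus)
```

Decimal arithmetic is used so that the small per-data-point rate does not accumulate floating-point rounding errors.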

Monitor resources based on tags

You can choose to monitor resources based on existing AWS tags, as Dynatrace automatically imports them from service instances. However, the transition from AWS to Dynatrace tagging isn't supported for all AWS services. See the table below for the supporting services that can be filtered by tags.

To monitor resources based on tags

  1. In the Dynatrace menu, go to Settings > Cloud and virtualization > AWS and select Edit for the desired AWS instance.
  2. For Resources to be monitored, select Monitor resources selected by tags.
  3. Enter the Key and Value.
  4. Select Save.
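
Conceptually, the Key and Value entered above act as a filter over tagged resources. The sketch below illustrates that idea only. It assumes exact matching on a single key/value pair and uses made-up cluster names and a hypothetical `select_by_tag` helper. It does not describe how Dynatrace implements the filter.

```python
def select_by_tag(resources, key, value):
    """Return the resources whose tags contain the exact key/value pair.

    Assumption of this sketch: matching is exact and case-sensitive.
    """
    return [r for r in resources if r.get("tags", {}).get(key) == value]


# Hypothetical MSK clusters with AWS tags.
clusters = [
    {"name": "orders-msk", "tags": {"env": "prod"}},
    {"name": "test-msk", "tags": {"env": "dev"}},
    {"name": "untagged-msk", "tags": {}},
]

selected = select_by_tag(clusters, "env", "prod")
print([c["name"] for c in selected])
```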

Configure service metrics

Once you add a service, Dynatrace starts automatically collecting a suite of metrics for this particular service. These are recommended metrics.

Recommended metrics:

  • Are enabled by default
  • Can't be disabled
  • Can have recommended dimensions (enabled by default, can't be disabled)
  • Can have optional dimensions (disabled by default, can be enabled)

In addition to the recommended metrics, most services let you enable optional metrics.

Optional metrics:

  • Can be added and configured manually

View service metrics

You can view the service metrics in your Dynatrace environment either on the custom device overview page or on your Dashboards page.

View metrics on the custom device overview page

To access the custom device overview page

  1. In the Dynatrace menu, go to Technologies and processes.
  2. Filter by service name and select the relevant custom device group.
  3. Once you select the custom device group, you're on the custom device group overview page.
  4. The custom device group overview page lists all instances (custom devices) belonging to the group. Select an instance to view the custom device overview page.

View metrics on your dashboard

After you add the service to monitoring, a preset dashboard containing all recommended metrics is automatically listed on your Dashboards page. To look for specific dashboards, filter by Preset and then by Name.
Note: For existing monitored services, you might need to resave your credentials for the preset dashboard to appear on the Dashboards page. To resave your credentials, go to Settings > Cloud and virtualization > AWS, select the desired AWS instance, and then select Save.

You can't make changes on a preset dashboard directly, but you can clone and edit it. To clone a dashboard, open the browse menu and select Clone.
To remove a dashboard from the Dashboards page, you can hide it. To hide a dashboard, open the browse menu and select Hide.
Note: Hiding a dashboard doesn't affect other users.

To check the availability of preset dashboards for each AWS service, see the list below.

Available metrics

| Name | Description | Unit | Statistics | Dimensions | Recommended |
| ActiveControllerCount | Only one controller per cluster should be active at any given time. | Count | Multi | Cluster Name | ✔️ |
| ActiveControllerCount | | Count | Sum | Cluster Name | ✔️ |
| BytesInPerSec | The number of bytes per second received from clients | Bytes/Second | Multi | Cluster Name, Broker ID | |
| BytesInPerSec | | Bytes/Second | Multi | Cluster Name, Broker ID, Topic | |
| BytesInPerSec | | Bytes/Second | Sum | Cluster Name, Broker ID | |
| BytesInPerSec | | Bytes/Second | Sum | Cluster Name, Broker ID, Topic | |
| BytesOutPerSec | The number of bytes per second sent to clients | Bytes/Second | Multi | Cluster Name, Broker ID | |
| BytesOutPerSec | | Bytes/Second | Multi | Cluster Name, Broker ID, Topic | |
| BytesOutPerSec | | Bytes/Second | Sum | Cluster Name, Broker ID | |
| BytesOutPerSec | | Bytes/Second | Sum | Cluster Name, Broker ID, Topic | |
| CPUCreditBalance | The number of earned credits | Count | Multi | Cluster Name, Broker ID | |
| CPUCreditBalance | | Count | Sum | Cluster Name, Broker ID | |
| CPUCreditUsage | The number of used credits | Count | Multi | Cluster Name, Broker ID | |
| CPUCreditUsage | | Count | Sum | Cluster Name, Broker ID | |
| CpuIdle | The percentage of CPU idle time | Percent | Multi | Cluster Name, Broker ID | ✔️ |
| CpuIdle | | Percent | Sum | Cluster Name, Broker ID | ✔️ |
| CpuSystem | The percentage of CPU in kernel space | Percent | Multi | Cluster Name, Broker ID | ✔️ |
| CpuSystem | | Percent | Sum | Cluster Name, Broker ID | ✔️ |
| CpuUser | The percentage of CPU in user space | Percent | Multi | Cluster Name, Broker ID | ✔️ |
| CpuUser | | Percent | Sum | Cluster Name, Broker ID | ✔️ |
| FetchConsumerLocalTimeMsMean | The mean time in milliseconds that the consumer request is processed at the leader | Milliseconds | Multi | Cluster Name, Broker ID | |
| FetchConsumerLocalTimeMsMean | | Milliseconds | Sum | Cluster Name, Broker ID | |
| FetchConsumerRequestQueueTimeMsMean | The mean time in milliseconds that the consumer request waits in the request queue | Milliseconds | Multi | Cluster Name, Broker ID | |
| FetchConsumerRequestQueueTimeMsMean | | Milliseconds | Sum | Cluster Name, Broker ID | |
| FetchConsumerResponseQueueTimeMsMean | The mean time in milliseconds that the consumer request waits in the response queue | Milliseconds | Multi | Cluster Name, Broker ID | |
| FetchConsumerResponseQueueTimeMsMean | | Milliseconds | Sum | Cluster Name, Broker ID | |
| FetchConsumerResponseSendTimeMsMean | The mean time in milliseconds for the consumer to send a response | Milliseconds | Multi | Cluster Name, Broker ID | |
| FetchConsumerResponseSendTimeMsMean | | Milliseconds | Sum | Cluster Name, Broker ID | |
| FetchConsumerTotalTimeMsMean | The mean total time in milliseconds that consumers spend on fetching data from the broker | Milliseconds | Multi | Cluster Name, Broker ID | |
| FetchConsumerTotalTimeMsMean | | Milliseconds | Sum | Cluster Name, Broker ID | |
| FetchFollowerLocalTimeMsMean | The mean time in milliseconds that the follower request is processed at the leader | Milliseconds | Multi | Cluster Name, Broker ID | |
| FetchFollowerLocalTimeMsMean | | Milliseconds | Sum | Cluster Name, Broker ID | |
| FetchFollowerRequestQueueTimeMsMean | The mean time in milliseconds that the follower request waits in the request queue | Milliseconds | Multi | Cluster Name, Broker ID | |
| FetchFollowerRequestQueueTimeMsMean | | Milliseconds | Sum | Cluster Name, Broker ID | |
| FetchFollowerResponseQueueTimeMsMean | The mean time in milliseconds that the follower request waits in the response queue | Milliseconds | Multi | Cluster Name, Broker ID | |
| FetchFollowerResponseQueueTimeMsMean | | Milliseconds | Sum | Cluster Name, Broker ID | |
| FetchFollowerResponseSendTimeMsMean | The mean time in milliseconds for the follower to send a response | Milliseconds | Multi | Cluster Name, Broker ID | |
| FetchFollowerResponseSendTimeMsMean | | Milliseconds | Sum | Cluster Name, Broker ID | |
| FetchFollowerTotalTimeMsMean | The mean total time in milliseconds that followers spend on fetching data from the broker | Milliseconds | Multi | Cluster Name, Broker ID | |
| FetchFollowerTotalTimeMsMean | | Milliseconds | Sum | Cluster Name, Broker ID | |
| FetchMessageConversionsPerSec | The number of fetch message conversions per second for the broker | Count/Second | Multi | Cluster Name, Broker ID | |
| FetchMessageConversionsPerSec | | Count/Second | Multi | Cluster Name, Broker ID, Topic | |
| FetchMessageConversionsPerSec | | Count/Second | Sum | Cluster Name, Broker ID | |
| FetchMessageConversionsPerSec | | Count/Second | Sum | Cluster Name, Broker ID, Topic | |
| FetchMessageConversionsTimeMsMean | The mean total time in milliseconds that messages being fetched spend converting | Milliseconds | Multi | Cluster Name, Broker ID | |
| FetchMessageConversionsTimeMsMean | | Milliseconds | Sum | Cluster Name, Broker ID | |
| FetchThrottleByteRate | The number of throttled bytes per second | Bytes/Second | Multi | Cluster Name, Broker ID | |
| FetchThrottleByteRate | | Bytes/Second | Sum | Cluster Name, Broker ID | |
| FetchThrottleQueueSize | The number of messages in the throttle queue | Count | Multi | Cluster Name, Broker ID | |
| FetchThrottleQueueSize | | Count | Sum | Cluster Name, Broker ID | |
| FetchThrottleTime | The average fetch throttle time in milliseconds | Milliseconds | Multi | Cluster Name, Broker ID | |
| FetchThrottleTime | | Milliseconds | Sum | Cluster Name, Broker ID | |
| GlobalPartitionCount | Total number of partitions across all brokers in the cluster | Count | Multi | Cluster Name | ✔️ |
| GlobalPartitionCount | | Count | Sum | Cluster Name | ✔️ |
| GlobalTopicCount | Total number of topics across all brokers in the cluster | Count | Multi | Cluster Name | ✔️ |
| GlobalTopicCount | | Count | Sum | Cluster Name | ✔️ |
| KafkaAppLogsDiskUsed | The percentage of disk space used for application logs | Percent | Multi | Cluster Name, Broker ID | ✔️ |
| KafkaAppLogsDiskUsed | | Percent | Sum | Cluster Name, Broker ID | ✔️ |
| KafkaDataLogsDiskUsed | The percentage of disk space used for data logs | Percent | Multi | Cluster Name, Broker ID | ✔️ |
| KafkaDataLogsDiskUsed | | Percent | Sum | Cluster Name, Broker ID | ✔️ |
| LeaderCount | The number of leader replicas | Count | Multi | Cluster Name, Broker ID | |
| LeaderCount | | Count | Sum | Cluster Name, Broker ID | |
| MemoryBuffered | The size in bytes of buffered memory for the broker | Bytes | Multi | Cluster Name, Broker ID | ✔️ |
| MemoryBuffered | | Bytes | Sum | Cluster Name, Broker ID | ✔️ |
| MemoryCached | The size in bytes of cached memory for the broker | Bytes | Multi | Cluster Name, Broker ID | ✔️ |
| MemoryCached | | Bytes | Sum | Cluster Name, Broker ID | ✔️ |
| MemoryFree | The size in bytes of memory that is free and available for the broker | Bytes | Multi | Cluster Name, Broker ID | ✔️ |
| MemoryFree | | Bytes | Sum | Cluster Name, Broker ID | ✔️ |
| MemoryUsed | The size in bytes of memory that is in use for the broker | Bytes | Multi | Cluster Name, Broker ID | ✔️ |
| MemoryUsed | | Bytes | Sum | Cluster Name, Broker ID | ✔️ |
| MessagesInPerSec | The number of incoming messages per second for the broker | Count/Second | Multi | Cluster Name, Broker ID | |
| MessagesInPerSec | | Count/Second | Multi | Cluster Name, Broker ID, Topic | |
| MessagesInPerSec | | Count/Second | Sum | Cluster Name, Broker ID | |
| MessagesInPerSec | | Count/Second | Sum | Cluster Name, Broker ID, Topic | |
| NetworkProcessorAvgIdlePercent | The average percentage of the time the network processors are idle | Percent | Multi | Cluster Name, Broker ID | |
| NetworkProcessorAvgIdlePercent | | Percent | Sum | Cluster Name, Broker ID | |
| NetworkRxDropped | The number of dropped receive packets | Count | Multi | Cluster Name, Broker ID | ✔️ |
| NetworkRxDropped | | Count | Sum | Cluster Name, Broker ID | ✔️ |
| NetworkRxErrors | The number of network receive errors for the broker | Count | Multi | Cluster Name, Broker ID | ✔️ |
| NetworkRxErrors | | Count | Sum | Cluster Name, Broker ID | ✔️ |
| NetworkRxPackets | The number of packets received by the broker | Count | Multi | Cluster Name, Broker ID | ✔️ |
| NetworkRxPackets | | Count | Sum | Cluster Name, Broker ID | ✔️ |
| NetworkTxDropped | The number of dropped transmit packets | Count | Multi | Cluster Name, Broker ID | ✔️ |
| NetworkTxDropped | | Count | Sum | Cluster Name, Broker ID | ✔️ |
| NetworkTxErrors | The number of network transmit errors for the broker | Count | Multi | Cluster Name, Broker ID | ✔️ |
| NetworkTxErrors | | Count | Sum | Cluster Name, Broker ID | ✔️ |
| NetworkTxPackets | The number of packets transmitted by the broker | Count | Multi | Cluster Name, Broker ID | ✔️ |
| NetworkTxPackets | | Count | Sum | Cluster Name, Broker ID | ✔️ |
| OfflinePartitionsCount | Total number of partitions that are offline in the cluster | Count | Multi | Cluster Name | ✔️ |
| OfflinePartitionsCount | | Count | Sum | Cluster Name | ✔️ |
| PartitionCount | The number of partitions for the broker | Count | Multi | Cluster Name, Broker ID | |
| PartitionCount | | Count | Sum | Cluster Name, Broker ID | |
| ProduceLocalTimeMsMean | The mean time in milliseconds that the request is processed at the leader | Milliseconds | Multi | Cluster Name, Broker ID | |
| ProduceLocalTimeMsMean | | Milliseconds | Sum | Cluster Name, Broker ID | |
| ProduceMessageConversionsPerSec | The number of produce message conversions per second for the broker | Count/Second | Multi | Cluster Name, Broker ID | |
| ProduceMessageConversionsPerSec | | Count/Second | Multi | Cluster Name, Broker ID, Topic | |
| ProduceMessageConversionsPerSec | | Count/Second | Sum | Cluster Name, Broker ID | |
| ProduceMessageConversionsPerSec | | Count/Second | Sum | Cluster Name, Broker ID, Topic | |
| ProduceMessageConversionsTimeMsMean | The mean time in milliseconds spent on message format conversions | Milliseconds | Multi | Cluster Name, Broker ID | |
| ProduceMessageConversionsTimeMsMean | | Milliseconds | Sum | Cluster Name, Broker ID | |
| ProduceRequestQueueTimeMsMean | The mean time in milliseconds that request messages spend in the queue | Milliseconds | Multi | Cluster Name, Broker ID | |
| ProduceRequestQueueTimeMsMean | | Milliseconds | Sum | Cluster Name, Broker ID | |
| ProduceResponseQueueTimeMsMean | The mean time in milliseconds that response messages spend in the queue | Milliseconds | Multi | Cluster Name, Broker ID | |
| ProduceResponseQueueTimeMsMean | | Milliseconds | Sum | Cluster Name, Broker ID | |
| ProduceResponseSendTimeMsMean | The mean time in milliseconds spent on sending response messages | Milliseconds | Multi | Cluster Name, Broker ID | |
| ProduceResponseSendTimeMsMean | | Milliseconds | Sum | Cluster Name, Broker ID | |
| ProduceThrottleByteRate | The number of throttled bytes per second | Bytes/Second | Multi | Cluster Name, Broker ID | |
| ProduceThrottleByteRate | | Bytes/Second | Sum | Cluster Name, Broker ID | |
| ProduceThrottleQueueSize | The number of messages in the throttle queue | Count | Multi | Cluster Name, Broker ID | |
| ProduceThrottleQueueSize | | Count | Sum | Cluster Name, Broker ID | |
| ProduceThrottleTime | The average produce throttle time in milliseconds | Milliseconds | Multi | Cluster Name, Broker ID | |
| ProduceThrottleTime | | Milliseconds | Sum | Cluster Name, Broker ID | |
| ProduceTotalTimeMsMean | The mean produce time in milliseconds | Milliseconds | Multi | Cluster Name, Broker ID | |
| ProduceTotalTimeMsMean | | Milliseconds | Sum | Cluster Name, Broker ID | |
| RequestBytesMean | The mean number of request bytes for the broker | Bytes | Multi | Cluster Name, Broker ID | |
| RequestBytesMean | | Bytes | Sum | Cluster Name, Broker ID | |
| RequestExemptFromThrottleTime | The average time in milliseconds spent in broker network and I/O threads to process requests that are exempt from throttling | Milliseconds | Multi | Cluster Name, Broker ID | |
| RequestExemptFromThrottleTime | | Milliseconds | Sum | Cluster Name, Broker ID | |
| RequestHandlerAvgIdlePercent | The average percentage of the time the request handler threads are idle | Percent | Multi | Cluster Name, Broker ID | |
| RequestHandlerAvgIdlePercent | | Percent | Sum | Cluster Name, Broker ID | |
| RequestThrottleQueueSize | The number of messages in the throttle queue | Count | Multi | Cluster Name, Broker ID | |
| RequestThrottleQueueSize | | Count | Sum | Cluster Name, Broker ID | |
| RequestThrottleTime | The average request throttle time in milliseconds | Milliseconds | Multi | Cluster Name, Broker ID | |
| RequestThrottleTime | | Milliseconds | Sum | Cluster Name, Broker ID | |
| RequestTime | The average time in milliseconds spent in broker network and I/O threads to process requests | Milliseconds | Multi | Cluster Name, Broker ID | |
| RequestTime | | Milliseconds | Sum | Cluster Name, Broker ID | |
| RootDiskUsed | The percentage of the root disk used by the broker | Percent | Multi | Cluster Name, Broker ID | ✔️ |
| RootDiskUsed | | Percent | Sum | Cluster Name, Broker ID | ✔️ |
| SwapFree | The size in bytes of swap memory that is available for the broker | Bytes | Multi | Cluster Name, Broker ID | ✔️ |
| SwapFree | | Bytes | Sum | Cluster Name, Broker ID | ✔️ |
| SwapUsed | The size in bytes of swap memory that is in use for the broker | Bytes | Multi | Cluster Name, Broker ID | ✔️ |
| SwapUsed | | Bytes | Sum | Cluster Name, Broker ID | ✔️ |
| UnderMinIsrPartitionCount | The number of under minIsr partitions for the broker | Count | Multi | Cluster Name, Broker ID | |
| UnderMinIsrPartitionCount | | Count | Sum | Cluster Name, Broker ID | |
| UnderReplicatedPartitions | The number of under-replicated partitions for the broker | Count | Multi | Cluster Name, Broker ID | |
| UnderReplicatedPartitions | | Count | Sum | Cluster Name, Broker ID | |
| ZooKeeperRequestLatencyMsMean | Mean latency in milliseconds for ZooKeeper requests from broker | Milliseconds | Multi | Cluster Name, Broker ID | ✔️ |
| ZooKeeperRequestLatencyMsMean | | Milliseconds | Multi | Cluster Name | ✔️ |
| ZooKeeperRequestLatencyMsMean | | Milliseconds | Sum | Cluster Name, Broker ID | ✔️ |
| ZooKeeperRequestLatencyMsMean | | Milliseconds | Sum | Cluster Name | ✔️ |
| ZooKeeperSessionState | Connection status of broker's ZooKeeper session, which may be one of the following: NOT_CONNECTED: 0.0, ASSOCIATING: 0.1, CONNECTING: 0.5, CONNECTEDREADONLY: 0.8, CONNECTED: 1.0, CLOSED: 5.0, AUTH_FAILED: 10.0. | Count | Multi | Cluster Name, Broker ID | ✔️ |
| ZooKeeperSessionState | | Count | Multi | Cluster Name | ✔️ |
| ZooKeeperSessionState | | Count | Sum | Cluster Name, Broker ID | |
| ZooKeeperSessionState | | Count | Sum | Cluster Name | |
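
Because ZooKeeperSessionState encodes the session status as a number, a small lookup can make raw values easier to read. The following is a minimal sketch that uses the value-to-state mapping from the metric description above. The function name `session_state_name` and the "UNKNOWN" fallback are assumptions of this sketch. The lookup is only meaningful for a single sample, since summed or averaged values across brokers generally don't correspond to a state.

```python
# Mapping taken from the ZooKeeperSessionState metric description.
ZOOKEEPER_SESSION_STATES = {
    0.0: "NOT_CONNECTED",
    0.1: "ASSOCIATING",
    0.5: "CONNECTING",
    0.8: "CONNECTEDREADONLY",
    1.0: "CONNECTED",
    5.0: "CLOSED",
    10.0: "AUTH_FAILED",
}


def session_state_name(value):
    """Translate a raw ZooKeeperSessionState sample into its state name.

    Values that are not in the mapping (for example, aggregated values)
    are reported as "UNKNOWN", which is a fallback chosen for this sketch.
    """
    return ZOOKEEPER_SESSION_STATES.get(round(value, 1), "UNKNOWN")


for sample in (1.0, 0.5, 10.0, 3.0):
    print(sample, session_state_name(sample))
```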