Hadoop monitoring
Hadoop monitoring in Dynatrace provides a high-level overview of the main Hadoop components within your cluster. Enhanced insights are available for HDFS and MapReduce. Hadoop-specific metrics are presented alongside all infrastructure measurements, providing you with in-depth Hadoop performance analysis of both current and historical data.
Prerequisites
- Dynatrace OneAgent version 1.103+
- For full Hadoop visibility, OneAgent must be installed on all machines running the following Hadoop processes: NameNode, ResourceManager, NodeManager, DataNode, and MRAppMaster
- Linux OS
- Hadoop version 2.4.1+
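You can check the prerequisites above against your own host inventory before rolling out OneAgent. The sketch below is a minimal illustration. It assumes a hypothetical inventory format: one dict per host with `os`, `hadoop_processes`, `oneagent_version`, and `hadoop_version` keys. Only the version thresholds and the list of Hadoop processes come from this page.

```python
import re

# Thresholds and process roles taken from the prerequisites list.
MIN_ONEAGENT = (1, 103)
MIN_HADOOP = (2, 4, 1)
MONITORED_ROLES = {"NameNode", "ResourceManager", "NodeManager", "DataNode", "MRAppMaster"}


def parse_version(text):
    """Convert '2.7.3' or '2.7.3-SNAPSHOT' into a comparable tuple such as (2, 7, 3)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def check_host(host):
    """Return prerequisite problems for one host (hypothetical inventory format)."""
    roles = MONITORED_ROLES & set(host["hadoop_processes"])
    if not roles:
        return []  # no monitored Hadoop processes on this host, nothing to check
    problems = []
    if host["os"] != "Linux":
        problems.append("not a Linux host")
    agent = host.get("oneagent_version")
    if agent is None:
        problems.append("no OneAgent, but runs " + ", ".join(sorted(roles)))
    elif parse_version(agent) < MIN_ONEAGENT:
        problems.append("OneAgent older than 1.103")
    if parse_version(host["hadoop_version"]) < MIN_HADOOP:
        problems.append("Hadoop older than 2.4.1")
    return problems


if __name__ == "__main__":
    inventory = [
        {"name": "nn1", "os": "Linux", "hadoop_processes": ["NameNode"],
         "oneagent_version": "1.103.0", "hadoop_version": "2.7.3"},
        {"name": "dn7", "os": "Linux", "hadoop_processes": ["DataNode", "NodeManager"],
         "oneagent_version": None, "hadoop_version": "2.7.3"},
    ]
    for host in inventory:
        print(host["name"], check_host(host) or "OK")
```

Versions are compared as integer tuples rather than strings, so `1.103.0` correctly ranks above `1.99`.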
Enabling Hadoop monitoring globally
- In the Dynatrace menu, go to Settings.
- Select Monitoring > Monitored technologies.
- On the Supported technologies tab, find the Hadoop entry.
- Turn on the Hadoop switch.
With Hadoop monitoring enabled globally, Dynatrace automatically collects Hadoop metrics whenever a new host running Hadoop is detected in your environment.
Analyzing your Hadoop components
- In the Dynatrace menu, go to Technologies.
- Select the Hadoop tile on the Technology overview page.
- Select an individual Hadoop component in the Process group table to view metrics and a timeline chart specific to that component.
Enhanced insights for HDFS
Viewing NameNode metrics
- In the Process group table, select a NameNode process group.
- Select Process group details.
- On the Process group details page, select the Technology-specific metrics tab to view relevant cluster charts and metrics. Hadoop NameNode pages provide details about your HDFS capacity, usage, blocks, cache, files, and DataNode health.
- Further down the page, you’ll find a number of cluster-specific charts.
Viewing DataNode metrics
- In the Process group table, select a DataNode process group.
- Select Process group details.
- On the Process group details page, select the Technology-specific metrics tab and select a DataNode process.
- Select the Hadoop HDFS metrics tab.
Enhanced insights for MapReduce
Viewing ResourceManager metrics
- Expand the Details section of the ResourceManager process group.
- Select Process group details.
- On the Process group details page, select the Technology-specific metrics tab to view relevant cluster charts and metrics. Hadoop ResourceManager metrics pages provide information about your nodes, applications, memory, cores, and containers.
- Further down the page, you’ll find a number of ResourceManager-specific charts.
Viewing MRAppMaster metrics
- Expand the Details section of an MRAppMaster process group.
- Select Process group details.
- On the Process group details page, select the Technology-specific metrics tab and select an MRAppMaster process.
- Select the Hadoop MapReduce tab.
Viewing NodeManager metrics
- Expand the Details section of the NodeManager process group.
- Select Process group details.
- On the Process group details page, select the Technology-specific metrics tab and select a NodeManager process.
- Select the Hadoop MapReduce tab.
NameNode metrics
Metric | Description |
---|---|
Capacity in bytes – Total | Raw capacity of all DataNodes in bytes. |
Capacity in bytes – Used | Capacity used by HDFS across all DataNodes in bytes. |
Capacity in bytes – Remaining | Remaining capacity in bytes. |
Total load | The current number of connections. |
Blocks – Total | The number of allocated blocks in the system. |
Pending deletion | The number of blocks pending deletion. |
Files total | The total number of files. |
Pending replication | The number of blocks pending replication. |
Under replicated | The number of under-replicated blocks. |
Scheduled replication | The number of blocks scheduled for replication. |
Live | The number of live DataNodes. |
Dead | The number of dead DataNodes. |
Decommission Live | The number of decommissioned DataNodes that are live. |
Decommission Dead | The number of decommissioned DataNodes that are dead. |
Usage – Volume failures total | The total number of volume failures across all DataNodes. |
Estimated capacity lost total | Estimated capacity lost to volume failures, in bytes. |
Decommission Decommissioning | The number of DataNodes currently being decommissioned. |
Stale | The number of stale DataNodes. |
Blocks missing and corrupt – Missing | The number of missing blocks. |
Cache – Capacity | Cache capacity in bytes. |
Cache – Used | Cache used in bytes. |
Blocks missing and corrupt – Corrupt | The number of corrupt blocks. |
Capacity in bytes – Used, non-DFS | Capacity used by non-DFS data, in bytes. |
Appended | The number of files appended. |
Created | The number of files and directories created by create or mkdir operations. |
Deleted | The number of files and directories deleted by delete or rename operations. |
Renamed | The number of rename operations. |
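Several NameNode metrics are most useful when you read them together: DFS usage relative to raw capacity, dead or stale DataNodes, and missing, corrupt, or under-replicated blocks. The sketch below shows one way to turn a snapshot of these values into warnings. The `NameNodeSnapshot` record, its field names, and the 80 percent usage threshold are hypothetical choices for illustration, not Dynatrace defaults. How you obtain the values is left open.

```python
from dataclasses import dataclass


@dataclass
class NameNodeSnapshot:
    """Hypothetical snapshot of selected NameNode metrics from the table above."""
    capacity_total: int    # Capacity in bytes - Total
    capacity_used: int     # Capacity in bytes - Used
    dead_datanodes: int    # Dead
    stale_datanodes: int   # Stale
    missing_blocks: int    # Blocks missing and corrupt - Missing
    corrupt_blocks: int    # Blocks missing and corrupt - Corrupt
    under_replicated: int  # Under replicated


def hdfs_warnings(s, max_used_pct=80.0):
    """Return human-readable warnings; an empty list means nothing crossed a threshold."""
    warnings = []
    used_pct = 100.0 * s.capacity_used / s.capacity_total if s.capacity_total else 0.0
    if used_pct > max_used_pct:
        warnings.append(f"DFS usage at {used_pct:.1f}% of raw capacity")
    # Every non-zero count is reported. Some, such as under-replicated blocks,
    # can be briefly non-zero while HDFS re-replicates data.
    counters = [
        ("dead DataNodes", s.dead_datanodes),
        ("stale DataNodes", s.stale_datanodes),
        ("missing blocks", s.missing_blocks),
        ("corrupt blocks", s.corrupt_blocks),
        ("under-replicated blocks", s.under_replicated),
    ]
    for label, value in counters:
        if value > 0:
            warnings.append(f"{label}: {value}")
    return warnings


if __name__ == "__main__":
    snapshot = NameNodeSnapshot(
        capacity_total=10 * 2**40, capacity_used=9 * 2**40,
        dead_datanodes=1, stale_datanodes=0,
        missing_blocks=0, corrupt_blocks=3, under_replicated=120,
    )
    for line in hdfs_warnings(snapshot):
        print(line)
```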
DataNode metrics
Metric | Description |
---|---|
Live | The number of live DataNodes. |
Dead | The number of dead DataNodes. |
Decommission Live | The number of decommissioned DataNodes that are live. |
Decommission Dead | The number of decommissioned DataNodes that are dead. |
Decommission Decommissioning | The number of DataNodes currently being decommissioned. |
Stale | The number of stale DataNodes. |
Cache – Capacity | Cache capacity in bytes. |
Cache – Used | Cache used in bytes. |
Disk – Capacity | Disk capacity in bytes. |
DfsUsed | Disk space used by HDFS, in bytes. |
Cached | The number of blocks cached. |
Failed to cache | The number of blocks that failed to be cached. |
Failed to uncache | The number of blocks that failed to be removed from the cache. |
Number of failed volumes | The number of failed volumes on the DataNode. |
Capacity in bytes – Remaining | The remaining disk space in bytes. |
Blocks read | The number of blocks read from the DataNode. |
Removed | The number of blocks removed. |
Replicated | The number of blocks replicated. |
Verified | The number of blocks verified. |
Blocks written | The number of blocks written to the DataNode. |
Bytes read | The number of bytes read from the DataNode. |
Bytes written | The number of bytes written to the DataNode. |
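Because each DataNode reports its own disk capacity, DfsUsed, and number of failed volumes, you can compare nodes to find those filling up faster than the rest. The following is a minimal sketch. It assumes you have collected these three values per DataNode into a hypothetical `DataNodeDisk` record. The 85 percent threshold is an arbitrary example.

```python
from typing import NamedTuple


class DataNodeDisk(NamedTuple):
    """Hypothetical per-DataNode record built from the DataNode metrics above."""
    host: str
    capacity: int        # Disk - Capacity, in bytes
    dfs_used: int        # DfsUsed, in bytes
    failed_volumes: int  # Number of failed volumes


def usage_pct(node):
    """DfsUsed as a percentage of the node's disk capacity."""
    return 100.0 * node.dfs_used / node.capacity if node.capacity else 0.0


def nodes_needing_attention(nodes, max_used_pct=85.0):
    """Nodes with failed volumes or usage above the threshold, fullest first."""
    flagged = [n for n in nodes if n.failed_volumes > 0 or usage_pct(n) > max_used_pct]
    return sorted(flagged, key=usage_pct, reverse=True)


def usage_spread(nodes):
    """Difference in percentage points between the fullest and the emptiest node."""
    values = [usage_pct(n) for n in nodes]
    return max(values) - min(values) if values else 0.0


if __name__ == "__main__":
    cluster = [
        DataNodeDisk("dn1", 1000, 900, 0),
        DataNodeDisk("dn2", 1000, 500, 1),
        DataNodeDisk("dn3", 1000, 300, 0),
    ]
    print([n.host for n in nodes_needing_attention(cluster)])
    print(usage_spread(cluster))
```

A large spread suggests that blocks are unevenly distributed across DataNodes. HDFS includes a balancer tool (`hdfs balancer`) for redistributing them.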
ResourceManager metrics
Metric | Description |
---|---|
NodeManagers – Active | Number of active NodeManagers. |
NodeManagers – Decommissioned | Number of decommissioned NodeManagers. |
NodeManagers – Lost | Number of lost NodeManagers (NodeManagers that have stopped sending heartbeats). |
NodeManagers – Rebooted | Number of rebooted NodeManagers. |
NodeManagers – Unhealthy | Number of unhealthy NodeManagers. |
Containers – Allocated | Number of allocated containers. |
Memory – Allocated | Allocated memory in bytes. |
Virtual cores – Allocated | Number of allocated virtual CPU cores. |
Applications – Completed | Number of successfully completed applications. |
Applications – Failed | Number of failed applications. |
Applications – Killed | Number of killed applications. |
Applications – Pending | Number of pending applications. |
Applications – Running | Number of running applications. |
Applications – Submitted | Number of submitted applications. |
Memory – Available | Amount of available memory in bytes. |
Virtual cores – Available | Number of available virtual CPU cores. |
Memory – Pending | Amount of requested memory, in bytes, not yet granted by the scheduler. |
Virtual cores – Pending | Number of requested virtual CPU cores not yet granted by the scheduler. |
Memory – Reserved | Amount of reserved memory in bytes. |
Virtual cores – Reserved | Number of reserved virtual CPU cores. |
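You can combine the memory and virtual-core rows above into cluster utilization figures. The sketch below assumes that, for the cluster as a whole, allocated plus available approximates total capacity. It ignores reserved resources. The `YarnResources` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class YarnResources:
    """Hypothetical snapshot of the ResourceManager memory and vcore rows above."""
    memory_allocated: int  # Memory - Allocated (bytes)
    memory_available: int  # Memory - Available (bytes)
    memory_pending: int    # Memory - Pending (bytes requested, not yet granted)
    vcores_allocated: int  # Virtual cores - Allocated
    vcores_available: int  # Virtual cores - Available
    vcores_pending: int    # Virtual cores - Pending


def share(part, rest):
    """Return part / (part + rest), or 0.0 when both are zero."""
    total = part + rest
    return part / total if total else 0.0


def summarize(r):
    """Utilization ratios (0.0 to 1.0) and unmet demand, treating allocated + available as total."""
    return {
        "memory_utilization": share(r.memory_allocated, r.memory_available),
        "vcore_utilization": share(r.vcores_allocated, r.vcores_available),
        # Pending requests larger than what is currently free cannot be granted right away.
        "memory_backlog": max(0, r.memory_pending - r.memory_available),
        "vcore_backlog": max(0, r.vcores_pending - r.vcores_available),
    }


if __name__ == "__main__":
    gib = 2**30
    snapshot = YarnResources(
        memory_allocated=96 * gib, memory_available=32 * gib, memory_pending=48 * gib,
        vcores_allocated=40, vcores_available=24, vcores_pending=8,
    )
    print(summarize(snapshot))
```

Sustained high utilization, combined with a non-zero backlog and a rising Applications – Pending count, can indicate that the cluster needs more NodeManager capacity.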