Which measures contribute to host health?

Individual Host pages show problem history, event history, and related processes for each host. To assess health, the following performance metrics are captured for each host and presented on each Host overview page:

  • CPU
  • Memory
  • Disk (storage health)
  • NIC (network health)

What's factored into host CPU health?

CPU usage is the primary measurement used to calculate CPU health. This is the percentage of time that the CPU was busy processing (i.e., not idle). This percentage is computed over all available CPU cores and scaled to a range of 0–100%.

The same calculation method is used for a system's total CPU usage and for CPU usage per process group. This means that a process group consisting of a single-threaded process on a 4-core system reaches its maximum CPU usage at 25%, because it can keep only one of the four cores busy.

The CPU usage metric is used to detect host high CPU incidents.
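
As a rough illustration of the normalization described above, the following Python sketch derives CPU usage from two samples of cumulative busy and idle time summed over all cores, then applies the same per-core normalization to a single process group. The `CpuTimes` structure and both helper functions are hypothetical; this is a sketch of the arithmetic, not Dynatrace's implementation, and it leaves out where real counters would come from.

```python
from dataclasses import dataclass


@dataclass
class CpuTimes:
    """Hypothetical cumulative CPU time counters in seconds, summed over all cores."""
    busy: float  # time spent processing (not idle)
    idle: float  # time spent idle


def cpu_usage_percent(before: CpuTimes, after: CpuTimes) -> float:
    """Busy share of all available CPU time between two samples, on a 0-100% scale."""
    busy = after.busy - before.busy
    total = busy + (after.idle - before.idle)
    if total <= 0:
        return 0.0  # no elapsed CPU time to measure
    return 100.0 * busy / total


def process_group_cpu_percent(cpu_seconds: float, interval_s: float, cores: int) -> float:
    """Process-group CPU usage, normalized over all cores like the host total."""
    return 100.0 * cpu_seconds / (interval_s * cores)


# Host with 4 cores, 10 s interval: 40 core-seconds available, 10 of them busy.
print(cpu_usage_percent(CpuTimes(0.0, 0.0), CpuTimes(busy=10.0, idle=30.0)))  # 25.0

# A single-threaded process keeps at most one core busy: 10 CPU-seconds in 10 s.
print(process_group_cpu_percent(10.0, interval_s=10.0, cores=4))  # 25.0
```

Because both values are divided by the capacity of all cores, a fully busy single-threaded process on a 4-core host tops out at 25% rather than 100%.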

What's included in host Memory health?

Host pages include two memory-related metrics for your hosts: Memory used and Page faults. Both measurements, along with other factors, are used to correlate and detect host high memory incidents.

  • Memory used
    Percentage of total RAM used by processes. RAM used by system caches and buffers isn't included in this metric. Dynatrace calculates memory usage as:
    memory_used = total_memory_size - (free_memory + active_memory + inactive_memory + reclaimable_memory)

  • Page faults
    Number of major page faults per second. Major page faults involve loading a page from disk, which adds disk latency to the interrupted program’s execution.
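
The following Python sketch applies the memory_used formula exactly as quoted above to illustrative values, and derives a major page fault rate from two samples of a cumulative counter. The function names and inputs are hypothetical, and expressing memory_used as a share of total_memory_size is an assumption about how the percentage is formed.

```python
def memory_used(total: int, free: int, active: int, inactive: int, reclaimable: int) -> int:
    """memory_used formula as quoted above; all inputs in the same unit (e.g., MiB)."""
    return total - (free + active + inactive + reclaimable)


def memory_used_percent(total: int, free: int, active: int, inactive: int, reclaimable: int) -> float:
    # Assumption: the percentage is memory_used relative to total_memory_size.
    return 100.0 * memory_used(total, free, active, inactive, reclaimable) / total


def major_page_faults_per_second(count_before: int, count_after: int, interval_s: float) -> float:
    """Rate derived from two samples of a cumulative major page fault counter."""
    return (count_after - count_before) / interval_s


# Illustrative values in MiB, not real measurements.
print(memory_used(16000, free=2000, active=6000, inactive=3000, reclaimable=1000))  # 4000
print(memory_used_percent(16000, 2000, 6000, 3000, 1000))  # 25.0
print(major_page_faults_per_second(100, 160, interval_s=60.0))  # 1.0
```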

What's included in host Disk health?

  • Throughput
    The total number of bytes read from and written to disk per second.

  • IOPS
    Number of I/O (input/output) operations per second. Operations that address adjacent disk sectors are merged before they're counted.

  • Disk latency
    The average time, in milliseconds, from I/O request submission to I/O request completion for disk read and write operations. This metric is used to detect host slow disk incidents.

  • Disk space usage
    The amount of disk space that's been used.

  • Idle time
    Amount of time the disk has been idle.
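
The disk metrics above can be approximated from two samples of cumulative per-disk counters, in the style of tools such as iostat. In this Python sketch, the `DiskCounters` fields and the `disk_metrics` and `disk_space_used_percent` helpers are hypothetical and only loosely modeled on the Linux /proc/diskstats counters; `shutil.disk_usage` is the only real library call. It is not how Dynatrace computes these values.

```python
import shutil
from dataclasses import dataclass


@dataclass
class DiskCounters:
    """Hypothetical cumulative per-disk counters, loosely modeled on Linux /proc/diskstats."""
    reads: int             # completed read operations (adjacent requests already merged)
    writes: int            # completed write operations
    bytes_read: int
    bytes_written: int
    read_time_ms: float    # summed submission-to-completion time of all reads
    write_time_ms: float   # summed submission-to-completion time of all writes
    busy_time_ms: float    # wall-clock time with at least one I/O in progress


def disk_metrics(before: DiskCounters, after: DiskCounters, interval_s: float) -> dict:
    """Throughput, IOPS, average latency, and idle time over one sampling interval."""
    ops = (after.reads - before.reads) + (after.writes - before.writes)
    moved = (after.bytes_read - before.bytes_read) + (after.bytes_written - before.bytes_written)
    io_ms = (after.read_time_ms - before.read_time_ms) + (after.write_time_ms - before.write_time_ms)
    busy_ms = after.busy_time_ms - before.busy_time_ms
    return {
        "throughput_bytes_per_s": moved / interval_s,
        "iops": ops / interval_s,
        "latency_ms": io_ms / ops if ops else 0.0,  # average per operation
        "idle_time_ms": max(0.0, interval_s * 1000 - busy_ms),
    }


def disk_space_used_percent(path: str = "/") -> float:
    """Used share of the file system that contains `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


start = DiskCounters(0, 0, 0, 0, 0.0, 0.0, 0.0)
end = DiskCounters(reads=300, writes=200, bytes_read=3_000_000, bytes_written=2_000_000,
                   read_time_ms=600.0, write_time_ms=400.0, busy_time_ms=4000.0)
print(disk_metrics(start, end, interval_s=10.0))
# {'throughput_bytes_per_s': 500000.0, 'iops': 50.0, 'latency_ms': 2.0, 'idle_time_ms': 6000.0}
```

Dividing total I/O time by the number of completed operations gives the average delay per operation, which is why latency can stay low even when throughput and IOPS are high.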

What's included in host NIC health?

  • Traffic
    The average rate at which data was transmitted during the interval.

  • Packets
    The number of received and sent packets over the host network interface during the interval.

  • Quality
    An assessment based on the number of dropped packets and errors.

  • Connectivity
    Percentage of TCP connection attempts that were properly established, as opposed to those that were refused or timed out.
    Note: Connectivity can indicate whether there's network traffic on a host. However, 0% connectivity doesn't necessarily indicate a problem with the host. If no TCP errors are present, it may simply mean that no users attempted to connect to the host's processes during the selected time frame.
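
The Connectivity definition above can be read as established connections divided by all connection attempts. The Python sketch below assumes that interpretation; the `connectivity_percent` function and its counters are hypothetical. It returns 0.0 when there were no attempts at all, which mirrors the note: 0% connectivity without any refused or timed-out connections isn't an error in itself.

```python
def connectivity_percent(established: int, refused: int, timed_out: int) -> float:
    """Share of TCP connection attempts that were properly established, in percent.

    Assumption: connectivity = established / (established + refused + timed_out).
    """
    attempts = established + refused + timed_out
    if attempts == 0:
        return 0.0  # no attempts: 0% here does not by itself indicate a problem
    return 100.0 * established / attempts


print(connectivity_percent(95, 3, 2))  # 95.0
print(connectivity_percent(0, 0, 0))   # 0.0, no connection attempts
print(connectivity_percent(0, 5, 5))   # 0.0, every attempt failed
```

The last two calls both yield 0%, so telling them apart requires looking at the refused and timed-out counts as well.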