Host availability

You can track host availability on the Host page. The percentage on the Availability tile is the share of the selected time range during which the host was online and responsive to requests.
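To illustrate the arithmetic behind that percentage, the sketch below computes the share of a time range covered by intervals during which a host was online. It is a minimal sketch under stated assumptions, not how Dynatrace computes the tile: the function name `availability_percentage` and the representation of online periods as non-overlapping `(start, end)` datetime pairs are hypothetical.

```python
from datetime import datetime, timedelta


def availability_percentage(range_start, range_end, online_intervals):
    """Return the share (0-100) of [range_start, range_end) covered by online intervals.

    Hypothetical helper: online_intervals is a list of (start, end) datetime
    pairs that are assumed not to overlap one another.
    """
    total = (range_end - range_start).total_seconds()
    if total <= 0:
        raise ValueError("range_end must be after range_start")
    online = 0.0
    for start, end in online_intervals:
        # Clip each interval to the selected time range before summing.
        clipped_start = max(start, range_start)
        clipped_end = min(end, range_end)
        if clipped_end > clipped_start:
            online += (clipped_end - clipped_start).total_seconds()
    return 100.0 * online / total


# Usage: a 24-hour range in which the host was online for 18 hours.
start = datetime(2024, 1, 1)
end = start + timedelta(hours=24)
intervals = [(start, start + timedelta(hours=12)), (start + timedelta(hours=18), end)]
print(availability_percentage(start, end, intervals))  # 75.0
```

Clipping each interval to the selected range means periods that started before or ended after the range only count for the portion that falls inside it.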

Manually checking host availability

The Host health tile on your home dashboard gives you a quick overview of host health, including whether any host's availability state indicates possible trouble.

  • Select the Host health tile or select Hosts in the navigation menu to view the Hosts page, which lists all machines (physical and virtual) in your environment that had OneAgent installed during the selected time range.

  • Select a host to open its Host page, where you can view host details, including all available metrics for the host.

Host availability states

Running

The Running availability state indicates that the host (OneAgent) is online and responsive to requests.

Offline

The Offline availability state indicates that the host is unexpectedly offline. This usually means that the host stopped reporting to Dynatrace and a timeout occurred. Possible causes include:

  • The host crashed
  • Communication with the host is disrupted
  • The host became so unstable that monitoring is unavailable
  • A virtual machine was terminated without a proper system shutdown

Shutdown

The Shutdown availability state indicates an expected shutdown, typically a graceful operating system shutdown on the host. Possible causes include:

  • An expected operating system shutdown or reboot
  • An expected shutdown discovered by one of our cloud/hypervisor integrations. For example, an AWS Spot Instance is expected to disappear without warning, so when it does, the host state is Shutdown rather than Offline, because the shutdown was not unexpected.
  • A process monitored by a PaaS OneAgent has gone offline

Maintenance

The Maintenance availability state indicates a configured maintenance window (not a state sent from the host).

For more information, see Maintenance windows and How to define a maintenance window.

Unmonitored

The Unmonitored availability state indicates that OneAgent isn't running on the host. Possible causes include:

  • OneAgent was manually stopped

  • OneAgent was uninstalled
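The five states described above can be summarized as a simple decision procedure. The sketch below is a hypothetical model, not Dynatrace's implementation: the boolean signal names (`in_maintenance_window`, `agent_stopped_or_uninstalled`, `reporting`, `expected_shutdown`) and the order in which they are checked are assumptions chosen to match the descriptions in this section.

```python
from enum import Enum


class Availability(Enum):
    RUNNING = "Running"
    OFFLINE = "Offline"
    SHUTDOWN = "Shutdown"
    MAINTENANCE = "Maintenance"
    UNMONITORED = "Unmonitored"


def classify(in_maintenance_window: bool,
             agent_stopped_or_uninstalled: bool,
             reporting: bool,
             expected_shutdown: bool) -> Availability:
    """Map hypothetical host signals to an availability state.

    Assumed precedence: a configured maintenance window wins because it is not
    a state sent from the host; next, a stopped or uninstalled OneAgent means
    the host is unmonitored; a host that is still reporting is running; a host
    that stopped reporting is Shutdown if the stop was expected (graceful OS
    shutdown or a shutdown seen by a cloud/hypervisor integration), else Offline.
    """
    if in_maintenance_window:
        return Availability.MAINTENANCE
    if agent_stopped_or_uninstalled:
        return Availability.UNMONITORED
    if reporting:
        return Availability.RUNNING
    if expected_shutdown:
        return Availability.SHUTDOWN
    # Stopped reporting without any expected-shutdown signal: a timeout occurred.
    return Availability.OFFLINE


print(classify(in_maintenance_window=False, agent_stopped_or_uninstalled=False,
               reporting=False, expected_shutdown=True).value)  # Shutdown
```

The key distinction the model captures is between Shutdown and Offline: both mean the host stopped reporting, and only the presence of an expected-shutdown signal separates them.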

Davis alerts on host availability

By default, Davis generates a new availability problem whenever the connection to a running OneAgent (host) is lost unexpectedly.

There can be multiple root causes for losing a connection to a monitored host:

  1. The host may have gone offline unexpectedly, so OneAgent received no shutdown signal. This is considered an ungraceful shutdown or crash.
  2. Network issues may prevent Dynatrace from receiving monitoring signals from a running host. In such cases, it's unknown whether the host is still running.
  3. The host may have shut down gracefully, which means that the operating system sent a shutdown signal notifying OneAgent that an operator is intentionally shutting down the server.

By default, Davis alerts on causes 1 and 2, but not on cause 3.

Configuration

  • In environments where running virtual machines are abruptly destroyed without a graceful system shutdown, you may see many false-positive alerts for hosts going offline ungracefully. You can disable alerting on unexpectedly down hosts globally or per host.

  • If your DevOps team wants to be alerted about graceful host shutdowns (cause 3 above), you can enable this alerting globally or per host.
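The default behavior and the two configuration options can be modeled as a small settings lookup in which a per-host value, when set, takes precedence over the global value. This is a hypothetical sketch: `LossCause`, `AlertSettings`, `should_alert`, and the per-host-over-global precedence rule are assumptions for illustration, not Dynatrace's configuration schema or API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LossCause(Enum):
    CRASH = 1              # cause 1: ungraceful shutdown, no shutdown signal
    NETWORK = 2            # cause 2: no signals received, host state unknown
    GRACEFUL_SHUTDOWN = 3  # cause 3: OS sent a shutdown signal


@dataclass
class AlertSettings:
    # None means "not set at this level" (used for per-host settings).
    alert_on_unexpected: Optional[bool] = None
    alert_on_graceful: Optional[bool] = None


# Hypothetical global defaults mirroring the documented behavior:
# alert on causes 1 and 2, not on cause 3.
GLOBAL_DEFAULTS = AlertSettings(alert_on_unexpected=True, alert_on_graceful=False)


def should_alert(cause: LossCause,
                 global_settings: AlertSettings = GLOBAL_DEFAULTS,
                 host_settings: Optional[AlertSettings] = None) -> bool:
    """Decide whether a lost connection should open an availability problem."""
    host = host_settings or AlertSettings()
    if cause is LossCause.GRACEFUL_SHUTDOWN:
        host_value, global_value = host.alert_on_graceful, global_settings.alert_on_graceful
    else:
        host_value, global_value = host.alert_on_unexpected, global_settings.alert_on_unexpected
    # A per-host value overrides the global one when it is set.
    return bool(global_value) if host_value is None else host_value


print(should_alert(LossCause.CRASH))              # True
print(should_alert(LossCause.GRACEFUL_SHUTDOWN))  # False
# Per-host override for a VM that is routinely destroyed without a graceful shutdown:
print(should_alert(LossCause.CRASH, host_settings=AlertSettings(alert_on_unexpected=False)))  # False
```

Keeping unset per-host values as `None` rather than `False` is what lets a host inherit the global setting while still allowing an explicit per-host opt-out or opt-in.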

After you've configured a host availability alerting strategy, the next time an affected host becomes unavailable (depending on your alerting configuration), your Problems page displays a Host or monitoring unavailable problem card such as:

[Image: Host unavailable]