Host availability
You can track host availability on the Host page. The Availability tile shows the percentage of the selected time range during which the host was online and responsive to requests.
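To make the percentage concrete, here is a minimal Python sketch of how such a value could be derived from a list of online intervals. It only illustrates the arithmetic and is not how Dynatrace computes the tile internally. The function name `availability_percent` and the interval representation are assumptions made for this example.

```python
from datetime import datetime, timedelta

def availability_percent(online_intervals, range_start, range_end):
    """Percentage of [range_start, range_end) covered by online intervals.

    Hypothetical helper: online_intervals is a list of (start, end)
    datetime pairs during which the host was online and responsive.
    """
    total = (range_end - range_start).total_seconds()
    if total <= 0:
        raise ValueError("range_end must be after range_start")

    # Clip every interval to the selected time range and sort by start time.
    clipped = sorted(
        (max(start, range_start), min(end, range_end))
        for start, end in online_intervals
        if end > range_start and start < range_end
    )

    # Merge overlapping intervals so that no second is counted twice.
    online = 0.0
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                online += (cur_end - cur_start).total_seconds()
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        online += (cur_end - cur_start).total_seconds()

    return 100.0 * online / total


t0 = datetime(2024, 1, 1)
hours = lambda n: t0 + timedelta(hours=n)
# Online 00:00-04:00, 03:00-06:00 (overlapping), and 08:00-12:00,
# measured over the range 00:00-10:00: 8 of 10 hours online.
example = availability_percent(
    [(hours(0), hours(4)), (hours(3), hours(6)), (hours(8), hours(12))],
    hours(0),
    hours(10),
)
```

In this example the host counts as online for 8 of the 10 selected hours, so `example` is `80.0`.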
Manually checking host availability
- In the Dynatrace menu, go to Hosts to list all the machines (physical and virtual) in your environment that had OneAgent installed during the selected time range.
- Select a host to open its host overview page, where you can view host details, including all available metrics for the host.
Host availability states
- Running: Indicates that the host (OneAgent) is online and responsive to requests.
- Offline: Indicates that the host is unexpectedly offline. This usually means that the host stopped reporting to Dynatrace and a timeout occurred. Possible causes include:
  - The host crashed
  - There are issues with communication with the host
  - The host became so unstable that monitoring is unavailable
  - A virtual machine was terminated without proper system shutdown
- Shutdown: Indicates a graceful operating system shutdown on the host. Possible causes include:
  - An expected operating system shutdown or reboot
  - An expected shutdown discovered by one of our cloud/hypervisor integrations. For example, an AWS Spot Instance is expected to disappear without any warning, so when it does, the host state is Shutdown (not unexpected, so not Offline)
  - A process monitored by a PaaS OneAgent has gone offline
- Maintenance: Indicates a configured maintenance window (not a state sent from the host). For more information, see Maintenance windows and How to define a maintenance window.
- Unmonitored: Indicates that OneAgent isn't running on the host. Possible causes:
  - OneAgent was manually stopped. To check or change this setting per host, do one of the following:
    - In the Dynatrace menu, go to Settings > Monitoring overview.
    - Select the Hosts tab.
    - Find the host and see the Monitoring setting.

    or

    - In the Dynatrace menu, go to Hosts and select a host name to open the host overview page.
    - Select More (…) > Settings.
    - On the General page, see Turn on monitoring to gain visibility into this host, its processes, services, and applications.
  - OneAgent was uninstalled.
  - Host units are depleted.
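The difference between Running, Offline, and Shutdown can be illustrated with a small Python sketch. It is a simplified model based only on the descriptions above: a shutdown signal leads to Shutdown, a reporting timeout without such a signal leads to Offline, and anything else is Running. The names `HostState`, `classify`, and `REPORT_TIMEOUT_SECONDS`, and the timeout value itself, are assumptions for illustration and not documented Dynatrace values. Maintenance and Unmonitored are deliberately left out because they depend on configuration rather than on signals from the host.

```python
from enum import Enum


class HostState(Enum):
    RUNNING = "Running"
    OFFLINE = "Offline"
    SHUTDOWN = "Shutdown"


# Hypothetical timeout; the actual value used by Dynatrace is not stated here.
REPORT_TIMEOUT_SECONDS = 300.0


def classify(seconds_since_last_report: float,
             shutdown_signal_received: bool,
             timeout: float = REPORT_TIMEOUT_SECONDS) -> HostState:
    """Derive a simplified availability state from two signals."""
    if shutdown_signal_received:
        # The shutdown was announced, so it is not unexpected.
        return HostState.SHUTDOWN
    if seconds_since_last_report > timeout:
        # The host stopped reporting and the timeout elapsed.
        return HostState.OFFLINE
    return HostState.RUNNING
```

For example, `classify(600.0, False)` yields `HostState.OFFLINE`, while the same silence after a shutdown signal, `classify(600.0, True)`, yields `HostState.SHUTDOWN`.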
Davis alerts on host availability
By default, Davis generates a new availability problem whenever the connection to a running OneAgent (host) is lost unexpectedly.
There can be multiple root causes for losing a connection to a monitored host:
1. The host may have gone offline unexpectedly, so OneAgent received no shutdown signal. This is considered an ungraceful shutdown or crash.
2. Network issues may prevent Dynatrace from receiving monitoring signals from a running host. In such cases, it’s unknown if the host is still running.
3. The host may have shut down gracefully, which means that the operating system sent a shutdown signal notifying OneAgent that an operator is intentionally shutting down the server.
The default host availability-alerting behavior automatically alerts on causes 1 and 2, not on cause 3.
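The default rule can be summarized in a few lines of Python. This is only a sketch of the decision described above, not Davis code. `DisconnectCause` and `alerts_by_default` are hypothetical names, and the enum values follow the order in which the causes are listed.

```python
from enum import Enum


class DisconnectCause(Enum):
    UNEXPECTED_OFFLINE = 1   # ungraceful shutdown or crash
    NETWORK_ISSUE = 2        # monitoring signals no longer arrive
    GRACEFUL_SHUTDOWN = 3    # the operating system sent a shutdown signal


def alerts_by_default(cause: DisconnectCause) -> bool:
    """Return True if the default behavior opens an availability problem."""
    return cause in {DisconnectCause.UNEXPECTED_OFFLINE,
                     DisconnectCause.NETWORK_ISSUE}
```

With these definitions, `alerts_by_default(DisconnectCause.GRACEFUL_SHUTDOWN)` is `False`, while the other two causes return `True`.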
Configuration
- In environments where running virtual machines are abruptly destroyed without graceful system shutdown, you may see a lot of false-positive alerts for hosts going offline ungracefully. You can disable alerting on unexpectedly down hosts globally or per host.
- If your DevOps team wants to open alerts for graceful host shutdowns (cause 3 above), you can do so globally or per host.
After you’ve configured a host availability alerting strategy, the next time an affected host becomes unavailable (and depending on your alerting configuration), your Problems page will display a Host or monitoring unavailable problem card.
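The interaction between global and per-host settings can be modeled as follows. This Python sketch is an illustration only: the class names, field names, and the rule that a per-host setting takes precedence over the global one are assumptions, and they don't correspond to actual Dynatrace settings keys or API calls.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AlertingSettings:
    # Defaults mirror the default behavior: alert on unexpected loss of
    # connection, don't alert on graceful shutdown.
    alert_on_unexpected_offline: bool = True
    alert_on_graceful_shutdown: bool = False


@dataclass
class AlertingConfig:
    global_settings: AlertingSettings = field(default_factory=AlertingSettings)
    # Hypothetical per-host overrides, keyed by a host identifier.
    per_host: Dict[str, AlertingSettings] = field(default_factory=dict)

    def settings_for(self, host_id: str) -> AlertingSettings:
        # Assumption: a per-host entry, if present, replaces the global one.
        return self.per_host.get(host_id, self.global_settings)


def should_open_problem(config: AlertingConfig, host_id: str,
                        graceful: bool) -> bool:
    """Decide whether an unavailable host should raise a problem."""
    settings = config.settings_for(host_id)
    if graceful:
        return settings.alert_on_graceful_shutdown
    return settings.alert_on_unexpected_offline


# Example: short-lived VMs are silenced, one critical host also alerts
# on graceful shutdowns, and every other host keeps the defaults.
config = AlertingConfig(per_host={
    "ephemeral-vm": AlertingSettings(alert_on_unexpected_offline=False),
    "critical-db": AlertingSettings(alert_on_graceful_shutdown=True),
})
```

Here `should_open_problem(config, "ephemeral-vm", graceful=False)` returns `False`, `should_open_problem(config, "critical-db", graceful=True)` returns `True`, and any host without an override follows the global defaults.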