Infrastructure - Hostgroups

The Hostgroups tab displays the hostgroups and the patterns that define their members.

Infrastructure - Hostgroups tab

Specify the hostgroup properties in the Create Hostgroup or Edit Hostgroup dialog box.

Hostgroups properties — General

The General tab contains basic hostgroup properties:

  • Name
  • Description
  • Automatically remove offline Hosts after XX hours: after the specified period, an offline host is removed from the Infrastructure monitoring dashboard. When the host comes online again, it rejoins the group.

Hostgroups properties — Hosts

The Hosts tab contains the host pattern, and the list of the hosts in the group.

To add a host by pattern, type one of the string comparison operators in the Add Hosts by Pattern text box. Matching operators appear in the box as you type, so you can select from the list of matches. Patterns are simple logical expressions that can include the following:

  • The string comparison operators: equals, startswith, contains, and endswith.
  • The logical operators: not, and, or (in order of precedence). Logical operators can be grouped with parentheses to change the order of precedence. For example: (contains 'at' or contains 'de') and endswith 'corp'.
  • A string in single quotes. If the string itself contains a single quote, escape that quote with a backslash character, for example 'isn\'t'.

All string comparison operators are case-insensitive and require exactly one string literal as a parameter. For example:

  • equals '1234.at.emea.mydomain.corp'
  • startswith '1234'
  • contains 'emea'
  • endswith '.com'
  • not contains 'internal'
  • contains 'at' or contains 'de' and endswith 'corp'
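The pattern rules above can be illustrated with a small evaluator. This is only a sketch of the documented grammar (four string operators, not/and/or precedence, parentheses, backslash-escaped quotes), not AppMon's actual implementation; the names tokenize and matches are hypothetical.

```python
import re

# Hypothetical tokenizer: parentheses, single-quoted strings (with \' escapes), words.
TOKEN = re.compile(r"\s*(\(|\)|'(?:\\.|[^'\\])*'|[A-Za-z]+)")

FUNCS = {
    "equals": lambda host, s: host == s,
    "startswith": lambda host, s: host.startswith(s),
    "contains": lambda host, s: s in host,
    "endswith": lambda host, s: host.endswith(s),
}

def tokenize(pattern):
    tokens, pos = [], 0
    pattern = pattern.rstrip()
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if not m:
            raise ValueError(f"unexpected text at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def matches(pattern, host):
    """Evaluate a host pattern against a host name, case-insensitively."""
    tokens, pos = tokenize(pattern), 0
    host = host.lower()

    def peek():
        return tokens[pos].lower() if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of pattern")
        pos += 1
        return tokens[pos - 1]

    def parse_or():  # lowest precedence
        value = parse_and()
        while peek() == "or":
            take()
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "and":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():  # highest precedence
        if peek() == "not":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        func = FUNCS.get(tok.lower())
        literal = take()
        if func is None or not literal.startswith("'"):
            raise ValueError(f"expected operator and string literal near {tok!r}")
        return func(host, literal[1:-1].replace("\\'", "'").lower())

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result
```

With this sketch, contains 'at' or contains 'de' and endswith 'corp' matches the host 1234.at.emea.mydomain.com because and binds tighter than or, while the parenthesized form (contains 'at' or contains 'de') and endswith 'corp' does not.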
Host pattern

To manually add hosts to the hostgroup, click Add. The pattern text is updated with or equals '<hostaddress>'.

Hostgroup priority

A host must belong to exactly one hostgroup. However, some hosts may match the patterns of several hostgroups. To guarantee a unique match, adjust the group priority: right-click the group in the group list, and select Higher Priority or Lower Priority from the context menu.
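As a rough illustration of how priority can resolve overlapping patterns, the sketch below assigns a host to the first matching group in a priority-ordered list. That the highest-priority match wins is an assumption drawn from the description above, and assign_hostgroup and the group names are hypothetical.

```python
def assign_hostgroup(host, groups):
    """Return the name of the first group whose predicate matches the host.

    groups is a list of (name, predicate) pairs, ordered from highest to
    lowest priority (assumed ordering, for illustration only).
    """
    for name, predicate in groups:
        if predicate(host):
            return name
    return None  # no group pattern matched


# Hypothetical groups: both patterns match the first host below.
groups = [
    ("EMEA", lambda h: "emea" in h),
    ("Corp", lambda h: h.endswith("corp")),
]
print(assign_hostgroup("1234.at.emea.mydomain.corp", groups))
```

Swapping the two entries in the list corresponds to changing the priority, which would place the same host in the other group.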

Hostgroups properties — Thresholds

The Thresholds tab contains thresholds, which determine health criteria for the hosts in the group. All the hosts from the group use these thresholds by default. You can, however, specify host-specific thresholds in the host properties.

Hostgroup thresholds

If any of the threshold groups (CPU, memory, network, or disk) is violated, the host is considered unhealthy and is shown as unhealthy on the Infrastructure monitoring dashboard in the AppMon Client or in the Hosts tile of the Host health web view in AppMon Web.

Click Configure Exclusions to exclude specific disks, mount points, or NICs from monitoring. See Infrastructure — Exclusions for more information.

Evaluation principle

The server evaluates host health every minute. The CPU, memory, and network measurements of the last 15 minutes are split into one-minute chunks, each aggregated by average. If 13 of the 15 chunks violate the relevant criteria, the host is considered unhealthy. The host is considered healthy again as soon as at least three of the chunks are healthy. The healthy chunks don't have to be consecutive.
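The sliding-window rule can be sketched as follows. This is a simplified model, not the server's code: it assumes a single "maximum" style threshold, treats 13 or more violating chunks as unhealthy, and the class name HealthWindow is hypothetical.

```python
from collections import deque

WINDOW_MINUTES = 15   # chunks kept in the window
UNHEALTHY_AT = 13     # violating chunks that make the host unhealthy


class HealthWindow:
    """Tracks one-minute chunks for a single measure (illustrative only)."""

    def __init__(self):
        # Each entry is True if that minute's average violated the threshold.
        self.chunks = deque(maxlen=WINDOW_MINUTES)

    def add_minute(self, samples, max_threshold):
        average = sum(samples) / len(samples)
        self.chunks.append(average > max_threshold)

    def unhealthy(self):
        # Fewer than 13 violating chunks means at least 3 healthy ones
        # once the window is full, so the host counts as healthy.
        return sum(self.chunks) >= UNHEALTHY_AT
```

Because the check counts chunks rather than looking for a run, three healthy minutes anywhere in the window are enough to return the host to a healthy state.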

Disk health status changes immediately, without a watch period, when a threshold is violated, so the host becomes unhealthy straight away.

CPU

Violation of any of the thresholds marks CPU health as unhealthy, following the evaluation principle described above.

A violation also raises a built-in Host CPU unhealthy incident. By default, only one incident is raised at a time: if another threshold is violated while the incident is still active, no additional incident is raised. For example, if an incident for a Max Usage threshold violation is ongoing and the Max System threshold is also violated, no second incident is raised.

AppMon 2018 April: Select the Raise a built-in Host CPU Unhealthy incident for every violated threshold checkbox to raise a separate incident for each threshold violation. With this option selected, the Max System violation in the example above raises a second incident.
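The difference between the default behavior and the per-threshold option can be modeled roughly as below. The function new_incidents and its parameters are hypothetical and only mirror the behavior described in this section; the same logic applies to the memory, network, disk, and process availability incidents.

```python
def new_incidents(violated, active, per_threshold=False):
    """Return the violated thresholds for which a new incident would start.

    violated: names of currently violated thresholds, in order of detection.
    active: set of threshold names that already have an ongoing incident.
    per_threshold: models the "for every violated threshold" checkbox.
    """
    if per_threshold:
        # One incident per threshold that does not already have one.
        return [name for name in violated if name not in active]
    if active or not violated:
        # Default: nothing new while any incident is still ongoing.
        return []
    return [violated[0]]


# Max Usage incident is ongoing and Max System is violated as well.
print(new_incidents(["Max System"], {"Max Usage"}))                      # default
print(new_incidents(["Max System"], {"Max Usage"}, per_threshold=True))  # checkbox on
```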

Memory

Violation of the Minimum Available subgroup or of the maximum page faults per second threshold marks memory health as unhealthy, following the evaluation principle described above. Within the Minimum Available subgroup, both Min available MB and Min available % must be violated for the subgroup to count. The resulting logical formula is: ('minimum available MB' AND 'minimum available %') OR 'maximum page faults per second'.

Tip

To ignore one of the values combined by the and operator, set that threshold so that it always evaluates to true.

For example, set a very high size or percentage. Memory available < 100% always returns true, so a violation occurs as soon as available MB < threshold becomes true.
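The memory formula and the tip can be checked with a short sketch. The function and parameter names are hypothetical; only the boolean structure comes from the formula above.

```python
def memory_violated(available_mb, available_pct, page_faults_per_sec,
                    min_available_mb, min_available_pct,
                    max_page_faults_per_sec):
    """('minimum available MB' AND 'minimum available %') OR 'max page faults'."""
    below_minimum = (available_mb < min_available_mb
                     and available_pct < min_available_pct)
    too_many_faults = page_faults_per_sec > max_page_faults_per_sec
    return below_minimum or too_many_faults


# Tip applied: a percentage threshold of 100 is (almost) always violated,
# so only the MB threshold decides the Minimum Available subgroup.
print(memory_violated(available_mb=300, available_pct=40, page_faults_per_sec=5,
                      min_available_mb=512, min_available_pct=100,
                      max_page_faults_per_sec=1000))
```

Note that a host reporting exactly 100% available memory would not satisfy the percentage condition in this model, which is harmless because such a host is not short of memory.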

A violation also raises a built-in Host Memory unhealthy incident. By default, only one incident is raised at a time: if another threshold is violated while the incident is still active, no additional incident is raised. For example, if an incident for a Min Available threshold violation is ongoing and the Max Page Faults threshold is also violated, no second incident is raised.

AppMon 2018 April: Select the Raise a built-in Host Memory Unhealthy incident for every violated threshold checkbox to raise a separate incident for each threshold violation. With this option selected, the Max Page Faults violation in the example above raises a second incident.

Network

Violation of the Max Bandwidth Utilization threshold by any network interface controller (NIC) marks network health as unhealthy, following the evaluation principle described above.

A violation also raises a built-in Host Network unhealthy incident. By default, only one incident is raised at a time: if another NIC violates the threshold while the incident is still active, no additional incident is raised. For example, if an incident for a threshold violation by the NIC-1 device is ongoing and the NIC-2 device also violates the threshold, no second incident is raised. The name of the single incident doesn't reflect which device violated the threshold.

AppMon 2018 April: Select the Raise a built-in Host Network Unhealthy incident for every violated threshold checkbox to raise a separate incident for each network device. With this option selected, the NIC-2 violation in the example above raises a second incident. The incident names reflect which device violated the threshold.

Disk

Violation of both Min free MB and Min free % by any disk marks the host as unhealthy. Unlike CPU, memory, and network violations, a disk violation affects host health immediately. As with memory evaluation, you can set one of the thresholds so that it always evaluates to true, which effectively ignores it in the evaluation.
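A minimal sketch of the disk check, assuming each disk reports its free space in MB and as a percentage. The function name and the sample disk values are hypothetical.

```python
def violating_disks(disks, min_free_mb, min_free_pct):
    """Return the disks that violate both thresholds.

    disks maps a disk name to a (free_mb, free_pct) tuple.
    """
    return [name for name, (free_mb, free_pct) in disks.items()
            if free_mb < min_free_mb and free_pct < min_free_pct]


disks = {"C": (800, 4.0), "D": (50_000, 3.0)}
bad = violating_disks(disks, min_free_mb=1024, min_free_pct=5.0)
host_unhealthy = bool(bad)  # a single violating disk is enough, with no watch period
print(bad, host_unhealthy)
```

In this example disk D is below the percentage threshold but still has plenty of free MB, so only disk C counts as a violation.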

A violation also raises a built-in Host Disk unhealthy incident. By default, only one incident is raised at a time: if another disk violates the threshold while the incident is still active, no additional incident is raised. For example, if an incident for a threshold violation by disk C is ongoing and disk D also violates the threshold, no second incident is raised. The name of the single incident doesn't reflect which disk violated the threshold.

AppMon 2018 April: Select the Raise a built-in Host Disk Unhealthy incident for every violated threshold checkbox to raise a separate incident for each disk. With this option selected, the disk D violation in the example above raises a second incident. The incident names reflect which disk violated the threshold.

Compare measures between operating systems

Measures are relatively comparable between different operating systems, with the following limitations:

  • Windows does not deliver a CPU load like *NIX systems. CPU load is omitted in health calculations on Windows hosts.
  • Only hard page faults are considered as page faults. Windows systems have hard page faults even with free memory.
  • Page faults on AIX versions earlier than 5.2 report soft and hard page faults.
  • Disk space available is determined from the point of view of the Agent. If the Agent has restrictions such as disk space quota, only the disk space available to the Agent is reported as free space.
  • Page faults require disk access, which includes user rights to memory-mapped files. Applications such as backup software use memory-mapped files heavily and might cause temporarily high page fault rates.

See What is Monitored on What Platform? for a detailed comparison of operating systems.

Hostgroups properties — Availability

The Availability tab defines host availability limits and the list of processes monitored on this hostgroup.

Process pattern
  • Host Availability: Defines the minimum and maximum number of hosts in the hostgroup.
  • Process Availability: Defines the processes to be monitored by the Host Monitoring or technology-specific Agent. Running and unavailable processes are displayed in the Infrastructure monitoring dashboard. Use pattern matching to add processes to the list.

When a monitored process becomes unavailable, an incident is raised. By default, only one incident is raised at a time: if another process becomes unavailable while the incident is still active, no additional incident is raised. For example, if an incident for process1 unavailability is ongoing and process2 also becomes unavailable, no second incident is raised.

AppMon 2018 April: Select the Raises a Process Availability Violation incident for every process pattern checkbox to raise a separate incident for each process. With this option selected, the process2 outage in the example above raises a second incident.

To set a process pattern:

  1. Click the + button to the right of the process name list.
  2. Enter a pattern match string in the Pattern column. The process name that is matched against the pattern includes the process path. Pattern matching supports a * wildcard at the beginning and at the end of the pattern. For example, *w3wp.exe -ap ".NET 4.5" matches every process whose name ends with that string, regardless of its path.
  3. In the Display Name column, type a descriptive name for the process pattern.
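The wildcard rule from step 2 can be sketched as follows. This assumes a case-sensitive comparison and that * is only special at the very beginning or end of the pattern; process_matches and the sample path are hypothetical.

```python
def process_matches(pattern, process_name):
    """Match a process name (including its path) against a pattern
    with an optional * wildcard at the start and/or end."""
    leading = pattern.startswith("*")
    trailing = pattern.endswith("*") and len(pattern) > 1
    core = pattern[1 if leading else 0:len(pattern) - 1 if trailing else len(pattern)]
    if leading and trailing:
        return core in process_name          # *text*
    if leading:
        return process_name.endswith(core)   # *text
    if trailing:
        return process_name.startswith(core) # text*
    return process_name == core              # exact match


# Hypothetical process path, for illustration only.
name = 'c:\\windows\\system32\\inetsrv\\w3wp.exe -ap ".NET 4.5"'
print(process_matches('*w3wp.exe -ap ".NET 4.5"', name))
```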