How do I monitor network communications?

Dynatrace infrastructure monitoring offers more than visibility into hosts and processes. With network communication monitoring, Dynatrace also gives you insight into the quality of the communications between your hosts and the processes that run on them. It isn't enough to know that a process has sufficient server resources and responds in a timely manner. You also need assurance that your processes are clearly communicating their responses to calling parties and have uninterrupted access to all required resources. You also need to know which processes are consuming your network resources. Such network communication insight can be gained by monitoring the data packets that are exchanged between processes and the hosts they run on.

Network monitoring overhead

Overhead generated by network monitoring is negligible and varies based on the analyzed traffic volume. Dynatrace monitors this overhead. If it rises above 5% of available CPU, throttling occurs: the network module is paused for slightly less than 3 minutes and is then re-enabled. If the threshold is still exceeded, throttling occurs again, with the network module paused for twice as long. The pause doubles each time the network module is re-enabled and the threshold is still exceeded, up to a maximum pause time of 45 minutes. This continues until the threshold is no longer exceeded.
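The doubling pause schedule described above can be sketched in a few lines of Python. This is only an illustration of the stated behavior, not Dynatrace code: the function name is hypothetical, the initial pause is rounded to 3 minutes (the text says "slightly less than 3 minutes"), and clamping the doubled value to the 45-minute maximum is an assumption.

```python
from itertools import islice

INITIAL_PAUSE_MIN = 3.0  # assumption: rounded from "slightly less than 3 minutes"
MAX_PAUSE_MIN = 45.0     # maximum pause time stated in the text


def pause_schedule(initial=INITIAL_PAUSE_MIN, cap=MAX_PAUSE_MIN):
    """Yield successive pause lengths (in minutes) for as long as the
    CPU overhead threshold remains exceeded after each re-enable."""
    pause = initial
    while True:
        yield pause
        # Double the pause for the next round, but never exceed the cap.
        pause = min(pause * 2, cap)


# First six pauses if the threshold stays exceeded every time.
print(list(islice(pause_schedule(), 6)))
# [3.0, 6.0, 12.0, 24.0, 45.0, 45.0]
```

Under these assumptions, the maximum pause is reached on the fifth consecutive throttling event.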

Enable network monitoring

Network monitoring of all hosts in your environment is enabled by default. You can, however, disable and re-enable monitoring of individual hosts: go to Settings > Monitoring overview and, on the Hosts tab, click the Monitoring: Off/On switch.

Analyzing network health

By default, your homepage includes the Network status tile that shows you three key overall network health metrics: Processes, Hosts and Volume.

Network status tile

Click the Network status tile to go to your Network page.

Network overview page

The Environment detail section of the Network page consists of three tabs: Hosts, Interfaces, and Processes.

Click a host listed on the Hosts tab to view a quick chart for Traffic in and Traffic out for that host.

hosts tab of environment detail

Use the Show chart for selection to switch to Retransmissions and Connectivity charts for that host.

selecting different chart

Click Analyze process connections to get more information about process connections and related metrics.

Analyze process connections

You can view all connections made and received by the host. Connections are displayed in a way that’s similar to Smartscape topology view. The middle column represents the analyzed host. The left-hand side represents the hosts and processes that connect to the analyzed host. The right-hand side shows the outbound communications of the analyzed host.

Click any process node (the middle column) to view relevant network metrics for that process' connections (displayed in the right-hand pane). For each connection, you’ll see network Transfer, Connectivity, and Retransmissions rates.

Host details

You can also click one of the four health statistics (CPU, Memory, Disk, or NIC) to view details of the metrics that contribute to each measurement.

host with consuming processes

Click the Consuming processes button to go to the selected host's Processes page to view the list of processes running on the selected host. With network monitoring enabled, you'll see three new columns: Traffic, Retransmissions and Connectivity.

Select an individual process to highlight that process' contribution to the overall value of the metric displayed in the chart above.

Click the process name to drill down to the process.

Note that Dynatrace monitors only selected processes, so on some hosts the metric breakdowns are not expected to add up to 100%.
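As a rough illustration of why the breakdown can fall short of 100%, the sketch below computes each monitored process's share of a host-level metric and the remainder that belongs to unmonitored processes. The function name, process names, and traffic figures are all hypothetical.

```python
def traffic_breakdown(host_total, per_process):
    """Return each monitored process's share of host traffic (in percent)
    plus the share not attributed to any monitored process."""
    shares = {name: 100.0 * value / host_total
              for name, value in per_process.items()}
    unattributed = 100.0 - sum(shares.values())
    return shares, unattributed


# Hypothetical host carrying 200 Mbit/s, with only two monitored processes.
shares, unattributed = traffic_breakdown(200, {"nginx": 100, "java": 50})
print(shares)        # {'nginx': 50.0, 'java': 25.0}
print(unattributed)  # 25.0
```

In this made-up example, a quarter of the host's traffic comes from processes that are not monitored, so the per-process values sum to 75%.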

Process metrics in Dynatrace

Click on the Connectivity panel in the process overview to view the Traffic, Connectivity and Quality details for this process.

Connectivity

Connectivity is presented as two separately calculated metrics:

  • Connectivity: Percentage of properly established TCP connections, compared to TCP connections that were refused or timed out.

  • Local connectivity: Percentage of processes that establish TCP connections to other processes on the same host. Such connections don't generate network traffic. To keep the Connectivity value accurate, Local connectivity is treated as a separate connectivity metric.

Overloaded or poorly configured processes can have trouble accepting new network connections. This results in timeouts or resets of TCP handshakes. Such issues are tracked as TCP connection refused and TCP connection timeout errors.

Dynatrace compares the number of such errors with the total number of connection attempts to calculate the Connectivity metrics, that is, the percentage of connections that were successfully established. Ideally, Connectivity metrics never drop below 100%. Anything less suggests failed user actions that will be obvious to your customers.
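The calculation described here amounts to a simple ratio. The following sketch shows one way to express it. The function and parameter names are hypothetical, and treating the total number of attempts as the sum of established, refused, and timed-out connections is an assumption based on the description above.

```python
def connectivity_percent(established, refused, timed_out):
    """Percentage of TCP connection attempts that were successfully
    established. Returns None when there were no attempts at all."""
    attempts = established + refused + timed_out
    if attempts == 0:
        return None  # nothing to measure
    return 100.0 * established / attempts


# Hypothetical counts: 980 successful handshakes, 15 refused, 5 timed out.
print(connectivity_percent(980, 15, 5))  # 98.0
```

With these example counts, 2% of connection attempts failed, which by the guidance above would already merit investigation.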

Quality

When a network link or segment is overloaded or underperforming, it drops data packets. This is because overloaded network equipment queues are purged during periods of excessive traffic or limited hardware resources. In response, TCP protocol mechanisms attempt to fix the situation by retransmitting the dropped packets. Such retransmissions are detected by Dynatrace and displayed on all relevant Host and Process pages and Quality tabs. Below is an example of an unnaturally high Retransmissions rate for a process.

Ideally, retransmission rates should not exceed 0.5% on local area networks and 2% in Internet- or cloud-based networks. Retransmission rates above 3% negatively affect user experience in most modern applications. Retransmission issues are especially noticeable to customers using mobile devices in areas with poor network coverage.
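To make these thresholds concrete, here is a small sketch that classifies a retransmission rate against them. The function names and the labels "ok", "above target", and "degraded" are hypothetical. Computing the rate as retransmitted packets divided by total packets sent is also an assumption, because the exact formula is not given here.

```python
LAN_TARGET = 0.5      # percent, local area networks
WAN_TARGET = 2.0      # percent, Internet- or cloud-based networks
USER_IMPACT = 3.0     # percent, above this user experience suffers


def retransmission_rate(retransmitted, total_sent):
    """Retransmitted packets as a percentage of all packets sent
    (assumed formula)."""
    return 100.0 * retransmitted / total_sent


def assess(rate_percent, network="lan"):
    """Classify a retransmission rate for a 'lan' or 'wan' network."""
    target = LAN_TARGET if network == "lan" else WAN_TARGET
    if rate_percent > USER_IMPACT:
        return "degraded"
    if rate_percent > target:
        return "above target"
    return "ok"


rate = retransmission_rate(25, 1000)  # hypothetical packet counts
print(rate)                 # 2.5
print(assess(rate, "lan"))  # above target
print(assess(rate, "wan"))  # above target
print(assess(3.5, "wan"))   # degraded
```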