Network monitoring with Dynatrace
Dynatrace infrastructure monitoring offers more than visibility into hosts and processes. With network communication monitoring, Dynatrace also gives you insight into the quality of the communication between your hosts and the processes that run on them. It isn't enough to know that a process has sufficient server resources and responds in a timely manner—you also need assurance that your processes are clearly communicating their responses to calling parties and have uninterrupted access to all required resources. You also need to know which processes are consuming your network resources. Such network communication insight can be gained by monitoring the data packets that are exchanged between processes and the hosts they run on.
Enabling network monitoring
Network monitoring is enabled by default for all hosts in your environment. You can, however, disable and re-enable it for individual hosts by going to Settings > Monitoring overview > Host settings and toggling the Network quality metrics switch.
Analyzing network health
By default, your homepage includes a tile that shows you three key overall network health metrics: Traffic, Retransmissions and Connectivity.
Click the Network tile to go to your Hosts page.
Click a host listed on the Hosts page to view performance details of that host.
Traffic: The average rate at which data was transmitted during the interval.
Packets: The number of packets received and sent over the host network interface during the interval.
Quality: The assessment of the number of dropped packets and errors.
Connectivity: The percentage of properly established TCP connections compared to TCP connections that were refused or timed out.
Note: The Connectivity measure can be used as an indicator of whether there's network traffic on a host. However, 0% connectivity doesn't necessarily indicate a problem with a host. Assuming no TCP errors are present, it may simply mean that no users attempted to connect to the host process during the selected time frame.
Click the Consuming processes button to go to the selected host's Processes page to view the list of processes running on the selected host. With network monitoring enabled you'll see three new columns: Traffic, Retransmissions and Connectivity.
Select an individual process to highlight that process' contribution to the overall value of the metric displayed in the chart above.
Note that Dynatrace monitors only selected processes, so on some hosts the metric breakdowns are not expected to add up to 100%.
What are data retransmissions?
When a network link or segment is overloaded or underperforming, it drops data packets. This happens because the queues of overloaded network equipment are purged during periods of excessive traffic or limited hardware resources. In response, TCP protocol mechanisms attempt to fix the situation by retransmitting the dropped packets. Dynatrace detects such retransmissions and displays them on all relevant Host and Process pages and Quality tabs.
Ideally, retransmission rates should not exceed 0.5% on local area networks and 2% on Internet or cloud-based networks. Retransmission rates above 3% will negatively affect user experience in most modern applications. Retransmission issues are especially noticeable to customers using mobile devices in areas with poor network coverage.
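As a rough illustration of the guideline values above, the following Python sketch classifies a retransmission rate. The function names, the network-type labels, and the definition of the rate as retransmitted packets divided by sent packets are assumptions made for this example; they are not part of any Dynatrace API.

```python
# Guideline limits from the text, in percent.
IDEAL_LIMIT = {"lan": 0.5, "internet": 2.0}  # "internet" also covers cloud-based networks
USER_IMPACT_LIMIT = 3.0


def retransmission_rate(retransmitted_packets: int, sent_packets: int) -> float:
    """Return retransmissions as a percentage of sent packets (assumed definition)."""
    if sent_packets <= 0:
        return 0.0  # nothing was sent, so there is nothing to retransmit
    return 100.0 * retransmitted_packets / sent_packets


def classify_retransmissions(rate_percent: float, network: str = "lan") -> str:
    """Map a retransmission rate to a coarse, hypothetical health label."""
    if rate_percent > USER_IMPACT_LIMIT:
        return "user-impacting"
    if rate_percent > IDEAL_LIMIT[network]:
        return "elevated"
    return "ok"
```

For example, 5 retransmitted packets out of 1,000 sent gives a rate of 0.5%, which is still within the ideal range for a local area network.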
TCP connection time-out errors
Overloaded or poorly configured processes can have trouble accepting new network connections. This results in timeouts or resets of TCP handshakes. Such issues are tracked as TCP connection refused and TCP connection timeout errors.
Dynatrace also compares the number of such errors with the total number of connection attempts to calculate the Connectivity metric: the percentage of connections that were successfully established. Ideally, Connectivity stays at 100%. Anything less suggests failed user actions that will be obvious to your customers.
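The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not Dynatrace's implementation: the function name and parameters are hypothetical, and reporting 0% when there were no connection attempts is an assumption based on the earlier note about 0% connectivity.

```python
def connectivity_percent(established: int, refused: int, timed_out: int) -> float:
    """Percentage of TCP connection attempts that were successfully established."""
    attempts = established + refused + timed_out
    if attempts == 0:
        # Assumption: with no connection attempts in the time frame, the value
        # is reported as 0% even though nothing has actually failed.
        return 0.0
    return 100.0 * established / attempts
```

With 98 established connections, 1 refused connection, and 1 timeout, the function returns 98.0, meaning that 2% of connection attempts failed.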
Network monitoring overhead
Overhead generated by network monitoring is negligible and varies based on the analyzed traffic volume. Dynatrace monitors the overhead generated by network monitoring. If overhead increases above 5% of available CPU, Dynatrace automatically disables network monitoring until traffic decreases.
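The safeguard described above amounts to a simple threshold check. The Python sketch below illustrates the idea only; the names are hypothetical, and the assumption that monitoring resumes as soon as the measured overhead is back at or below the limit is ours, since the text does not describe how re-enabling is decided.

```python
CPU_OVERHEAD_LIMIT_PERCENT = 5.0  # limit stated in the text


def monitoring_states(overhead_samples: list[float]) -> list[bool]:
    """For each CPU overhead sample (in percent), report whether network
    monitoring would be active under the assumed threshold rule."""
    return [sample <= CPU_OVERHEAD_LIMIT_PERCENT for sample in overhead_samples]
```

For the samples 1.0, 6.5, and 2.0, monitoring would be active, then disabled, then active again.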