FDI analytics of network performance problems

In my previous post, I discussed the basics of network communication monitoring and the Dynatrace value proposition around it. I mentioned that Dynatrace can tell you which processes are experiencing network degradation problems.

If a network link or segment is overloaded and under-performing, it starts dropping packets. This happens because queues on network equipment get purged under excessive traffic or when hardware resources run short. TCP then compensates by retransmitting the missing packets. Dynatrace detects such situations and shows them as “Retransmissions” at the host and process levels. This metric is available on host and process dashboards and appears on the “Quality” tab charts.
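To make the metric concrete, here is a minimal sketch of how a retransmission rate can be approximated on a Linux host from the kernel's TCP counters. It assumes text in the format of /proc/net/snmp, where a "Tcp:" header line is followed by a "Tcp:" value line that includes OutSegs and RetransSegs. The function names are hypothetical, and this is not necessarily how Dynatrace computes its own metric.

```python
from typing import Dict


def parse_tcp_counters(snmp_text: str) -> Dict[str, int]:
    """Parse the 'Tcp:' header/value line pair from /proc/net/snmp-style text."""
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("expected a 'Tcp:' header line and a 'Tcp:' value line")
    names, values = tcp_lines[0][1:], tcp_lines[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def retransmission_rate(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Retransmitted segments as a percentage of segments sent between two samples.

    Assumption: RetransSegs / OutSegs deltas are a good enough approximation
    of the retransmission rate over the sampling interval.
    """
    sent = after["OutSegs"] - before["OutSegs"]
    retransmitted = after["RetransSegs"] - before["RetransSegs"]
    if sent <= 0:
        return 0.0  # no traffic in the interval, so nothing to report
    return 100.0 * retransmitted / sent
```

In practice you would read /proc/net/snmp twice, a few seconds apart, and pass both parsed samples to retransmission_rate. Note that these counters are host-wide; a per-process breakdown needs a different data source.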

Typical retransmission rates should not exceed 0.5% in local area networks and 2% in Internet or cloud-based networks. Anything above 3% seriously affects the user experience of most of today's applications. This is especially visible when accessing a web page from a mobile device in an area with poor network coverage.
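These rules of thumb can be written down as a small check. The sketch below only encodes the thresholds quoted above; the function name, the network labels ("lan", "internet") and the result labels ("typical", "elevated", "severe") are made up for illustration and are not Dynatrace terminology.

```python
# Upper bounds for a "typical" retransmission rate, in percent, per network type.
TYPICAL_LIMIT_PERCENT = {"lan": 0.5, "internet": 2.0}

# Above this rate, user experience is seriously affected for most applications.
SEVERE_LIMIT_PERCENT = 3.0


def classify_retransmission_rate(rate_percent: float, network: str = "lan") -> str:
    """Classify a retransmission rate (in percent) against the rule-of-thumb limits."""
    if network not in TYPICAL_LIMIT_PERCENT:
        raise ValueError("unknown network type: %r" % network)
    if rate_percent > SEVERE_LIMIT_PERCENT:
        return "severe"
    if rate_percent > TYPICAL_LIMIT_PERCENT[network]:
        return "elevated"
    return "typical"
```

For example, a 1% rate counts as elevated on a local area network but is still within the typical range for an Internet or cloud-based connection.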

A high retransmission rate will be visible not only at the infrastructure level. For most applications, it also affects service response time and user experience in terms of load times. Depending on their web page design, applications can differ in how sensitive they are to poor network connection quality. Regardless, Dynatrace always detects problems on all three layers (infrastructure, service, and end user) and can tell you the impact of the problem.

Here’s an example where (artificially generated) packet loss causes a high TCP retransmission ratio on an Apache web server. That in turn increases service response time, as the server stack needs more time to retransmit the missing packets. This also has an impact on application and end-user experience, as users now have to wait longer for their web pages to finish loading.

We connect these dots together: when Dynatrace reports an application-level problem caused by poor end-user experience, it directly relates that problem to poor network quality, represented by the high retransmission rate. This is visible in the root cause area of the problem.