Understanding Application Performance on the Network – Part III: TCP Slow-Start

In Part II, we discussed performance constraints caused by both bandwidth and congestion. We purposely omitted a discussion of packet loss, which is often an inevitable result of heavy network congestion. I’ll use this blog entry on TCP slow-start to introduce the Congestion Window (CWD), a concept that is fundamental to Part IV’s in-depth review of packet loss.

TCP Slow-Start

TCP uses a slow-start algorithm as it learns the characteristics (bandwidth, latency, congestion) of the path supporting a new TCP connection. In most cases, TCP has no inherent knowledge of the network path. It could be a switched connection on a high-speed LAN to a server in the next room, or a low-bandwidth, already congested connection to a server halfway around the globe. To be a good network citizen, TCP bases slow-start on an internally maintained congestion window (CWD), which limits how many packets may be in transit without being acknowledged. Each acknowledgement received grows the window by one packet, so when every packet is acknowledged the window roughly doubles with each round trip. The CWD typically begins at two packets, allowing an initial transmission of two packets and then ramping up quickly as acknowledgements are received.
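To make the ramp-up concrete, here is a minimal Python sketch of an idealized slow-start, assuming no packet loss, one ACK per packet, and a starting window of two packets. The function name `slow_start_windows` and its `limit` parameter are hypothetical, chosen for illustration; the sketch simply lists the window (in packets) at the start of each TCP turn until it reaches a cap.

```python
def slow_start_windows(initial_cwd: int, limit: int) -> list[int]:
    """Return the CWD (in packets) at the start of each round trip.

    Idealized model: every packet sent in a round is acknowledged, and each
    ACK grows the window by one packet, so the window doubles per round trip
    until it reaches `limit`.
    """
    if initial_cwd < 1 or limit < 1:
        raise ValueError("window sizes must be positive")
    windows = []
    cwd = min(initial_cwd, limit)
    while True:
        windows.append(cwd)
        if cwd >= limit:
            break
        acks = cwd                    # one ACK per packet sent this round
        cwd = min(cwd + acks, limit)  # +1 packet per ACK, capped at the limit
    return windows


print(slow_start_windows(2, 98))
```

With a cap of 98 packets, matching the Time Plot example later in this entry, the window passes through 2, 4, 8, 16, 32 and 64 packets before reaching 98 at the start of the seventh turn. Real stacks deviate from this pattern (delayed ACKs, loss), which the sketch leaves out.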

At the beginning of a new TCP connection, the CWD starts at 2 packets and increases as acknowledgements are received.

The CWD will continue to increase until one of three conditions is met:

Condition                               | Determined by                | Blog discussion
Receiver’s TCP Window limit             | Receiver’s TCP Window size   | Part VII
Congestion detected (via packet loss)   | Triple Duplicate ACK         | Part IV
Maximum write block size                | Application configuration    | Part VIII
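The three stopping conditions in the table can be sketched as a single loop. This is a simplified model, not how any particular TCP stack is written: it assumes the window doubles each turn, converts the receiver window and the application’s write block (both in bytes) into packets using the MSS, and represents congestion with a hypothetical `loss_at_packets` threshold standing in for the point at which a triple duplicate ACK would be seen.

```python
import math
from typing import Optional, Tuple


def slow_start_stop(mss: int, rwnd_bytes: int, write_block_bytes: int,
                    loss_at_packets: Optional[int] = None,
                    initial_cwd: int = 2) -> Tuple[str, int]:
    """Return (stopping condition, window in packets) for an idealized slow-start.

    loss_at_packets is a hypothetical stand-in for the window size at which
    congestion causes loss (detected in practice via a triple duplicate ACK).
    """
    if initial_cwd < 1:
        raise ValueError("initial_cwd must be at least one packet")
    caps = {
        "receiver's TCP window limit": math.ceil(rwnd_bytes / mss),
        "maximum write block size": math.ceil(write_block_bytes / mss),
    }
    reason, cap = min(caps.items(), key=lambda item: item[1])  # tighter cap wins
    cwd = initial_cwd
    while True:
        window = min(cwd, cap)  # packets actually allowed in flight this turn
        if loss_at_packets is not None and window >= loss_at_packets:
            return "congestion detected (packet loss)", window
        if window == cap:
            return reason, cap
        cwd *= 2  # idealized: window doubles each TCP turn


print(slow_start_stop(1460, 65535, 262144))
print(slow_start_stop(1460, 65535, 16384))
print(slow_start_stop(1460, 65535, 262144, loss_at_packets=20))
```

For example, with a 1460-byte MSS and a 65,535-byte receive window, growth stops at 45 packets because of the receiver window; shrinking the write block to 16KB makes it the limiting factor at 12 packets, and a loss threshold of 20 packets ends growth at the 32-packet turn.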

Generally, TCP slow-start will not be a primary or significant bottleneck. Slow-start occurs once per TCP connection, so for many operations there may be no impact. However, we will address the theoretical case of a TCP slow-start bottleneck, some influencing factors, and then present a real-world case.

The Maximum Segment Size and the CWD

The Maximum Segment Size (MSS) is the largest TCP payload a single packet can carry; each side advertises it as a TCP option in the SYN packets that establish a new connection. Probably the most common MSS value is 1460 bytes (a 1500-byte Ethernet MTU minus 20 bytes each for the IP and TCP headers), but smaller sizes may be used to leave room for VPN headers or to support different link protocols. Beyond the additional protocol overhead a reduced MSS introduces, it also affects the CWD, since the algorithm counts packets, not bytes, as its flow control metric.
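The arithmetic behind the common 1460-byte MSS, and its effect on packet counts, is easy to sketch. This minimal example assumes IPv4 and TCP headers without options (20 bytes each); the 1400-byte tunnel MTU in the loop is a hypothetical value for a VPN path, not a measured one, and the helper names are illustrative.

```python
import math

IPV4_HEADER_BYTES = 20  # IPv4 header without options
TCP_HEADER_BYTES = 20   # TCP header without options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one packet for a given link MTU."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def packets_needed(payload_bytes: int, mss: int) -> int:
    """Number of full or partial segments needed to carry payload_bytes."""
    return math.ceil(payload_bytes / mss)


# Ethernet, a hypothetical VPN tunnel, and the 576-byte IPv4 minimum datagram
for mtu in (1500, 1400, 576):
    mss = mss_for_mtu(mtu)
    print(f"MTU {mtu}: MSS {mss}, {packets_needed(65535, mss)} packets for 65,535 bytes")
```

Every one of those extra packets counts against the CWD, so with a 536-byte MSS the sender needs about 2.7 times as many packets (123 versus 45) to fill the same 65,535-byte window.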

We can think of each exchange of data packets and the ACKs that follow as a TCP turn, or TCP round trip; each turn incurs the round-trip path delay. Network latency is therefore one of the primary factors determining the impact of TCP slow-start. A smaller MSS means more packets, and therefore additional TCP turns, before the sender’s CWD reaches its upper limit. With a small MSS (536 bytes) and high path delay (200 msec), slow-start might add about 3 seconds to an operation as the CWD grows toward a receive window limit of 65KB (65,535 bytes, the largest window TCP can advertise without window scaling).
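To check the order of magnitude of that estimate, the sketch below counts the TCP turns needed for the CWD to reach a 65,535-byte receive window, then multiplies by the round-trip time. It is an idealized model under stated assumptions: no loss, an initial CWD of two packets, and one packet of window growth per ACK. The hypothetical `packets_per_ack` parameter approximates delayed acknowledgements (one ACK for every two packets) but ignores the delayed-ACK timer.

```python
import math


def turns_to_reach(target_packets: int, initial_cwd: int = 2,
                   packets_per_ack: int = 1) -> int:
    """Count TCP turns until the CWD reaches target_packets (idealized, no loss)."""
    cwd, turns = initial_cwd, 0
    while cwd < target_packets:
        acks = max(1, cwd // packets_per_ack)  # ACKs received this turn
        cwd += acks                            # each ACK grows the CWD by one packet
        turns += 1
    return turns


def slow_start_delay_ms(window_bytes: int, mss: int, rtt_ms: int,
                        packets_per_ack: int = 1) -> int:
    """Round-trip delay spent growing the CWD to a window_bytes receive window."""
    target = math.ceil(window_bytes / mss)
    return turns_to_reach(target, packets_per_ack=packets_per_ack) * rtt_ms


for mss in (1460, 536):
    for per_ack in (1, 2):
        print(f"MSS {mss}, one ACK per {per_ack} packet(s): "
              f"{slow_start_delay_ms(65535, mss, 200, per_ack)} ms")
```

Under these assumptions, a 536-byte MSS at 200 msec takes 6 turns (1.2 seconds) with an ACK for every packet and 11 turns (2.2 seconds) with delayed ACKs, versus 5 and 9 turns for a 1460-byte MSS. Delayed-ACK timers, server processing time, and the time to serialize each window onto a slower link are not modeled and can push the real figure higher.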

How Important is TCP Slow-Start?

While significant, even a 3-second delay is probably not important for large file transfers, where it is a small fraction of the total transfer time, or for applications that reuse TCP connections. But let’s consider a simple web page with 20 page elements, averaging about 120KB in size. Now suppose a misconfigured proxy server prevents persistent TCP connections, so we’ll need 20 new TCP connections to load the page. Each connection must ramp up through slow-start as content is downloaded. With a small MSS and/or high latency, each page component will experience a significant slow-start delay.
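A rough way to see why persistent connections matter is to count the turns needed to deliver one page element over a brand-new connection versus a reused one whose window is already open. This Python sketch assumes an idealized slow-start (no loss, window doubling each turn, a two-packet initial CWD), assumes a reused connection keeps its full window between requests, and ignores the TCP handshake. The 120,000-byte element size, 536-byte MSS, and 200 msec RTT in the usage lines are illustrative inputs, not measurements.

```python
import math


def rounds_to_deliver(payload_bytes: int, mss: int, initial_cwd: int,
                      window_limit_bytes: int = 65535) -> int:
    """TCP turns needed to send payload_bytes under idealized slow-start."""
    if initial_cwd < 1:
        raise ValueError("initial_cwd must be at least one packet")
    limit = math.ceil(window_limit_bytes / mss)  # receive window, in packets
    remaining = math.ceil(payload_bytes / mss)   # packets still to send
    cwd = min(initial_cwd, limit)
    rounds = 0
    while remaining > 0:
        sent = min(cwd, remaining)
        remaining -= sent
        rounds += 1
        cwd = min(cwd + sent, limit)  # one packet of growth per ACKed packet
    return rounds


open_window = math.ceil(65535 / 536)  # a reused connection's fully open CWD
fresh = rounds_to_deliver(120_000, 536, initial_cwd=2)
reused = rounds_to_deliver(120_000, 536, initial_cwd=open_window)
print(fresh, reused, (fresh - reused) * 200)  # prints: 7 2 1000
```

Under these assumptions each non-persistent element pays about five extra turns, roughly one second at 200 msec. Browsers usually fetch elements over several parallel connections, so the page-level delay is less than 20 times that figure, but it grows with every element that needs a fresh connection.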

Transaction Trace Manifestation

The Time Plot can help quantify delays associated with TCP slow-start by graphing TCP Frames in Transit. (Since the CWD uses frames as its metric, this is essentially a CWD graph.)

Time Plot view showing TCP frames in transit, illustrating slow-start as the Congestion Window increases from 2 to 98.

You may also visualize TCP slow-start’s TCP turns using the Bounce Diagram.

The Bounce Diagram can be used to illustrate TCP turns during slow-start.

Corrective Actions

If you find that TCP slow-start has a significant impact on performance, there are a few ways to mitigate it. These include using persistent TCP connections (avoiding frequent slow-starts) and ensuring the largest possible MSS is used (reducing the number of TCP turns as the congestion window grows). Some appliances, such as load balancers and application delivery controllers, also allow the initial CWD to be set to a larger value, eliminating some TCP turns; this can provide a noticeable benefit on high-latency links, provided adequate bandwidth is available.
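To compare these corrective actions, the sketch below counts the TCP turns needed to deliver a single response for each combination of MSS and initial CWD. It relies on idealized assumptions (no loss, window doubling each turn, a 65,535-byte receive window); the 64,000-byte response size and 200 msec RTT are hypothetical, and an initial CWD of 10 packets is only an example of a larger setting, not a recommendation.

```python
import math


def rounds_to_deliver(payload_bytes: int, mss: int, initial_cwd: int,
                      window_limit_bytes: int = 65535) -> int:
    """TCP turns needed to send payload_bytes under idealized slow-start."""
    if initial_cwd < 1:
        raise ValueError("initial_cwd must be at least one packet")
    limit = math.ceil(window_limit_bytes / mss)  # receive window, in packets
    remaining = math.ceil(payload_bytes / mss)   # packets still to send
    cwd = min(initial_cwd, limit)
    rounds = 0
    while remaining > 0:
        sent = min(cwd, remaining)
        remaining -= sent
        rounds += 1
        cwd = min(cwd + sent, limit)  # one packet of growth per ACKed packet
    return rounds


RTT_MS = 200  # hypothetical high-latency path
for mss in (536, 1460):
    for initial_cwd in (2, 10):
        turns = rounds_to_deliver(64_000, mss, initial_cwd)
        print(f"MSS {mss:4d}, initial CWD {initial_cwd:2d}: "
              f"{turns} turns, ~{turns * RTT_MS} ms")
```

Under these assumptions the turn count drops from 6 (536-byte MSS, initial CWD of 2) to 3 (1460-byte MSS, initial CWD of 10), roughly 1.2 versus 0.6 seconds at 200 msec. Whether a given appliance exposes such a setting, and which values it accepts, depends on the product.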

Do your browser-based applications reuse TCP connections efficiently? Have you considered mitigating the impact of TCP slow-start by reconfiguring the initial CWD?

In Part IV, we’ll discuss the performance impact of packet loss, continuing with Congestion Window concepts and completing the bandwidth and congestion discussion we started in Part II. Stay tuned and feel free to comment below.

Gary is a Subject Matter Expert in Network Performance Analytics at Dynatrace, responsible for DC RUM’s technical marketing programs. He is a co-inventor of multiple performance analysis features, and continues to champion the value of network performance analytics. He is the author of Network Application Performance Analysis (WalrusInk, 2014).