In Part VI, we dove into the Nagle algorithm – perhaps (or hopefully) something you’ll never see. In Part VII, we get back to “pure” network and TCP roots as we examine how the TCP receive window interacts with WAN links.

TCP Window Size

Each node participating in a TCP connection advertises its available buffer space using the TCP window size field. This value identifies the maximum amount of data a sender can transmit without receiving a window update via a TCP acknowledgement; in other words, this is the maximum number of “bytes in flight” – bytes that have been sent, are traversing the network, but remain unacknowledged. Once the sender has reached this limit and exhausted the receive window, the sender must stop and wait for a window update.

The sender transmits a full window, then waits for window updates before continuing. As these window updates arrive, the sender advances the window and may transmit more data.
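
As a quick illustration of the arithmetic, here is a minimal Python sketch of the sender-side check; the function and parameter names are mine, not those of any particular TCP stack. The usable window is simply the advertised window minus the bytes already in flight.

  def usable_window(advertised_window, last_byte_sent, last_byte_acked):
      # Bytes the sender may still transmit before it must stop and wait;
      # bytes in flight = data sent but not yet acknowledged.
      bytes_in_flight = last_byte_sent - last_byte_acked
      return max(advertised_window - bytes_in_flight, 0)

  # A 65,535-byte window with 60,000 bytes already in flight leaves room
  # for only 5,535 more bytes before the sender must wait for an update.
  print(usable_window(65535, last_byte_sent=160000, last_byte_acked=100000))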

Long Fat Networks

High-speed, high-latency networks, sometimes referred to as Long Fat Networks (LFNs), can carry a lot of data. On these networks, small receive window sizes can limit throughput to a fraction of the available bandwidth. These two factors – bandwidth and latency – combine to determine the potential impact of a given TCP window size. LFNs make it possible – common, even – for a sender to transmit an entire TCP window’s worth of data very quickly (high bandwidth) and then have to wait while the packets traverse the path to the distant remote site (high latency) before acknowledgements can be returned, informing the sender of successful data delivery and available receive buffer space.

The math (and physics) concepts are straightforward. As the network speed increases, data can be clocked out onto the network medium more quickly; the bits are literally closer together. As latency increases, these bits take longer to traverse the network from sender to receiver. As a result, more bits can fit on the wire. As LFNs become more common, exhausting a receiver’s TCP window becomes increasingly problematic for some types of applications.

Bandwidth Delay Product

The Bandwidth Delay Product (BDP) is a simple formula used to calculate the maximum amount of data that can exist on the network (referred to as bits or bytes in flight) based on a link’s characteristics:

  • Bandwidth (bps) x RTT (seconds) = bits in flight
  • Divide the result by 8 for bytes in flight

If the BDP (in bytes) for a given network link exceeds the value of a session’s TCP window, then the TCP session will not be able to use all of the available bandwidth; instead, throughput will be limited by the receive window (assuming no other constraints, of course).

The BDP can also be used to calculate the maximum throughput (“bandwidth”) of a TCP connection given a fixed receive window size:

  • Bandwidth (bps) = (window size in bytes x 8) / RTT (seconds)
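
Both formulas translate directly into a few lines of Python; this is just a sketch, with function names of my own choosing:

  def bdp_bytes(bandwidth_bps, rtt_seconds):
      # Bandwidth Delay Product: the most data the link itself can hold.
      return bandwidth_bps * rtt_seconds / 8

  def max_throughput_bps(window_bytes, rtt_seconds):
      # Best-case throughput for a single connection with a fixed window.
      return window_bytes * 8 / rtt_seconds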

In the not-too-distant past, the TCP window had a maximum value of 65,535 bytes. Today’s TCP implementations generally include a TCP window scaling option that allows negotiated window sizes to reach 1GB, but many factors limit its practical utility; for example, firewalls, load balancers and server configurations may purposely disable the feature. So the reality is that we often still need to pay attention to the TCP window size when considering the performance of applications that transfer large amounts of data, particularly on enterprise LFNs.

As an example, consider a company with offices in New York and San Francisco; they need to replicate a large database each night, and have secured a 20Mbps network connection with 85 milliseconds of round-trip delay. Our BDP calculation tells us that the BDP is 212,500 bytes (20,000,000 x 0.085 / 8); in other words, a single TCP connection would require a 212KB window in order to take advantage of all of the bandwidth. The BDP calculation also tells us that the configured TCP window size of 65,535 bytes will permit approximately 6Mbps of throughput (65,535 x 8 / 0.085), less than one third of the link’s capacity.
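
A quick, standalone check of that arithmetic:

  bandwidth_bps, rtt_seconds, window_bytes = 20_000_000, 0.085, 65535
  print(bandwidth_bps * rtt_seconds / 8)        # 212,500 bytes in flight (the link's BDP)
  print(window_bytes * 8 / rtt_seconds / 1e6)   # ~6.2 Mbps -- all a 64KB window allows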

A link’s BDP and a receiver’s TCP window size are two factors that help us to identify the potential throughput of an operation. The remaining factor is the operation itself, specifically the size of individual request or reply flows. Only flows that exceed the receiver’s TCP window size will benefit from, or be impacted by, these TCP window size constraints. Two common scenarios help illustrate this. Let’s say a user needs to transfer a 1GB file:

  • Using FTP (in stream mode) will cause the entire file to be sent in a single flow; this operation could be severely limited by the receive window.
  • Using SMB (at least older versions of the protocol) will cause the file to be sent in many smaller write commands, as SMB used to limit write messages to under 64KB; this operation would not be able to take advantage of a TCP receive window greater than 64KB. (Instead, the operation would more likely be limited by application turns and link latency; we discuss chattiness in Part VIII.)

Transaction Trace Illustration

To evaluate a trace for this window size constraint, use the Time Plot view. For Series 1, graph the sender’s payload in transit (i.e., bytes in flight); for Series 2, graph the receiver’s advertised TCP window, using a single y-axis scale for reference. If the payload in transit reaches (or closely approaches) the receive window size, then it is likely that an increase in the window size will allow for improved throughput.

This Time Plot view shows the sender’s TCP Payload in Transit (blue) reaching the receiver’s advertised TCP window (brown); the window size is limiting throughput.
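
If your analyzer does not provide this view directly, the same two series can be derived from a trace export. The Python sketch below assumes a hypothetical CSV export with one row per packet and columns for time, direction, sequence number, payload length, acknowledgement number and (scaled) window – adjust the column names to whatever your tool produces. It also ignores retransmissions, so treat it as an approximation:

  import csv

  def payload_vs_window(rows):
      # Yield (time, bytes_in_flight, advertised_window) samples.
      # Bytes in flight ~= highest sequence byte sent minus highest ack received.
      highest_seq_sent = 0
      highest_ack_seen = 0
      window = 0
      for row in rows:
          if row["direction"] == "data":          # sender -> receiver
              end = int(row["seq"]) + int(row["payload_len"])
              highest_seq_sent = max(highest_seq_sent, end)
          else:                                   # receiver -> sender
              highest_ack_seen = max(highest_ack_seen, int(row["ack"]))
              window = int(row["window"])
          yield float(row["time"]), highest_seq_sent - highest_ack_seen, window

  with open("trace.csv", newline="") as f:
      for t, in_flight, win in payload_vs_window(csv.DictReader(f)):
          print(f"{t:.3f}s  in-flight={in_flight}  window={win}")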

The Bounce Diagram can also be used to illustrate the impact of a TCP window constraint, emphasizing the impact of latency on data delivery and subsequent TCP acknowledgements.

Illustration of a TCP window constraint; each cluster of blue frames represents a complete window’s worth of payload, and the sender must then wait for window updates.

Note that the TCP window scaling option is negotiated during the TCP three-way handshake as the connection is set up; without the SYN and SYN/ACK handshake packets in the trace file, there is no way of knowing whether window scaling is active – or, more accurately, what the scaling factor might be. (Hint: if you observe window sizes in a trace file that appear abnormally small – such as 500 bytes – then it is likely that window scaling is active; you may not know the actual window size, but it is probably greater than 64KB.)
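
The scaling arithmetic itself is simple: the 16-bit value in the TCP header is shifted left by the scale factor exchanged in the SYN and SYN/ACK packets (RFC 7323). For illustration:

  def effective_window(raw_window, scale_shift):
      # Actual receive window when window scaling is in use.
      return raw_window << scale_shift

  print(effective_window(65535, 14))  # 1,073,725,440 bytes -- the ~1GB ceiling
  print(effective_window(500, 9))     # an "abnormally small" 500 could really be 256,000 bytes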

Corrective Actions

For a TCP window constraint on an LFN, assuming adequate available bandwidth, the primary solutions are to increase the receiver’s TCP window or to enable TCP window scaling. Reducing latency – which in turn reduces the BDP – will also allow greater throughput for a given TCP window; relocating a server or optimizing path selection are examples of how this reduction in latency might be accomplished.
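
On the application side, one common lever is the socket receive buffer, which typically drives the advertised window. The sketch below assumes a Linux-style stack where SO_RCVBUF, requested before listen(), influences both the offered window and the scale factor negotiated in the handshake; kernel limits (e.g., net.core.rmem_max) may still cap the result, and the port number is just an example:

  import socket

  listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  # Ask for a ~256KB receive buffer *before* listen(), so an appropriate
  # window scale can be negotiated during the three-way handshake.
  listener.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 256 * 1024)
  listener.bind(("0.0.0.0", 5001))
  listener.listen(1)
  print("effective buffer:", listener.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))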

Is TCP window scaling enabled for your key applications – especially those that serve users over LFNs? Are your file transfers and replications performing in harmony with the network they traverse?

In Part VIII, the final entry in this series, we’ll talk about application chattiness – the more common app turns kind, but also a behavior I call application windowing. Stay tuned and feel free to comment below.