Analysing the performance of active TCP connections in ultra-fast networks
With the Internet expected to support an estimated 30 billion devices by 2020 (on the so-called Internet-of-Things), the need for ultra-fast speeds within the network has never been greater. Little's Theorem from queuing theory explains this need. For a given rate at which new connections arrive, the average number of connections "queuing" or "contending" for the available network capacity at any given time is proportional to the latency of an application (i.e., the total time required to transfer an entire data object, as distinct from the latency of a single data packet). Keeping the number of contending connections low therefore requires keeping this latency to a minimum.
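Little's law (L = λT) makes this concrete. The long-run average number of connections in progress, L, equals the connection arrival rate λ multiplied by the average transfer time T. The minimal Python sketch below uses a hypothetical arrival rate of 1,000 connections per second. The function name `mean_active_connections` is illustrative only.

```python
def mean_active_connections(arrivals_per_s: float, mean_transfer_s: float) -> float:
    """Little's law, L = lambda * T.

    arrivals_per_s  -- average rate at which new connections start (lambda)
    mean_transfer_s -- average time to transfer a whole data object (T)
    Returns the long-run average number of connections in progress (L).
    """
    if arrivals_per_s < 0 or mean_transfer_s < 0:
        raise ValueError("arrival rate and transfer time must be non-negative")
    return arrivals_per_s * mean_transfer_s


# Hypothetical link carrying 1,000 new connections per second.
for transfer_s in (2.0, 0.5):
    active = mean_active_connections(1000.0, transfer_s)
    print(f"T = {transfer_s} s -> {active:.0f} connections contending")
```

For the same arrival rate, cutting the transfer time by a factor of four cuts the number of concurrently contending connections by the same factor.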
This law explains why TCP's congestion control is particularly well-suited to the task of communicating on the Internet-of-Things. When a new TCP connection first starts up, it has no idea how much network capacity is available for sending its data object. The available capacity is determined by the lowest-capacity ("bottleneck") link on the path from the sender to the receiver. Both the physical transmission speed of the links and the number of active connections sharing them impose this limit. TCP's congestion control learns the "right speed" empirically and dynamically, by sending data at faster and faster rates until it experiences a packet drop (or another congestion indicator).
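The interplay between link speed and the number of connections sharing each link can be illustrated with a deliberately crude model. It assumes that each link splits its capacity equally among its active connections and that a connection is limited by the smallest such share on its path. True max-min fairness would also redistribute capacity left unused by flows that are bottlenecked elsewhere. The `Link` class, the function name and the path below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Link:
    capacity_bps: float      # physical transmission speed of the link
    active_connections: int  # connections currently sharing the link


def per_flow_share_bps(path: list[Link]) -> float:
    """Equal-share estimate of the rate available to one connection.

    Each link is assumed to divide its capacity equally among its active
    connections; the connection is limited by the smallest share on its
    path, i.e. by its bottleneck link.
    """
    if not path:
        raise ValueError("path must contain at least one link")
    return min(link.capacity_bps / max(link.active_connections, 1) for link in path)


# Hypothetical three-hop path: the 10 Gbps link shared by 100 flows offers a
# smaller per-flow share than the 1 Gbps access link shared by only 4 flows.
path = [Link(1e9, 4), Link(10e9, 100), Link(40e9, 50)]
print(f"{per_flow_share_bps(path) / 1e6:.0f} Mbps per flow")
```

The bottleneck here is not the slowest link but the link with the smallest per-connection share. This is why the number of active connections matters as much as raw link speed.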
SureLink-XG congestion window trajectory for a sample of 10 TCP CUBIC flows (SACK option enabled) on a 1 Gbps link. Results are based on the TCP CUBIC (RFC 8312) implementation in the Linux kernel's net/ipv4 C source code (see also RFC 5681, RFC 6582 and RFC 6675).
However, TCP does not start by sending data at the maximum rate, because this would have a detrimental effect on other TCP connections sharing the network. Instead, TCP follows a principle called the conservation of packets: new data packets are sent into the network only when TCP receives confirmation (TCP acknowledgements) that previously sent packets have left the network. TCP also generally uses one algorithm when starting up (slow start) and another once it "knows" how much capacity is available and is in steady state (congestion avoidance). Even in this latter stage, TCP continues to probe the network to determine whether the opportunity to send data at faster rates has arisen, for example when competing connections have completed their data object transfers.
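The switch between slow start and congestion avoidance can be sketched with the standard (Reno-style) per-ACK rules of RFC 5681. Slow start adds one segment per ACK, so the window roughly doubles every RTT. Congestion avoidance adds about one segment per RTT. The sketch below simplifies heavily. It starts from a hypothetical initial window of 1 segment and halves the window on a scheduled loss. It ignores fast-recovery window inflation and timeouts. CUBIC replaces the congestion-avoidance rule with its own window function. All function names are hypothetical.

```python
def on_ack(cwnd: float, ssthresh: float) -> float:
    """Per-ACK window growth in the style of RFC 5681 (units: segments)."""
    if cwnd < ssthresh:
        return cwnd + 1.0       # slow start: roughly doubles cwnd every RTT
    return cwnd + 1.0 / cwnd    # congestion avoidance: about +1 segment per RTT


def on_loss(cwnd: float) -> tuple[float, float]:
    """Simplified loss response: returns (new_cwnd, new_ssthresh).

    Halving is the standard (AIMD) TCP rule; fast-recovery window
    inflation and retransmission timeouts are deliberately ignored.
    """
    ssthresh = max(cwnd / 2.0, 2.0)
    return ssthresh, ssthresh


def trace(n_rtts: int, ssthresh: float, loss_rtts: set[int]) -> list[float]:
    """cwnd at the start of each RTT, assuming every segment sent in one RTT
    is acknowledged in the next, unless that RTT is listed in loss_rtts."""
    cwnd, history = 1.0, []
    for rtt in range(n_rtts):
        history.append(cwnd)
        if rtt in loss_rtts:
            cwnd, ssthresh = on_loss(cwnd)
            continue
        for _ in range(int(cwnd)):  # one ACK per full segment in flight
            cwnd = on_ack(cwnd, ssthresh)
    return history


history = trace(n_rtts=8, ssthresh=16.0, loss_rtts={6})
print([round(w, 2) for w in history])
```

The trace shows the exponential phase up to ssthresh, then near-linear growth, then a halving at the hypothetical loss.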
Figure charting the historical development of the RFC Standards for TCP Congestion Control
One of the main tasks of the TCP protocol is therefore to schedule the transmission of data packets according to the prevailing network conditions. TCP congestion control algorithms have so far prevented a congestion collapse of the Internet, mainly by scrupulously accounting for packets that are still in the network. However, their complexity makes analysing performance on the Internet extremely difficult and computationally intensive, particularly in ultra-fast networks.
In fact, TCP congestion control has evolved significantly over time to maximise link utilisation in ultra-fast networks, where the rate at which packets are sent into the network must be increased rapidly whilst maintaining the same level of control over network congestion. This requirement has turned the Internet from a homogeneous congestion control environment into a heterogeneous one. One study indicates that 44.51% of web servers use TCP BIC/CUBIC (the default for Linux-based OSs), 10.27-19% use Compound TCP (the default for some Windows OSs), and only about 16.85-25.58% use standard (AIMD) TCP.
Google QUIC is a relatively new protocol used to carry about 42.1% of Google's data traffic from applications such as Chrome and YouTube. It is a multiplexed stream transport protocol that runs over UDP but uses CUBIC as its default congestion control algorithm. QUIC was first proposed in 2012 and performs better than TCP at larger RTTs, where RTT (round-trip time) is the time between sending a packet and receiving its acknowledgement. It is also distinctive in moving congestion control to the application level (user space), which allows the protocol to evolve far more rapidly than kernel-space TCP variants. Despite its meteoric rise, QUIC is still being standardised at the time of writing and remains a draft proposal.
By contrast, QUIC's default congestion control algorithm, CUBIC (RFC 8312, Rhee et al.), has progressed from its initial proposal (2005), through implementation in Linux (since 2006), to publication by the IETF as RFC 8312. CUBIC makes no changes to the prerequisite TCP algorithms:
- RFC 5681 on TCP congestion control (M. Allman et al.)
- RFC 6582 on the (NewReno) TCP Fast Recovery algorithm (T. Henderson et al.)
- RFC 6675 on a conservative loss recovery algorithm based on Selective Acknowledgements (E. Blanton et al.)
These prerequisite algorithms (SACK is optional for CUBIC) have been developed over the years through extensive and rigorous experimental validation by the scientific community. Any widely deployed algorithm needs this process.
Even though CUBIC greatly simplifies the window adjustment algorithm of its predecessor (BIC-TCP), it is still difficult to analyse the performance of active TCP CUBIC connections in an ultra-fast network. For example, the average congestion window (the number of unacknowledged packets that TCP may have in flight) depends on the number of packets sent between two successive loss events, which in turn depends on the prevailing network conditions. Indeed, one of the great challenges in analysing network performance from measurements is that the congestion window cannot be observed directly in network traffic, simply because it is not carried in the TCP packet headers.
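A deterministic periodic-loss analysis of CUBIC's window function makes the link between the average congestion window and the number of packets sent between losses explicit, along the lines of the analysis in RFC 8312. This is a sketch under stated assumptions: every loss epoch is identical, the RTT is constant, and β denotes the fraction of the window retained after a loss.

```latex
% Deterministic CUBIC: one loss every K seconds, constant RTT.
W(t) = C\,(t-K)^3 + W_{\max}, \qquad
K = \sqrt[3]{\frac{W_{\max}(1-\beta)}{C}}, \qquad
W(0) = \beta\,W_{\max}

% Packets sent between two successive losses = 1/p:
\frac{1}{p} = \frac{1}{\mathrm{RTT}}\int_0^{K} W(t)\,dt
            = \frac{K}{\mathrm{RTT}}\left(W_{\max} - \frac{C K^3}{4}\right)
            = \frac{K\,W_{\max}}{\mathrm{RTT}}\cdot\frac{3+\beta}{4}

% Average window over the epoch; eliminating W_max and K:
\bar{W} = \frac{(3+\beta)\,W_{\max}}{4}
\quad\Longrightarrow\quad
\bar{W} = \left(\frac{C\,(3+\beta)}{4\,(1-\beta)}\right)^{1/4}
          \left(\frac{\mathrm{RTT}}{p}\right)^{3/4}
```

With C = 0.4 and β = 0.8 (a 20% window reduction on loss, as in early CUBIC versions), the coefficient is 1.9^{1/4} ≈ 1.17. Inverting gives p = RTT/(W̄/1.17)^{4/3}. With RFC 8312's β = 0.7, the coefficient is about 1.05.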
Overview of the SureLink-XG mathematical modelling process
However, other (simpler) variables can be determined (or estimated) either directly from packet captures or via widely deployed flow analysis methods. These include the number of active TCP flows, the RTT and the Maximum Segment Size (MSS). This metadata can then be used to evaluate the performance of active TCP connections in detail, for example the trajectory of the congestion window of a given TCP connection (i.e., its speed/performance). Network simulators can also perform such a detailed analysis, because they model the exact operation of TCP's slow-start, congestion avoidance, fast retransmit and fast recovery algorithms. However, these simulations require manual configuration, are computationally intensive, and take far too long to complete. NS-3, for instance, was rated one of the best-performing network simulators in a recent study. That assessment, however, was made for a relatively high packet drop probability (p = 0.10) in large-scale networks. The results also indicate that all the simulators perform similarly for a small number of nodes, which is the case when evaluating a bottleneck link.
Furthermore, the packet drop probability on a high-speed bottleneck link is much lower than p = 0.10! To see why, note that the congestion window TCP needs to fully utilise a link (i.e., to "fill the pipe") is the bandwidth-delay product (BDP). Consider a 1 Gbps link, a connection RTT of 0.22 s and an MSS of 576 bytes. A TCP flow then needs a congestion window of about 48,000 segments (10^9 × 0.22 / (8 × 576) ≈ 47,743) to fill the pipe. For TCP CUBIC, a deterministic model relates the packet drop probability p to the average congestion window W (in segments) and the RTT (in seconds) as p = RTT / (W/1.17)^(4/3). For our example this gives p ≈ 1.55 × 10^(-7). In other words, about 1/p ≈ 6.4 million TCP segments are sent between successive drops!
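The arithmetic above can be checked with a short Python sketch. The coefficient 1.17 is taken from the text. The helper `cubic_coefficient` is an addition that shows where that constant comes from under a deterministic-loss assumption: (C(3+β)/(4(1−β)))^(1/4) with C = 0.4. A window factor β = 0.8 gives about 1.17, while RFC 8312's β_cubic = 0.7 gives about 1.05. All function names are hypothetical.

```python
def bdp_segments(link_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Bandwidth-delay product expressed in MSS-sized segments."""
    return link_bps * rtt_s / (8 * mss_bytes)


def cubic_coefficient(c: float = 0.4, beta: float = 0.8) -> float:
    """Coefficient k in W_avg = k * (RTT / p) ** 0.75 for deterministic CUBIC.

    beta is the fraction of the window kept after a loss.
    """
    return (c * (3 + beta) / (4 * (1 - beta))) ** 0.25


def cubic_loss_probability(cwnd_segments: float, rtt_s: float, k: float = 1.17) -> float:
    """Invert the deterministic model: p = RTT / (W / k) ** (4/3)."""
    return rtt_s / (cwnd_segments / k) ** (4 / 3)


w = bdp_segments(1e9, 0.22, 576)          # about 47,743 segments
p = cubic_loss_probability(48_000, 0.22)  # about 1.55e-7
print(f"BDP = {w:,.0f} segments, p = {p:.3e}, 1/p = {1 / p / 1e6:.1f} million segments")
```

Because p scales with W^(-4/3), doubling the link speed (and hence the BDP) at the same RTT lowers the drop probability by a factor of about 2.5.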
Scenarios in which many packets are sent into the network require modelling many network events, and this typically takes a long time. Caltech investigated this issue for HS-TCP flows with the SACK option enabled. The results showed that it can take up to 20 hours to simulate 200 seconds of network activity on a 1 Gbps link! That study produced a patch for NS-2, included as standard from NS-2.33 through to NS-3.X, which reduces this time to between 25 minutes and 1 hour in most of the scenarios studied. This is still far slower than "real-time" analysis. With SureLink-XG, TCP CUBIC with the SACK option enabled can be modelled on a similar processor (a 2.5 GHz dual-core processor) in just 60 seconds, with further significant improvements expected! Because SureLink-XG is implemented in C++, its execution time can easily and reliably be measured with the std::chrono library introduced in C++11.
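A rough count shows the scale of the problem for a packet-level simulator. The sketch below counts the data segments needed to keep a 1 Gbps link busy for 200 s. It assumes full utilisation and 576-byte segments (the segment size used in the execution-time comparison), and it ignores header overhead and ACKs. The result is therefore only a lower bound on the number of simulated events. The function name is hypothetical.

```python
def segments_sent(link_bps: float, duration_s: float, segment_bytes: int) -> float:
    """Data segments needed to keep a link fully busy for duration_s seconds.

    Header overhead and ACK traffic are ignored, so this is a lower bound on
    the number of events a packet-level simulator must process.
    """
    return link_bps * duration_s / (8 * segment_bytes)


# 200 s of activity on a 1 Gbps link, assuming 576-byte segments.
n = segments_sent(1e9, 200, 576)
print(f"{n / 1e6:.1f} million segments")
```

Every one of these segments triggers several simulator events (queueing, transmission, delivery and the matching ACK), which is why run times grow so quickly with link speed.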
SureLink-XG (green) vs NS-2 (red) execution time. Simulation duration: 300 s; TCP segment size: 576 bytes; RTT: 0.22 s.
The combination of slow execution times and the complexity of TCP's congestion control mechanisms has resulted in a lack of rigorous mathematical analysis to accompany network performance measurements. It is thus often easier to deduce what is happening on the network than why it is happening!
SureLink-XG is an alternative modelling approach that provides the same level of accuracy per TCP connection and per packet, and it produces results more than 30 times faster than a network simulator! SureLink-XG performance data include full details of the congestion window trajectory, for example TCP CUBIC's multiplicative decrease, concave growth, convex growth and fast convergence (see the results above). Such detailed analysis is only possible by modelling the exact operation of the TCP protocols, including the slow-start, congestion avoidance, fast retransmit and fast recovery algorithms. Furthermore, SureLink-XG's unique modular architecture makes it faster and easier to match the mathematical models to reality. For example, the current implementation of SureLink-XG is based on the Linux kernel's net/ipv4 TCP C source code. SureLink-XG also offers faster-than-real-time analysis for 1 Gbps+ links. It automatically configures each link analysis from flow data read from a data file and requires no further configuration or coding in C++, OTcl or Python. This is a distinct advantage given the large number of flows on ultra-fast links.
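The four CUBIC phases mentioned above all follow from a single cubic window function, W(t) = C(t − K)^3 + W_max: multiplicative decrease, concave growth, convex growth and fast convergence. The sketch below assumes RFC 8312's constants C = 0.4 and β_cubic = 0.7. Like Linux's tcp_cubic.c, it computes K from the gap between W_max and the reduced window; without fast convergence, this equals RFC 8312's W_max(1 − β)/C. It is a simplified floating-point model with hypothetical class and method names, not SureLink-XG's implementation.

```python
C = 0.4           # RFC 8312 scaling constant
BETA_CUBIC = 0.7  # RFC 8312 multiplicative decrease factor


class CubicWindow:
    """Simplified, floating-point CUBIC window model (units: segments).

    Omits slow start, the TCP-friendly region, per-ACK updates and the
    fixed-point arithmetic used by the Linux kernel.
    """

    def __init__(self, cwnd: float) -> None:
        self.cwnd = cwnd
        self.w_max = 0.0  # window just before the last loss (origin point)
        self.k = 0.0      # seconds needed to grow back to w_max

    def on_loss(self) -> None:
        """Fast convergence followed by multiplicative decrease."""
        if self.cwnd < self.w_max:
            # Fast convergence: the loss came before regaining the previous
            # maximum, so lower the target to leave room for competing flows.
            self.w_max = self.cwnd * (1.0 + BETA_CUBIC) / 2.0
        else:
            self.w_max = self.cwnd
        self.cwnd *= BETA_CUBIC  # multiplicative decrease
        self.k = ((self.w_max - self.cwnd) / C) ** (1.0 / 3.0)

    def window_at(self, t: float) -> float:
        """W(t) = C (t - K)^3 + W_max, with t in seconds since the last loss.

        Growth is concave (decelerating) for t < K and convex (accelerating,
        probing beyond the old maximum) for t > K.
        """
        return C * (t - self.k) ** 3 + self.w_max


flow = CubicWindow(cwnd=100.0)
flow.on_loss()  # hypothetical loss at 100 segments
for t in (0.0, flow.k / 2, flow.k, 2 * flow.k):
    print(f"t = {t:5.2f} s   W = {flow.window_at(t):6.1f} segments")
```

The window restarts at 70% of its previous maximum and flattens out as it approaches W_max at t = K. It then accelerates to probe for newly available capacity.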
SureLink-XG analysis is currently commercially available for performance evaluation on a link-by-link basis. The analysis can be performed, for example, at a fibre network exchange (at the OLT) or at cell sites in mobile networks. Its speed means that it is now possible to relate the configuration of a link (that is, the link speed and buffer dimensions) directly to the performance of individual TCP flows (i.e., to the experience of individual subscribers). Because execution time scales linearly with link speed, a detailed analysis of busy-hour traffic can be provided on a 24-hour basis for links as fast as 100 Gbps.
The SureLink-XG concept was invented and developed independently, in its entirety, by Frank M. (Ph.D.).
The architecture and implementation of the enterprise-ready SureLink-XG solution were also created and developed independently, in their entirety, by Frank M. (Ph.D.).
This work is dedicated to my family, and particularly to Katho, who has stood in my corner through what has been a fight against all the odds!