The design of core/backbone network links is based on queuing theory; namely, given the ratio of the arrival rate of packets to the link transmission rate (i.e., link utilization) it is possible to determine the delay of packets at the queue; the latter of which should be kept at a minimum to avoid congestion [Link to Cisco Backbone Capacity Planning Paper].
In the classical queuing theory, packet delays increase exponentially at a given utilization (defined as the “knee” of the delay-utilization curve) leading to increased buffer occupancy (contention) and eventually to packet loss when the buffer is full (congestion). In order to avoid link congestion, it is therefore necessary to keep the link below a pre-determined utilization; say 80%, which avoids large queue build ups and any subsequent buffer overflow. One of the key challenges that arises when translating this powerful mathematical principle into a pragmatic internet engineering practice is the modelling of the rate of packet arrivals.
Network operators commonly use well-established and widely deployed traffic measurement tools when determining the data arrival rate at a given backbone link. A typical approach involves polling network interface counters such as number of bytes sent/received every 5-minutes (or even courser) as defined by the Management Information Bases (MIB) which is done using the Simple Network Management Protocol (SNMP). This yields the total amount of traffic sent through the network interface over this time-interval, from which the average traffic rate (measured say in Mbps) can be derived. However, if (and it is a rather important “if”) the data arrival rate is not a steady stream of bits, “bursts” of data exceeding the estimated 5-minute average will increase utilization beyond the ideal operating point causing packet delays and eventually packet loss. This observation helps to explain why “some moderate” congestion is indeed inevitable on the Internet.
In the classical theory, one of the distributions that is used to model the arrival processes; i.e., the Poisson distribution, is based on the same principle of counting the number of arrivals in a given time interval which can be approximated by negative exponential inter-arrival times. Classical queuing theory also requires assumptions about the service time of packets (for example negative exponential distribution) which would account for variations in the packet size. If we assume that a link is able to transmit just one packet at a time, the link is said to have a single server, an assertion which constitutes the third key assumption. In the Kendall notation, such a link could be described as an M/M/1 queuing system where the notation lists the assumptions as follows: packet inter-arrival time distribution/service time distribution/number of servers. These assumptions yield a closed-form equation for the delay of packets in an M/M/1 system as packet delay = (ρ × s)/(1 ̶ ρ), where “ρ” is the utilization and “s” is the mean packet service/transmission time of the link [Link to Book on IP and ATM Performance, Chapter 4].
Unfortunately, this theory cannot be readily translated into practice because observations of network traffic exhibit “self-similarity “– a property which makes it particularly difficult to define an average packet inter-arrival time or distribution. Self-similarity in the case of aggregated network traffic (i.e., the total number of packets or bytes per unit time) is the property of exhibiting observable “bursts” which are extended periods above the mean at a wide range of timescales [Please see the first figure in the Previous Blog on Performance Engineering]. In other words, even if the 5-minute average is constant (by definition), the property of self-similarity predicts that this average would change if measurements are taken over a different 5 minute interval or even at sub 5-minute intervals. This means that internet traffic is "wild" on different timescales and does not have a "typical average" as would be expected for Poisson distributed traffic.
In order to derive a relationship between delay and utilization in practice, an approach attributed to Telkamp [Link to Cisco Backbone Capacity Planning Paper] used a set of packet-level measurements from an operational IP backbone link carrying both IP and VPN traffic to generate inputs for a trace-driven simulation of a queue. Each set of packet measurements or “traces” contained timestamps of the arrival time for every packet over a 5-minute interval. The traces were then multiplexed resulting in an aggregated trace with a higher 5-minute average which was then run through a fixed-speed queue. For example, three traces with 5-minute average rates of 126 Mbps, 206 Mbps and 240 Mbps were multiplexed to produce a trace with a 5-minute average of 572 Mbps, which was run through a fixed 622 Mbps queue (that is, a 5-minute utilization of 92%). The packet delay was then calculated during the simulation to produce a single data point in the delay-utilization curve. When the process was repeated with different combinations of the input traces, multiple data points were produced for a specific fixed-speed queue.
Using the delay-utilization curve that is produced thus, it is possible to determine the maximum utilization of the link that would keep delays at a minimum. In this case, the fraction by which the link rate exceeds the data arrival rate for a given delay (e.g., 2 milliseconds) is the overprovisioning factor.
Such an analysis is extremely useful (regardless of whether traffic is self-similar or not) if we assume that the 5-minute average of the input traffic is based say on a 95th percentile measurement; that is, 95% of measurements are below this 5-minute average arrival rate [Link to Dr Peering Article on Metered Traffic Measurements]. However, it is the author’s opinion that assuming a fixed arrival rate when varying the link transmission rate (e.g., in order to lower utilization) imposes rather than uncovers a “traffic invariant” which is then implied in the predictions of the resulting parsimonious model of the network. Indeed, such an approach is reminiscent of the legacy POTS (Plain Old Telephone Service) networks engineering which was based on the fixed data rate required for voice communications.
In order to digitize speech, the analogue speech voltage signal from a microphone is passed through a bandpass filter followed by an Analogue-to-Digital Converter (ADC) which performs sampling and quantization to produce the digital output. After bandpass filtering, the speech signal content is limited to a bandwidth of between 300 Hz and 3400 Hz as it enters the ADC where it is sampled at 8 KHz, well above the Nyquist sampling rate. In converting each sampled digital voltage to a digital format, more than 8 bits/sample are necessary to make the reconstructed digital speech indistinguishable from the bandlimited input. Using 16 bits per sample, for example, results in a small reconstruction error and thus high speech quality at the receiver. For high quality speech transmission, the raw data rate required is therefore 16 bits × 8 kHz = 128 Kbps [Link to Book on Speech Coding, Chapter 1]. Because of this fixed data rate, everything in the POTS network boils down to knowing the long term arrival rate and the aggregation capacity of links is accurately determined based on the queuing theory analysis.
In stark contrast, the bit rate required when data is transferred over the internet is largely dependent on the application running on the user device. In keeping with the analogy from the discussion above, we would not be able to determine a “typical” bandwidth (e.g., 300 Hz to 3400Hz) if we attempted to “digitize a computer’s speech.” This difference is perhaps the reason behind the statement by the authors of the paper Where Mathematics Meets the Internet [Link to the Willinger and Paxson Paper].
“But in fact much of the voice traffic modelling has proven nothing short of disastrous when applied to data networks, for the simple but profound reason that the rules all change when it is computers and not humans doing the talking.”
A very significant omission in the queueing theory analysis (and one that is made conspicuous by its absence) is the closed-loop operation of TCP where the congestion control mechanism determines the flow of packets over very fine timescales, from hundreds of milliseconds downwards. In order to study the impact of such closed-loop flow control, the author of this Blog conducted network simulations in which TCP was (in a sense) allowed to “push the utilization beyond the knee of the delay-utilization curve.” Several flows were multiplexed each with a fixed propagation delay along the path and a variable transmission delay due to queuing; both of which added up to the round trip latency per packet. The average packet inter-arrival time was then derived and used to calculate the average data arrival rate and the link utilization. Differences in the propagation delays between the flows allowed for the evaluation of TCP throughput and fairness.
The results shown in the table below suggest very strongly that TCP can indeed stabilize packet delays and losses across the link at very high utilization, i.e., greater than 98% - an important result which is not anticipated by the classical queuing theory. Specifically, stability of connection speed suggests that the packet delays at the queue are not unbounded as predicted by queuing theory. At such high utilization rates, the overprovisioning factor is very nearly unity, which suggests that TCP’s steady-state packet arrival rate is a critical parameter when designing networks, if indeed such a quantity can be defined.
One of the aspects that makes the engineering of internet performance painfully difficult has been shown to be that the measured internet traffic is wild and remains so even on quite course time scales. It is thus difficult to use models to gauge performance via the conventional traffic engineering approach and specifically queuing theory. The results obtained by the author of this Blog suggest that TCP can stabilize packet delays and losses at extremely high link utilization; i.e., when links are “running very hot,” and achieve a stable minimum subscriber connection speed. In fact, engineering links to achieve a peak connection speed remains one of the stated ultimate goals of engineering internet performance [Link to the Willinger and Paxson Paper].
“…when the ultimate goal is to enable the tens of millions of Internet users to determine what performance they can obtain from the network, irrespective of where they are and when they want this information, and how to improve the engineering of the network to meet their myriad needs, then the analysis problems acquire a central element of scale, extending well beyond what has previously been attempted.”
This lofty ambition of engineering the internet for “laissez-faire” connectivity has been recently demonstrated using the SureLink-XG concept which implements precise mathematical models to predict TCP’s steady-state operation and determine the capacity of links that supports a peak subscriber Internet connection speed [Link to the first Blog on Performance Engineering]. The speed requirements of internet subscribers are already fully specified by the internet package speed tiers and the number of simultaneous connections can be determined through flow monitoring. SureLink-XG is thus able to re-define congestion as a subscriber-centric (rather than a network-centric) phenomenon and determine the network capacity that increases individual connection speeds to the required level.