When deploying and operating broadband networks, it is essential to implement an analytical approach that determines whether congestion is occurring at a potential bottleneck link or at a link with a high level of subscriber aggregation. Indeed, it is entirely possible for an internet connection to be slowed down by a fast link rather than by a slow link, owing to the presence of competing traffic on shared links.
When information is downloaded from the internet, say from a cloud server to a user device (e.g., when streaming videos), data will typically arrive via a fast link in a Wide Area Network (WAN), which extends over a large geographical region, and proceed to a slower link in the Local Area Network (LAN), which spans a relatively small geographical area in the vicinity of the subscriber. Conversely, when data is uploaded to the Internet, say from user devices to cloud servers (e.g., when sharing videos on social media), multiple input streams may arrive at a router whose output capacity (a fast WAN link) is less than the sum of the inputs (slower LAN links) [Link to Stevens TCP Illustrated].
The definition above provides a useful first impression by suggesting that bottlenecks result from overloads caused by high-load sessions (likely to occur in the downstream direction) or from the convergence of a significant number of moderate-load sessions at the same queue (which would likely occur in the upstream direction) [Download MIT Presentation on Traffic Behaviour and Queuing in a QoS Environment]. However, such an analysis is based purely on the maximum transmission rate of the links at an interconnection (router/switch) and does not fully account for the contention for bandwidth on shared links or the different levels of subscriber aggregation encountered in LANs and WANs.
During a bulk data transfer, the dominant data transfer protocol on the Internet (TCP) starts by sending a small number of back-to-back packets, the number of which is referred to as the congestion window (cwnd), such that 1 ≤ cwnd ≤ 3. If we assume that two back-to-back packets arrive at the link with the minimum transmission rate along the path, a “gap” is imposed between the back-to-back packets (see the figure below) equal to the time it takes to transmit a packet on the slowest link [Link to Van Jacobson Paper on Congestion Avoidance and Control]. Acknowledgements (ACKs) from the receiver will therefore arrive at the sender separated by this “gap” which, incidentally, is necessarily less than the round trip time (RTT). During slow-start, each ACK results in the transmission of two additional back-to-back packets from the bulk of data (exponentially increasing cwnd, e.g., from 2 to 4 to 8…, per RTT), whilst in congestion avoidance each ACK results in the transmission of a single additional packet until a total of cwnd ACKs have been received, at which point two additional back-to-back packets are transmitted (producing a linear increase in cwnd, e.g., from 2 to 3 to 4…, per RTT). The rate at which data is transmitted is therefore at least cwnd/RTT when measured in packets per second, and thus the congestion window provides a useful indicator of the data transmission rate. However, the case where multiple flows share a link is complex and requires a rigorous analysis such as the one presented by the authors at the Hamilton Institute [Link to Paper on Modelling TCP Throughput and Fairness], who used an analysis based on Positive Linear Systems to obtain a closed-form expression for the steady-state congestion window of a flow when several connections share a link. That said, their analysis relies on assumptions about drop-tail queues and synchronised packet drops amongst the flows.
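As a minimal sketch (and not a model of any particular TCP implementation), the per-RTT window growth described above can be illustrated in Python; the initial window of 2 packets and the slow-start threshold (ssthresh) of 16 packets are assumed values chosen purely for illustration:

```python
# A minimal sketch of per-RTT congestion-window growth for a single flow.
# The initial cwnd (2) and ssthresh (16) are illustrative assumptions.

def cwnd_per_rtt(initial_cwnd=2, ssthresh=16, rtts=8):
    """Return the congestion window (in packets) at the start of each RTT."""
    windows = [initial_cwnd]
    cwnd = initial_cwnd
    for _ in range(rtts - 1):
        if cwnd < ssthresh:
            cwnd *= 2   # slow start: each ACK releases two back-to-back packets
        else:
            cwnd += 1   # congestion avoidance: cwnd grows by one packet per RTT
        windows.append(cwnd)
    return windows

print(cwnd_per_rtt())  # [2, 4, 8, 16, 17, 18, 19, 20]
```

The exponential-then-linear shape of the returned list is the behaviour described in the paragraph above; the sending rate at any point is then at least cwnd/RTT packets per second.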
The figure below shows how the congestion window contracts, thus reducing the speed of a connection, when an additional user starts to share a link.
If we assume for simplicity that each flow has a connection speed that is at least as fast as all the other flows sharing a particular link (an assumption that doesn’t necessarily hold in practice, mainly owing to differences in flow RTT), the share of bandwidth c [Mbps] available for a given flow when n connections share a LAN link may be greater than the share of the bandwidth C [Mbps] available for the same flow on a WAN link shared by a larger number of N connections; i.e., c/n > C/N. In this case, the fast link (rather than the slow link) is the performance-limiting factor from the perspective of the subscriber. Formulations such as the one presented by Bonald et al. [Link to France Telecom R&D Paper] are therefore indispensable when engineering Internet performance, as they estimate the aggregated bandwidth requirements based on the level of subscriber aggregation and demand.
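The c/n versus C/N comparison can be made concrete with a small numerical sketch; the capacities and connection counts below are illustrative assumptions, not measurements from any network:

```python
# Illustrative fair-share comparison: a fast, heavily shared WAN link can
# offer each flow less bandwidth than a slower, lightly shared LAN link.

def per_flow_share(capacity_mbps, n_flows):
    """Per-flow bandwidth, assuming perfectly fair sharing of the link."""
    return capacity_mbps / n_flows

lan_share = per_flow_share(capacity_mbps=1_000, n_flows=16)      # c/n on the LAN link
wan_share = per_flow_share(capacity_mbps=10_000, n_flows=1_000)  # C/N on the WAN link

print(f"c/n = {lan_share:.1f} Mbps, C/N = {wan_share:.1f} Mbps")
# c/n (62.5 Mbps) > C/N (10.0 Mbps): the faster WAN link limits performance.
```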
Several observations can be made about the analytical method proposed by Bonald et al. in terms of its ability to predict the aggregated bandwidth requirements.
The exact operation of TCP, by which the rate of the flow adjusts gradually and imperfectly to claim a share of bandwidth as described above, is ignored in order to reduce the complexity of the analysis. However, as demonstrated by the authors at the Hamilton Institute, modelling the exact operation of TCP is necessary if complete knowledge of the steady-state performance of a given flow is to be attained. Indeed, without such knowledge, it is difficult to predict and solve performance problems for individual subscribers.
The simplicity of the estimation analysis relies on the major assumption that flows share the capacity perfectly fairly. In fact, network simulation statistics show that relatively small variations in RTT cause a wide variation in flow throughput, particularly at the higher speed tiers (e.g., 10 Mbps and above) such as those associated with optical fibre deployments; e.g., Fibre To The Home (FTTH) broadband services [Link to SureLink-XG Blog on Performance Engineering].
Finally, the estimation analysis predicts the traffic demanded by each subscriber (the so-called equivalent bandwidth per subscriber) based on several hard-to-obtain variables such as mean flow volume, mean flow duration, think time and activity rate corresponding to the transfer of digital documents (e.g., web pages, e-mail, video sequences etc.) by a subscriber at a home/business location. However, it is much simpler, and more accurate, to predict the traffic demand based only on the IP address of a subscriber and the corresponding Internet access package speed tier.
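By way of illustration only (and not as a statement of the author's actual method), a speed-tier-based demand estimate could be as simple as a lookup keyed on the external IP address; the tier table and addresses below are entirely invented:

```python
# Hypothetical sketch: estimating demand from IP -> speed-tier lookups
# instead of per-subscriber traffic variables. All values are invented.

speed_tier_mbps = {          # external IP address -> purchased package tier
    "203.0.113.5": 10,
    "203.0.113.7": 100,
}

def total_demand_mbps(subscriber_ips):
    """Sum the package speed tiers of the given subscribers."""
    return sum(speed_tier_mbps[ip] for ip in subscriber_ips)

print(total_demand_mbps(["203.0.113.5", "203.0.113.7"]))  # 110
```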
Subscribers are directly connected to the broadband services in the so-called “last mile”, where several technologies, e.g., Digital Subscriber Line (DSL) and Gigabit Passive Optical Networks (G-PON), can be implemented. In G-PON/FTTH technology, a cluster of subscribers share a single “feeder” optical fibre cable via a passive splitter which connects directly to the Optical Line Terminal (OLT) at a Central Office (CO)/Telephone Exchange. In this way, a single fibre optic cable is shared in a “round robin” fashion among the small cluster of subscribers using Time Division Multiple Access (TDMA) at a predefined maximum rate; e.g., 1.25 Gbps for a G-PON port. A line card at the OLT is equipped with several G-PON ports and performs an Ethernet aggregation onto a link with a capacity of 10 Gbps or 40 Gbps. A network card then further aggregates the flows from the line cards at a rate of up to 80 Gbps (see the figure below). In practice, potential bottlenecks are assessed based on congestion ratios (refer to the definition given above for a bottleneck link) and the access/aggregation link capacity is estimated using tools that make simplifying assumptions (as noted above for the formulation due to Bonald et al.). Using this approach, and given that DSL can aggregate up to 900 subscribers using a Digital Subscriber Line Access Multiplexer (DSLAM) at the CO, an estimated bandwidth demand of 2 Mbps per subscriber yields an aggregation capacity requirement of about 2 Gbps at the network card. On the other hand, FTTH can aggregate 16,000 subscribers at the OLT/CO and the same estimated subscriber bandwidth demand of 2 Mbps requires an aggregated bandwidth of 36 Gbps at the network card.
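The back-of-the-envelope aggregation figures above follow from a simple product of subscriber count and per-subscriber demand; note that this simple product gives 32 Gbps for the FTTH case, so the 36 Gbps quoted above presumably includes some headroom beyond the bare product:

```python
# Aggregation demand from the figures quoted above (2 Mbps per subscriber).
# Note: the plain product understates the 36 Gbps FTTH figure in the text,
# which presumably includes headroom/overhead beyond the mean demand.

def aggregate_demand_gbps(subscribers, demand_mbps_per_sub=2):
    """Mean aggregated demand in Gbps for the given subscriber count."""
    return subscribers * demand_mbps_per_sub / 1_000

print(aggregate_demand_gbps(900))     # DSLAM: 1.8, i.e. "about 2 Gbps"
print(aggregate_demand_gbps(16_000))  # OLT: 32.0 (36 Gbps quoted with headroom)
```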
Beyond the last mile and towards the wider network, connectivity is needed when providing links in and around metropolitan areas; links that are sometimes referred to as the “middle mile.” Metropolitan ring networks are a special case that simplifies the overall network architecture, which would otherwise require a complex and inefficient point-to-point infrastructure to connect a large number of users [Link to World Bank's Broadband Strategies Handbook]. Given that each network edge (NE) router can aggregate at most 64,000 subscribers, about 70 DSLAMs (each connecting 900 subscribers) or 4 OLTs (each connecting 16,000 subscribers) can connect to a NE router (see the figure below). Recalling that each DSLAM requires 2 Gbps (2 × 1 Gbps links) and each OLT requires 36 Gbps (4 × 10 Gbps links), each NE can therefore terminate 140 (1 Gbps) aggregation links in DSL or 16 (10 Gbps) aggregation links in G-PON. High capacity OLTs connect directly to a primary NE whilst low capacity OLTs and DSLAMs connect to secondary NEs. Authors from Orange Labs in France [Download Orange Labs Paper] have shown that an increase in the subscriber data demand beyond the 2 Mbps projected in the example above could lead to a very significant increase in Capital/Operational Expenditures (CAPEX/OPEX) and that supporting the higher bit rates can also yield a sharp increase in energy consumption. These pertinent concerns have led the authors to propose alternative architectures for the access and aggregation networks. Further to these conclusions, results produced by the author of this Blog [Link to Blog on Performance Engineering] have shown that the aggregated bandwidth requirements can be higher still if the exact operation of TCP is modelled during the analysis.
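The NE-router termination arithmetic can be reproduced in a few lines; the subscriber and link counts below are the illustrative figures used in this example, not vendor limits:

```python
# NE-router arithmetic using the illustrative figures from this example.

NE_MAX_SUBSCRIBERS = 64_000

dslams = NE_MAX_SUBSCRIBERS // 900      # ~70 DSLAMs per NE (integer division)
olts = NE_MAX_SUBSCRIBERS // 16_000     # 4 OLTs per NE

dsl_links_1g = 70 * 2                   # 2 x 1 Gbps links per DSLAM -> 140 links
gpon_links_10g = olts * 4               # 4 x 10 Gbps links per OLT  -> 16 links

print(dslams, olts, dsl_links_1g, gpon_links_10g)
```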
Metropolitan Area Networks (e.g., metropolitan ring networks) connect to the national network at a Point of Presence (PoP), which may be defined as an interface point where equipment such as servers, routers and switches are housed for the purpose of network interconnection. The various points of presence for the major cities/regions of a country are usually connected via the Internet backbone network, which consists of high-capacity routers connected by high-speed links. Although the analogy is not perfect, backbone networks serve the same function as a country’s highways, allowing for fast connections over the long distances between cities.
In fact, each subscriber modem (which is connected to the Optical Network Unit (ONU)) is visible in the WAN as a unique (external) IP address, with Network Address Translation (NAT) making it possible for multiple users to share the same access point at a given home/business location [Link to NAT article]. In addition, the subscriber's Internet access speed is determined by the package purchased from the ISP, which means that the speed tier alone is adequate for determining the Quality of Service (QoS) requirements (and that is without further concern for multiple users sharing the same access point via NAT). The performance engineering method developed by the author greatly simplifies the analysis of aggregation bandwidth requirements by relying only on knowledge of the level of subscriber aggregation, based on the external IP address, and the speed tier information.
The key to performance engineering is therefore determining the level of subscriber aggregation at a selected link – information that is readily derived from a process that is widely implemented in operational networks; namely, flow monitoring. At the conceptual level, internet flow monitoring works on the principle of mirroring transit traffic. Network interfaces can be programmed to automatically make an electronic copy/mirror of part of each packet (typically the header) of the data traffic flowing through an observation point. These records can then be sampled (to decide how much data is recorded) and filtered (to remove unwanted records) before post-processing. During post-processing, a flow can be classified based on several definitions, one of which is that of packets that stem from the same source [Link to University of Twente Thesis].
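Conceptually, this post-processing step amounts to grouping mirrored header records into flows; the sketch below classifies records by source address, and the record format and addresses are invented purely for illustration:

```python
# Conceptual flow classification: group mirrored packet-header records by
# source address. Record format and addresses are invented for illustration.

from collections import defaultdict

# (source IP, destination IP, bytes) tuples mirrored at an observation point
records = [
    ("203.0.113.5", "198.51.100.9", 1500),
    ("203.0.113.5", "198.51.100.9", 1500),
    ("203.0.113.7", "198.51.100.9", 500),
]

flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
for src, _dst, size in records:
    flows[src]["packets"] += 1   # one flow per source, per the definition above
    flows[src]["bytes"] += size

for src, stats in flows.items():
    print(src, stats)
```

The number of distinct keys in `flows` is exactly the per-link flow count used later as an input to the aggregation-bandwidth calculation.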
The author's method uses parameters such as the number of flows sharing a link to calculate the aggregated bandwidth requirements and, as has been discussed, these parameters are readily obtained in an operational network during the post-processing stage of flow monitoring. The mathematical analysis implemented includes the exact operation of TCP and does not make simplifying assumptions such as the assumption that flows share the bandwidth perfectly fairly (as proposed by Bonald et al.) or that packet drops are synchronised (as proposed by the authors at the Hamilton Institute) - albeit at a higher computational cost. In this way, it is possible to make informed assumptions during network deployment and later qualify these assumptions by taking direct measurements from an operational network. This approach is advocated particularly when solving performance problems such as subscribers not receiving the advertised internet connection speeds from their chosen Internet Service Provider (ISP). The new method easily accounts for both increasing numbers of simultaneous connections and increasing subscriptions to higher speed tiers when solving performance problems for individual subscribers. Finally, the accuracy achieved through the rigorous mathematical analysis has important implications for the evaluation of infrastructure capital/operational costs and potential energy savings.