Pacing at the fair-share of the bottleneck bandwidth *without* explicit network-level feedback
Pacing at the optimal rate should not only 1) maximize utilization without exacerbating congestion (which occurs when the packet arrival rate at a bottleneck exceeds its transmission rate), but also 2) ensure that the available bandwidth is shared equally amongst competing connections (at the optimal rate, flows with a large BDP tend to be more opportunistic when contending for the available bottleneck bandwidth).
Google’s new congestion control algorithm, BBR, readily meets the first requirement – sending a Bandwidth-Delay Product (BDP) number of segments at the Bottleneck Bandwidth (BtlBw) rate – thus operating at the so-called Kleinrock point. Both the cwnd and the pacing rate are therefore key control parameters – for maximum utilization and for operating at the Kleinrock operating point, respectively.
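As an illustration, the Kleinrock operating point can be expressed as a BDP-worth of data in flight, paced at the BtlBw rate. A minimal sketch (the function and parameter names here are hypothetical, not taken from any BBR implementation):

```python
# Illustrative only: deriving the two control parameters (cwnd and
# pacing rate) from BBR's two state variables, BtlBw and RTprop.
def bbr_control_parameters(btlbw_bps, rtprop_s, cwnd_gain=2.0, pacing_gain=1.0):
    """Return (cwnd_bytes, pacing_rate_bps) at the Kleinrock point."""
    bdp_bytes = (btlbw_bps / 8) * rtprop_s   # bandwidth-delay product
    cwnd = cwnd_gain * bdp_bytes             # data allowed in flight
    pacing_rate = pacing_gain * btlbw_bps    # send at the BtlBw rate
    return cwnd, pacing_rate

# e.g. a 100 Mbps bottleneck with a 50 ms RTprop gives a BDP of 625,000 bytes
cwnd, rate = bbr_control_parameters(100e6, 0.050)
```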
The recently published literature suggests that BBR v2 relies on the packet loss rate (via SACK) and DCTCP-inspired ECN to meet the second requirement of pacing at the optimal rate. However, the loss-rate (SACK) approach may converge slowly, and network-level (IP-layer) feedback from an intermediate network device (e.g., an AQM) may not be available on commodity devices.
This article focuses on the “prospective” stage of the BBR connection (START-UP and DRAIN) and considers simple modifications to the BBR sender that can rapidly meet both requirements for optimal pacing without having to rely on network-level (IP-layer) feedback.
The results presented below show that a high degree of fairness can be achieved without the aid of network-level feedback.
Fair-share packet conservation
One of the key indicators of congestion is a packet drop when a buffer is full, or (alternatively) a gateway marking the packet to notify the source to reduce the sending rate for that connection.
The key principle that enables maximum utilization at a fair share of the bottleneck bandwidth (as noted in the seminal RED publication) is as follows – the probability that a packet is marked from a given connection is roughly proportional to that connection’s share of the bandwidth.
Based on this observation, a high rate of packet drops is a likely indicator of unfair utilization. BBR connections must therefore adhere to a strict packet conservation regime (a new packet is not put into the network until an old one leaves) following a loss event to ensure maximum fair-share utilization of the bottleneck bandwidth.
The sender can achieve this objective (maximum fair-share utilization) of the bottleneck bandwidth by taking the following necessary steps:
1. Interpret loss (or ECN) as an indication that the current pacing rate is too high and adhere to a strict packet conservation regime during RECOVERY
2. Determine a fair-share pacing rate and CWND based on the packets in-flight at the end of RECOVERY
3. Re-initialise and seed the minmax filter in the current round to the current best estimate of the delivery rate before proceeding to PROBE for the available bandwidth
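The three steps above can be sketched as follows – a conceptual illustration with hypothetical names, in which the minmax filter is modelled as a simple list:

```python
# Hypothetical sketch of the actions taken at the end of RECOVERY.
# None of these names come from the Linux BBR implementation.
def exit_recovery(max_packets_delivered, mss_bytes, rtprop_s, btlbw_filter):
    # Step 2: fair-share cwnd from the packets in flight at end of RECOVERY
    cwnd = max_packets_delivered * mss_bytes
    # best estimate of the delivery rate achieved during RECOVERY
    delivery_rate = cwnd / rtprop_s          # bytes per second
    # Step 3: re-initialise and seed the minmax filter with that estimate
    btlbw_filter.clear()
    btlbw_filter.append(delivery_rate)
    return cwnd, delivery_rate
```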
On the conceptual level, strict packet conservation can be achieved by a simple modification to the BBR send routine which effectively disables pacing at the (presumably inaccurate) BtlBw rate during RECOVERY.
function send(packet)
  bdp = BtlBwFilter.currentMax * RTpropFilter.currentMin
  if (inflight >= cwnd_gain * bdp)
    // wait for ack or retransmission timeout
    return
  if (now >= nextSendTime)
    packet = nextPacketToSend()
    if (! packet)
      app_limited_until = inflight
      return
    packet.app_limited = (app_limited_until > 0)
    packet.sendtime = now
    packet.delivered = delivered
    packet.delivered_time = delivered_time
    udp.send(packet)
    // disable the pacing timer callback during RECOVERY
    if (tcp->mode != RECOVERY)
      nextSendTime = now + packet.size /
                     (pacing_gain * BtlBwFilter.currentMax)
  timerCallbackAt(send, nextSendTime)
This proposed temporary halt to paced transmission at the previously estimated (and presumably now inaccurate) BtlBw rate during RECOVERY means that a new or retransmitted segment is sent on each (duplicate/partial) ACK – and is thus transmitted at the bottleneck bandwidth rate in proportion to the connection’s fair share of the bandwidth.
TCP’s Selective Acknowledgement (SACK) option can be used to continuously update the packets in flight during RECOVERY by counting “packets_delivered” (packets newly marked as delivered) and “packets_lost” (packets newly marked as lost). The number of packets in flight at the end of RECOVERY is stored as “max_packets_delivered.”
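A minimal sketch of this per-ACK accounting, assuming a dictionary-based connection state (the names and structure are hypothetical illustrations, not kernel code):

```python
# Hypothetical per-ACK in-flight accounting during RECOVERY, driven by
# the counts a SACK scoreboard produces for each incoming ACK.
def on_ack_in_recovery(state, packets_delivered, packets_lost):
    """Update the in-flight estimate on each (duplicate/partial) ACK."""
    # packets newly delivered or newly marked lost leave the network
    state["inflight"] -= packets_delivered + packets_lost
    # strict packet conservation: one new/retransmitted segment per ACK
    state["inflight"] += 1
    # continuously track in-flight; its value when RECOVERY ends is
    # what the text stores as max_packets_delivered
    state["max_packets_delivered"] = state["inflight"]
    return state
```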
Disabling the timer callback during RECOVERY results in a multiplicative decrease in the number of packets in flight. Such a multiplicative decrease means that it is necessary to start FAST RECOVERY when the packets in flight fall to approximately half of “max_packets_out” (the maximum number of packets in flight when the sender enters RECOVERY).
When the BBR connection recovers and starts to probe for additional bandwidth, the CWND is restored to “max_packets_delivered” – meeting the first requirement of pacing at the optimal rate – maximizing the bottleneck bandwidth utilization.
In addition to maximizing the utilization, the BBR connection must also ensure that queues are minimized by correctly estimating the delivery rate (from which the pacing rate is derived) after exiting RECOVERY.
The simple calculation that can be used to determine the best estimate of the RECOVERY delivery rate is based on the model that BBR uses in the absence of a BtlBw estimate (i.e., when the connection first starts up).
deliveryRate = (max_packets_delivered * packet.size) / RTprop
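A worked example with assumed numbers – 200 packets in flight at the end of RECOVERY, 1500-byte segments, and an RTprop of 50 ms:

```python
# Assumed illustrative values, not measurements from the evaluation.
max_packets_delivered = 200
packet_size = 1500          # bytes
rtprop = 0.050              # seconds

delivery_rate = (max_packets_delivered * packet_size) / rtprop
# 200 * 1500 / 0.05 = 6,000,000 bytes/s (48 Mbps)
```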
It must also be noted that BBR paces at the maximum value produced by the BtlBwFilter, which (according to the specification) had been tracking the (hitherto accurate) bottleneck bandwidth rate over 10 packet-timed round trips before entering RECOVERY.
We therefore propose that the BtlBw filter be re-initialised and seeded with the best estimate of the RECOVERY deliveryRate in the current round, so as to respond quickly and precisely to network congestion while maintaining optimal operation.
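Conceptually, the re-initialise-and-seed step can be sketched with a simple windowed max filter – a toy stand-in for BBR’s minmax filter, with hypothetical names:

```python
from collections import deque

# Toy windowed max filter: tracks the maximum sample over the last
# `window_rounds` packet-timed round trips.
class WindowedMax:
    def __init__(self, window_rounds=10):
        self.window = window_rounds
        self.samples = deque()          # (round, value) pairs

    def update(self, rnd, value):
        self.samples.append((rnd, value))
        # drop samples that have aged out of the window
        while self.samples and self.samples[0][0] <= rnd - self.window:
            self.samples.popleft()

    def current_max(self):
        return max(v for _, v in self.samples)

    def reseed(self, rnd, delivery_rate):
        # discard stale pre-RECOVERY samples and seed the filter with
        # the RECOVERY delivery-rate estimate in the current round
        self.samples.clear()
        self.update(rnd, delivery_rate)
```

Without the reseed, the stale pre-RECOVERY maximum would continue to dominate the filter output for up to 10 rounds after the congestion event.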
Results and discussion
The proposals outlined in this article were evaluated by calculating the data delivery rate of 5 connections over a 3-minute interval for various RTT ranges. Two bottleneck bandwidths, 125 Mbps and 250 Mbps, were evaluated for uniformly distributed RTTs ranging from a minimum of 50 ms to a maximum of N x 50 ms (where N ranged from 2 to 10). The evaluation was carried out for a deep buffer of 4 x BDP with a segment size of 1500 Bytes.
The congestion control mechanism implemented is ostensibly BBR v2, which drains to BDP/2 during PROBE_RTT, with no other modifications apart from pacing conservatively/asymptotically at a margin well below (10% – 15%) the measured/filtered BtlBw. In the BBR v2 code, it is noted that pacing below the measured bottleneck bandwidth reduces queues at the bottleneck (which favours fair sharing). It is also noted here that pacing does not appear to have a strictly linear relationship with the number of packets in the queue – particularly when traffic mixes.
Jain’s index shows that BBR can achieve a high degree of fairness from START-UP even when completely unaided by the network. The following observations can be made based on these results:
1. In the absence of explicit network feedback, the packet drop rate is expected to be roughly proportional to a flow’s share of the bottleneck bandwidth.
2. Strict packet conservation in RECOVERY will reduce the cwnd by the fraction of segments that encounter congestion within a round.
3. The flow’s utilization quickly converges to the fair share within the few rounds of START-UP where there is congestion.
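For reference, Jain’s fairness index over n per-flow throughputs x_i is (sum x_i)^2 / (n * sum x_i^2), where 1.0 indicates perfectly equal shares and 1/n indicates one flow taking everything. A sketch:

```python
# Jain's fairness index over a list of per-flow throughputs.
def jains_index(throughputs):
    n = len(throughputs)
    total = sum(throughputs)
    return total * total / (n * sum(x * x for x in throughputs))

# Five flows sharing a 125 Mbps bottleneck equally -> 1.0
equal = jains_index([25e6] * 5)
```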
The next step will be to examine the proposed changes in a heterogeneous congestion control environment, as well as to evaluate other signals of an impending congestion event.
In loving memory of Jennifer Alexander-Mung'au (1982 - 2020) - we miss you - to finishing what we start, and starting what we need to finish.