# Performing fast calculations of network demand using multi-core CPUs

Rather than running at *faster clock speeds*, modern CPUs increasingly rely on *multi-core technology* to deliver performance. High-performance devices such as smartphones and desktop PCs leverage the power of multi-core CPUs to enable multitasking; i.e., the capability to run several *different* applications without degrading the overall performance/responsiveness of the operating system. However, improving the performance of a *single* application using a multi-core CPU is a non-trivial task – particularly when the application requires the execution of several consecutive tasks; i.e., the results of one task form the inputs of the next task.

This latter statement accurately describes the challenge faced when using multiple CPU cores to solve a network modelling problem – for example, calculating the *network demand* based on the *exact operation* of TCP's congestion control algorithm. Despite the apparent contradiction of performing (would-be) *consecutive* tasks *concurrently*, SureLink-XG has recently demonstrated that it is indeed possible to significantly *parallelise* and *solve* such network modelling problems on multiple cores – doubling its speed on a 2.5 GHz dual-core processor to achieve computational speeds that are now at least 60X faster than comparable network modellers such as OPNET and NS-2/3.

*CPU usage for the SureLink-XG serial algorithm (top) and the SureLink-XG parallel algorithm (bottom). The parallelisation of the SureLink-XG algorithm (executing the algorithm on two CPU cores at the same time) means that CPU usage increases from 25% to 50%, which in turn significantly increases the computational speed!*

This development makes the 2018 news of Intel’s most powerful processor – capable of clock speeds of 5 GHz across all of its 28 cores – or *AMD’s Threadripper 2*, with a staggering 32 cores, very pertinent to the task of calculating the network demand in ultra-fast networks. On such processors – and based on the latest developments in the algorithm’s kernel – SureLink-XG is theoretically capable of computational speeds that are orders of magnitude faster than the discrete event modelling approach adopted by OPNET, OMNeT++ and NS-2/3 – simply by applying more CPU cores to the problem.

Until very recently, the two alternatives available for a detailed analysis of the flow of packets in *statistically multiplexed* (i.e., *queuing*) systems such as the Internet have been *discrete time* and *discrete event* simulations (*DTS* vs *DES*). In the former, the simulated “clock” is advanced by a fixed time increment regardless of any changes in the network state. In the latter, the simulated clock “jumps” to the next time instant at which the network state will change; e.g., the arrival of a TCP segment at a client device. Despite the obvious efficiency of the latter approach, a well-known disadvantage of DES is its slow execution time, particularly for high-speed links, which can be somewhat improved upon by applying queuing models in a *hybrid simulation* – albeit with the loss of some accuracy. Indeed, observing the strict chronology of events and the sequential order in which they must be performed is a major reason why it can take an hour (or more) to calculate 200 seconds of network activity on a 1Gbps link!
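The difference between the two clock-advancing strategies can be sketched in a few lines of Python (a toy illustration, not SureLink-XG code; the event times and step size are hypothetical):

```python
import heapq

def discrete_time_steps(dt, horizon):
    """Fixed-step (DTS) clock: one loop iteration per time increment,
    regardless of whether the network state changed."""
    steps, t = 0, 0.0
    while t < horizon:
        t += dt
        steps += 1
    return steps

def discrete_event_steps(event_times):
    """Event-driven (DES) clock: the clock jumps straight to the next
    scheduled state change, e.g. the arrival of a TCP segment."""
    queue = list(event_times)
    heapq.heapify(queue)
    steps = 0
    while queue:
        heapq.heappop(queue)  # advance the clock to the next event
        steps += 1
    return steps

# Hypothetical workload: 5 segment arrivals over 200 s of network activity.
arrivals = [0.5, 40.0, 80.0, 120.0, 199.5]
print(discrete_time_steps(dt=0.25, horizon=200.0))  # 800 fixed steps
print(discrete_event_steps(arrivals))               # 5 event steps
```

In practice the DTS step size must be far smaller than 0.25 s to resolve individual packets, which is exactly why the event-driven clock is the more efficient of the two.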

The *SureLink-XG serial algorithm* (*dubbed the “Mach” algorithm*) showed that it is possible to complete the same calculation in just 60 seconds; that is, at least 30X faster than discrete event simulation on comparable hardware. The new *SureLink-XG parallel algorithm* (*dubbed the “Warp” algorithm*) has now nearly halved this time by performing *different parts of the same calculation near-simultaneously* on two CPU cores. The fact that the new parallel implementation is not exactly double the speed of the serial implementation shows that the strict chronology of network events remains an *(almost) immutable law*. However, a speed-up factor of approximately two demonstrates that the *serial* process described for discrete event simulation has effectively been *parallelised* using the new SureLink-XG algorithm! It is also interesting that the speed advantage stabilises rapidly and is also realised for high-speed links, which require simulating a large number of network events. An additional speed advantage has been gained by compiling the SureLink-XG executable for 64-bit processors, but clearly this advantage is not as scalable as the advantage gained from *concurrent processing*.
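The gap between a perfect 2X and the observed speed-up is what *Amdahl’s law* predicts whenever a serial fraction of the work remains. A minimal sketch in Python (the 90% parallel fraction below is a hypothetical figure, not a measured property of SureLink-XG):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Predicted overall speed-up when only `parallel_fraction` of the
    work can run concurrently across `cores` CPU cores (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# If ~90% of the calculation parallelises, two cores give ~1.8X, not 2X...
print(amdahl_speedup(0.90, 2))
# ...and even a 28-core processor is capped well below 28X.
print(amdahl_speedup(0.90, 28))
```

The second figure is why reducing the serial fraction of the calculation matters as much as adding cores.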

Executing the SureLink-XG algorithm on multiple cores significantly increases the speed of execution and sets the stage for analysing high-speed links

Note that great care has been taken not to exceed the *hardware concurrency* limit (i.e., the number of CPU cores) during the evaluation of SureLink-XG’s parallel algorithm. Despite the possibility of a greater degree of parallelism/concurrency via *hyperthreading* (e.g., Intel’s implementation of *simultaneous multithreading*), there is an increased risk of degrading the performance of the application due to *context switching* – i.e., the process of loading and unloading the state of the various tasks during the on-off cycles required for time-sharing a single CPU core.
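One defensive pattern is to clamp the worker count to the hardware concurrency limit before spawning any workers. A minimal sketch in Python (`capped_worker_count` is a hypothetical helper; note that `os.cpu_count()` reports *logical* processors, so on a hyperthreaded CPU it already counts two hardware threads per physical core):

```python
import os

def capped_worker_count(requested, logical_cpus=None):
    """Clamp the requested number of workers to the hardware concurrency
    limit, so tasks are not time-shared (and context-switched) on one core."""
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(requested, logical_cpus))

# E.g. asking for 8 workers on a dual-core machine yields only 2.
print(capped_worker_count(8, logical_cpus=2))  # 2
```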

The most direct approach to measuring the network demand is to use the *Simple Network Management Protocol* (SNMP) to access the interface counters kept by the *Management Information Base* (MIB), such as the number of bytes sent or received. Indeed, Cisco has published a how-to guideline on calculating bandwidth utilization using SNMP. However, this approach does not account for the *bursty* flow of Internet traffic, which arises due to the operation of the *congestion control algorithm* and can be significant on high-speed links. In order to account for *bursty* traffic, greater accuracy is required when measuring the flow of Internet traffic. However, *computing* the number of packets flowing into an ultra-fast network is still preferable to *counting*/*measuring* the number of packets in the network (i.e., via *continuous packet capture*) due to the sheer volume of data! Even when *packets* are aggregated into *flows* (e.g., a set of IP packets belonging to the same network application), the size of a flow data repository can easily exceed tens of terabytes. Nevertheless, because flow monitoring achieves a significant reduction in the overall data volume – in the order of 1/2000 – a substantial proportion of network operators support flow monitoring.
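As a concrete example of the counter-based approach, Cisco’s guideline computes utilization from the change in the MIB’s octet counters between two polls. A sketch in Python (the function name is illustrative; the default 32-bit modulus matches the classic `ifInOctets`/`ifOutOctets` counters, which wrap quickly on high-speed links):

```python
def link_utilisation(octets_t0, octets_t1, interval_s, if_speed_bps,
                     counter_bits=32):
    """Percent link utilisation from two SNMP octet-counter samples.
    The modular subtraction tolerates a single counter wrap."""
    modulus = 1 << counter_bits
    delta_octets = (octets_t1 - octets_t0) % modulus
    return delta_octets * 8 * 100.0 / (interval_s * if_speed_bps)

# Hypothetical 5-minute poll of a 100 Mbps interface:
print(link_utilisation(100_000, 3_850_000, 300, 100e6))  # 0.1 (% utilisation)
```

Note how this method yields an *average* over the polling interval; it is precisely this averaging that hides the bursts discussed above.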

A survey of the traffic measurement/modelling solutions that can be used during the design and operation of ultra-fast data networks

Ultra-fast calculations of the amount of data flowing into the network can further reduce the high cost of the substantial infrastructure required for storage and analysis in the measurement approach. In fact, *the method described in this publication can be generalised for any number of nodes and any congestion control algorithm, bringing packet-level mathematical precision to the task of designing and operating ultra-fast data networks*.

When it comes to performing calculations to determine the flow of information on the Internet, the trade-off is typically one between the *execution time* and the *level of detail*. *Stochastic or trace-driven simulations* of a network queue are relatively fast and can provide information such as the packet delay for a given link utilization – either using direct measurement, or when the *arrival rate* and *service rate* of packets are *both* known or can be estimated using well-known statistical distributions; e.g., a *Poisson* arrival process with exponentially distributed service times. But again, these parameters are a function of the *congestion control algorithm*, which implies either processing a huge volume of data or facing the difficulty of estimating the requisite statistics from direct measurements. On the other hand, network modelling through discrete event simulation provides a detailed analysis by incorporating the operation of the congestion control algorithm. However, this approach is very slow (particularly for high-speed links) and, because manual configuration is required for each analysis, it does not lend itself well to the analysis of network measurements.
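For the Poisson-arrival/exponential-service case mentioned above (the classic M/M/1 queue), the mean packet delay has a closed form, W = 1/(μ − λ), which is exactly the kind of fast-but-coarse result a stochastic model delivers. A minimal sketch (the rates below are hypothetical):

```python
def mm1_mean_delay(arrival_rate, service_rate):
    """Mean time a packet spends in an M/M/1 queue (waiting + service):
    W = 1 / (mu - lambda). Only valid while the queue is stable."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate >= service rate")
    return 1.0 / (service_rate - arrival_rate)

def mm1_utilisation(arrival_rate, service_rate):
    """Link utilisation rho = lambda / mu."""
    return arrival_rate / service_rate

# Hypothetical link serving 1000 packets/s with 800 packets/s offered load:
print(mm1_utilisation(800, 1000))  # 0.8 (80% utilised)
print(mm1_mean_delay(800, 1000))   # 0.005 s mean delay per packet
```

The delay blows up as the arrival rate approaches the service rate, which is one reason utilisation alone understates congestion on bursty links.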

SureLink-XG’s *auto-configuration* means that it is now possible to directly analyse network measurements with *unprecedented* speed and accuracy. Given that it is a far more tractable problem to count the number of subscribers than it is to count the number of packets flowing into the network every second, the speed attribute implies that it will soon be possible to analyse high-speed links (up to and beyond 100Gbps) in *real-time*. Also, by modelling the exact operation of the congestion control algorithm, SureLink-XG’s accuracy has implications for the network expansion budget when considering the tsunami of data that will be generated by the *Internet of Things*.

Acknowledgements:

This work is dedicated to *Mrs Mate*, *Mr Kasumba* and *Kevin Paulson* for making me a better mathematician (each in their turn), *Rob Miles* for teaching me how to code in C++ and assembly (even though I didn't fully appreciate it at the time :), and finally *Kai-Kit Wong* for encouraging me to read as many papers as I could get my hands on and thus making me a better researcher :)