Approaches to network interface design by offloading/onloading: analysis and optimization by full-system simulation
- Ortiz García, Andres
- Julio Ortega Lopera (Supervisor)
- Alberto Prieto Espinosa (Supervisor)
Defending university: Universidad de Granada
Date of defense: 11 November 2008
- Francisco José Quiles Flor (President)
- Antonio Francisco Díaz García (Secretary)
- José Manuel García Carrasco (Rapporteur)
- Alberto Peinado Domínguez (Rapporteur)
- Marcelo Cintra (Rapporteur)
Type: Doctoral thesis
Abstract
In recent years, diverse network interface designs have been proposed to cope with the link bandwidth increase that is shifting the communication bottleneck towards the nodes in the network. The main point behind some of these network interfaces is to reach an efficient distribution of the communication overheads among the different processing units of the node, thus leaving more host CPU cycles for the applications and other operating system tasks. Among these proposals, protocol offloading seeks an efficient use of the processing elements in the network interface card (NIC) to free the host CPU from network processing. The lack of both conclusive experimental results about the possible benefits and a deep understanding of the behavior of these alternatives across their parameter spaces has caused some controversy about the usefulness of this technique.

On the other hand, the availability of multicore processors and programmable NICs, such as TOEs (TCP/IP Offloading Engines), provides new opportunities for designing efficient network interfaces to cope with the gap between the improvement rates of link bandwidths and microprocessor performance. This gap poses important challenges related to the high computational requirements associated with the traffic volumes and the wider functionality the network interface has to support. Thus, taking into account the rate of link bandwidth improvement and the ever-changing and increasing application demands, efficient network interface architectures require scalability and flexibility. An opportunity to reach these goals comes from exploiting the parallelism in the communication path by distributing the protocol processing work across the processors available in the computer, i.e. multicore microprocessors and programmable NICs.

Thus, after a brief review of the different solutions previously proposed for speeding up network interfaces, this thesis analyzes the onloading and offloading alternatives. Both strategies try to release host CPU cycles by executing the communication workload on other processors present in the node. Nevertheless, whereas onloading uses another general-purpose processor, either included in a chip multiprocessor (CMP) or in a symmetric multiprocessor (SMP), offloading takes advantage of processors in programmable network interface cards (NICs). From our experiments, implemented by using a full-system simulator, we provide a fair and more complete comparison between onloading and offloading. It is shown that the relative improvement in peak throughput offered by offloading and onloading depends on the ratio of application workload to communication overhead, on the message sizes, and on the characteristics of the system architecture, more specifically the bandwidth of the buses and the way the NIC is connected to the system processor and memory. In our implementations, offloading provides lower latencies than onloading, although CPU utilization and the number of interrupts are lower for onloading. With the background provided by the results obtained with our offloading approaches, we propose a hybrid network interface that can take advantage of both the offloading and onloading approaches. We also explain the results from the perspective of the previously described LAWS model and propose some changes to this model to fit the experimental results more accurately.
From these results, it is possible to conclude that offloading allows a relevant throughput and latency improvement under some circumstances, which can be qualitatively predicted by the LAWS model. Thus, we have modified the original LAWS model, adding three new parameters that enable a more accurate fit to the experimental results. Finally, we use a real web server application to load the server and run several experiments in order to get a general view of the behavior of our offloading approaches under a real, typical application.
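For reference, the following is a minimal sketch of the peak-throughput expressions in the original LAWS model (Shivam and Chase) that the abstract builds on, written with that model's published parameters: the application ratio \(\gamma = a/o\) (application work over communication overhead per unit of data), the wire ratio \(\sigma\) (link bandwidth relative to the host's communication-only capacity), the lag ratio \(\alpha\) (how much slower the NIC is than the host), and the structural ratio \(\beta\) (the portion of the overhead that offload leaves on the host). The three additional parameters introduced in this thesis are not detailed in the abstract and are therefore not reproduced here.

% Normalized peak throughput (in units of the host's communication-only
% capacity) before and after protocol offload in the original LAWS model:
\[
  B_{\mathrm{before}} \propto \min\!\left(\sigma,\; \frac{1}{1+\gamma}\right),
  \qquad
  B_{\mathrm{after}} \propto \min\!\left(\sigma,\; \frac{1}{\gamma+\beta},\; \frac{1}{\alpha}\right)
\]
% Relative peak-throughput improvement predicted by the model:
\[
  \delta B \;=\; \frac{B_{\mathrm{after}} - B_{\mathrm{before}}}{B_{\mathrm{before}}} \;\le\; \frac{1}{\gamma}
\]

Under this formulation the benefit of offloading is bounded by the inverse of the application ratio, which is consistent with the abstract's observation that the relative improvement depends on the ratio of application workload to communication overhead.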