Analysis and evaluation of heterogeneous architectures based on Intel Xeon Phi for scientific problems based on stencil computation patterns

Author:
  1. Hernandez Hernandez, Mario
Supervised by:
  1. Juan Manuel Cebrian González Director
  2. José María Cecilia Canales Director
  3. José Manuel García Carrasco Director

Defence university: Universidad de Murcia

Defence date: 29 April 2016

Committee:
  1. Manuel Ujaldón Martínez Chair
  2. Domingo Giménez Cánovas Secretary
  3. José García Rodríguez Committee member
Department:
  1. Computer Engineering and Technology

Type: Thesis

Abstract

The growth of research fields based on simulation and modelling, along with the ever-increasing needs of the services sector (web services and databases), is pushing high-performance architectures to the limits of their capabilities. To address this issue, most supercomputers are moving to heterogeneous designs, in which traditional latency-oriented processors are combined with a substantial number of throughput-oriented accelerators. Supercomputers run a vast range of applications, and some of them share specific computation patterns. Finite difference methods are among the most common computational patterns in many fields of science and engineering. This pattern, also known as Stencil, is commonly used to solve partial differential equations (PDEs).

This Thesis focuses on the design, analysis and evaluation of scientific applications for high-performance computing on heterogeneous architectures. More specifically, we have focused on developing scientific codes based on Stencil patterns for an x86 heterogeneous architecture. This architecture packs together both latency-oriented and throughput-oriented processors, namely Intel Xeon chips and Intel Xeon Phi cards. Our evaluation covers the two most important working modes of the Xeon Phi: native and offload. For the native mode, we have proposed a series of guidelines to optimize both the performance and the energy efficiency of applications based on Stencil patterns, using three example kernels (acoustic, seismic and heat diffusion) as case studies. The guidelines cover the vectorization and parallelization processes, as well as other simple but powerful optimization techniques.

For the offload mode, we have shown how to run the code on two Xeon Phi cards, which both improved performance and made it possible to handle larger datasets. We have also proposed relaxing the accuracy of the results to alleviate the overhead of data communication between the accelerators and the Xeon processor.
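To make the Stencil pattern and the native-mode guidelines more concrete, the following is a minimal sketch of a heat-diffusion kernel, one of the three case studies named above. It is not the Thesis's code: the function names `heat_step` and `heat_run`, the 2D 5-point stencil, the fixed boundary values and the use of OpenMP `parallel for` and `simd` pragmas as the parallelization and vectorization mechanisms are all illustrative assumptions. It shows the two levels the guidelines mention: threads over rows and SIMD lanes over the unit-stride inner loop.

```c
#include <stddef.h>

/* Row-major index into a grid with ny columns. */
#define IDX(i, j, ny) ((size_t)(i) * (size_t)(ny) + (size_t)(j))

/*
 * Hypothetical sketch: one explicit (Jacobi-style) time step of 2D heat
 * diffusion using a 5-point stencil. alpha = k*dt/dx^2 must satisfy
 * alpha <= 0.25 for stability. Boundary cells are copied unchanged.
 */
void heat_step(const double *restrict in, double *restrict out,
               int nx, int ny, double alpha)
{
    /* Rows are independent, so the outer loop can be split across threads.
       Without -fopenmp the pragmas are ignored and the code runs serially. */
    #pragma omp parallel for
    for (int i = 1; i < nx - 1; i++) {
        /* Unit-stride inner loop: the natural target for SIMD vectorization. */
        #pragma omp simd
        for (int j = 1; j < ny - 1; j++) {
            double c = in[IDX(i, j, ny)];
            out[IDX(i, j, ny)] = c + alpha * (in[IDX(i - 1, j, ny)] +
                                              in[IDX(i + 1, j, ny)] +
                                              in[IDX(i, j - 1, ny)] +
                                              in[IDX(i, j + 1, ny)] - 4.0 * c);
        }
    }
    /* Keep boundary rows and columns fixed. */
    for (int j = 0; j < ny; j++) {
        out[IDX(0, j, ny)] = in[IDX(0, j, ny)];
        out[IDX(nx - 1, j, ny)] = in[IDX(nx - 1, j, ny)];
    }
    for (int i = 0; i < nx; i++) {
        out[IDX(i, 0, ny)] = in[IDX(i, 0, ny)];
        out[IDX(i, ny - 1, ny)] = in[IDX(i, ny - 1, ny)];
    }
}

/* Advance nsteps time steps by swapping the two buffers (no copies);
   returns the buffer that holds the final state. */
double *heat_run(double *a, double *b, int nx, int ny, double alpha, int nsteps)
{
    for (int t = 0; t < nsteps; t++) {
        heat_step(a, b, nx, ny, alpha);
        double *tmp = a;
        a = b;
        b = tmp;
    }
    return a;
}
```

Swapping pointers instead of copying the output grid back into the input grid is a simple optimization of the kind the guidelines allude to, since it avoids one full pass over memory per time step.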
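The abstract does not specify how accuracy is relaxed to reduce communication overhead. One plausible illustration, offered here purely as an assumption and not as the Thesis's method, is to narrow double-precision data to single precision before it crosses the host/accelerator link and widen it again on arrival. All function names below (`pack_to_float`, `unpack_to_double`, `transfer_bytes`, `max_roundtrip_error`) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helpers: narrow double-precision data to single precision
   before it is sent across the host/accelerator link, and widen it back
   on the receiving side. Each element shrinks from 8 to 4 bytes on common
   platforms, keeping about 7 significant decimal digits instead of about 16. */
void pack_to_float(const double *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (float)src[i];
}

void unpack_to_double(const float *src, double *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (double)src[i];
}

/* Bytes that would be transferred for n values, with or without narrowing. */
size_t transfer_bytes(size_t n, int reduced)
{
    return n * (reduced ? sizeof(float) : sizeof(double));
}

/* Largest absolute error introduced by one double -> float -> double round trip. */
double max_roundtrip_error(const double *x, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - (double)(float)x[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

The trade-off is explicit: halving the bytes moved per transfer in exchange for a bounded loss of precision, which is acceptable only if the application's results tolerate single-precision rounding.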
Finally, throughout the Thesis, we have compared our results with those obtained on two other high-performance architectures: the Intel Xeon multicore and the Nvidia CUDA architecture. This comparison covers both runtime and energy efficiency.
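The abstract does not state which energy metric was used. A common approach, assumed here, is to compute energy as average power multiplied by runtime and to express efficiency as work per joule. The helper names `energy_joules` and `work_per_joule` are hypothetical.

```c
/* Hypothetical helpers for comparing architectures by energy.
   Energy in joules from average power (watts) and runtime (seconds). */
double energy_joules(double avg_power_w, double runtime_s)
{
    return avg_power_w * runtime_s;
}

/* Work per joule, e.g. stencil point updates per joule
   (equivalently, updates per second per watt). */
double work_per_joule(double work_units, double energy_j)
{
    return work_units / energy_j;
}
```

With made-up numbers, a device drawing 300 W for 5 s consumes 1500 J, less than a device drawing 200 W for 10 s (2000 J). This is why runtime and energy efficiency are reported separately: a faster but more power-hungry architecture can still be the more energy-efficient one.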