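Diseño de GPUs eficientes energéticamente explotando la coherencia entre fotogramas y optimizando los accesos a memoria (Design of energy-efficient GPUs by exploiting frame-to-frame coherence and optimizing memory accesses)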

  1. Corbalán Navarro, David
Supervised by:
  1. Juan Luis Aragón Alcaraz Director
  2. Antonio González Colás Director

Defence university: Universidad de Murcia

Defence date: 26 June 2023

Committee:
  1. Julio Sahuquillo Borras Chair
  2. José Luis Abellan Miguel Secretary
  3. José María Arnau Montañés Committee member

Type: Thesis

Abstract

The use of mobile devices such as smartphones, tablets and smartwatches has become so widespread in recent years that they are now part of our daily lives. At the same time, users demand ever more functionality and capability, for example higher performance in video games. Video games are driven mainly by the device's graphics processor, known as the GPU (Graphics Processing Unit): although they need the CPU for non-graphical tasks such as the physics engine or artificial intelligence, most of the work falls on the GPU.

In this Thesis we propose, design, implement and evaluate three techniques that reduce the power consumption of mobile GPUs. To achieve this, we exploit the principle of temporal coherence between frames, which states that two frames close in time should be similar because the motion of objects in a scene is inherently continuous.

The first proposal, Ω-Test, addresses a problem called overdraw, which accounts for 37.7% on average in the benchmarks evaluated. To alleviate this problem, Ω-Test reuses the Z-Buffer of the previous frame (referred to as the Ω-Table) instead of starting from scratch. As a result, many more fragments are discarded early, which reduces scene overdraw. Shading due to overdraw falls by 32.7% on average, yielding a 16.3% increase in performance and a 15.2% reduction in power consumption on average.

The second proposal, Triangle-Dropping, avoids processing occluded primitives in the geometry phase. In the benchmarks evaluated, on average 37.7% of the geometry of a scene ends up being written to the Parameter Buffer, and 60.1% of that geometry is completely occluded.
To drastically reduce the number of occluded triangles, Triangle-Dropping predicts the visibility of the triangles in the current frame from their visibility in the previous frame. Thanks to these optimizations, Triangle-Dropping removes 31.4% of the geometry present in the Parameter Buffer, which corresponds to 57% of the occluded geometry, yielding an overall performance increase of 20.2% and a 14.5% reduction in power consumption on average.

Finally, DTM-NUCA (Dynamic Texture Mapping-NUCA) is proposed, a technique that increases the effective capacity of the texture caches of the fragment processing cores (the so-called Fragment Processors). To use these texture caches more efficiently, DTM-NUCA implements a mechanism based on a NUCA (Non-Uniform Cache Access) architecture that allows texture blocks to be shared between Fragment Processors. This dynamic mapping scheme increases performance by 16.9% with the centralized Affinity Table and by 15.7% with the distributed version. In addition, energy consumption is reduced by 11.1% and 10.3% for the centralized and distributed configurations, respectively.

The work in this Thesis was carried out using TEAPOT, a simulation framework for mobile GPUs. In short, this Thesis proposes, develops and evaluates advanced micro-architectural techniques for mobile GPUs that considerably reduce their power consumption and increase their performance, making the user experience much more satisfactory.

Main author: Corbalán Navarro, David
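To illustrate why a depth buffer carried over from the previous frame can cut overdraw, as the abstract describes for Ω-Test, here is a toy sketch in Python. It is not the mechanism from the Thesis. The names `Fragment` and `count_shaded`, the one-row "screen", and the assumption that two consecutive frames are identical are all hypothetical simplifications. In particular, the sketch does not model how a real design stays correct when objects move between frames, since the abstract does not describe that part.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    x: int
    y: int
    depth: float  # smaller value = closer to the camera


def count_shaded(fragments, width, height, seed_depth=None):
    """Run a less-or-equal depth test over fragments in submission order.

    Returns (number of fragments that pass and would be shaded, final depth buffer).
    If seed_depth is given, the buffer starts from that (hypothetical) previous-frame
    depth data instead of the far plane (1.0).
    """
    if seed_depth is None:
        zbuf = [[1.0] * width for _ in range(height)]
    else:
        zbuf = [row[:] for row in seed_depth]  # copy so the caller's data is untouched

    shaded = 0
    for f in fragments:
        if f.depth <= zbuf[f.y][f.x]:
            zbuf[f.y][f.x] = f.depth
            shaded += 1
    return shaded, zbuf


if __name__ == "__main__":
    # Pixel (0, 0) receives three fragments back to front; pixel (1, 0) front to back.
    frame = [
        Fragment(0, 0, 0.9), Fragment(0, 0, 0.5), Fragment(0, 0, 0.2),
        Fragment(1, 0, 0.3), Fragment(1, 0, 0.7),
    ]
    baseline, prev_z = count_shaded(frame, width=2, height=1)
    seeded, _ = count_shaded(frame, width=2, height=1, seed_depth=prev_z)
    print("shaded from scratch:", baseline)        # 4 fragments shaded for 2 pixels
    print("shaded with seeded buffer:", seeded)    # 2 fragments shaded for 2 pixels
```

With an empty depth buffer, the back-to-front pixel shades all three of its fragments even though only one is visible. Starting from the earlier frame's depths, the two hidden fragments fail the test immediately and only the visible ones are shaded.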