Improving Energy and Area Scalability of the Cache Hierarchy in CMPs
- Valls Mompó, Joan Josep
- Julio Sahuquillo Borras Director/a
- María Engracia Gómez Requena Director/a
Universidad de defensa: Universitat Politècnica de València
Fecha de defensa: 02 de marzo de 2017
- Manuel Eugenio Acacio Sánchez Presidente
- Salvador Petit Secretario/a
- José Manuel García Carrasco Vocal
Tipo: Tesis
Resumen
As the core counts increase in each chip multiprocessor generation, CMPs should improve scalability in performance, area, and energy consumption to meet the demands of larger core counts. Directory-based protocols constitute the most scalable alternative. A conventional directory, however, suffers from an inefficient use of storage and energy. First, the large, non-scalable, sharer vectors consume unnecessary area and leakage, especially considering that most of the blocks tracked in a directory are cached by a single core. Second, although increasing directory size and associativity could boost system performance by reducing the coverage misses, it would come at the expense of area and energy consumption. This thesis focuses and exploits the important differences of behavior between private and shared blocks from the directory point of view. These differences claim for a separate management of both types of blocks at the directory. First, we propose the PS-Directory, a two-level directory cache that keeps the reduced number of frequently accessed shared entries in a small and fast first-level cache, namely Shared Directory Cache, and uses a larger and slower second-level Private Directory Cache to track the large amount of private blocks. Experimental results show that, compared to a conventional directory, the PS-Directory improves performance while also reducing silicon area and energy consumption. In this thesis we also show that the shared/private ratio of entries in the directory varies across applications and across different execution phases within the applications, which encourages us to propose Dynamic Way Partitioning (DWP) Directory. DWP-Directory reduces the number of ways with storage for shared blocks and it allows this storage to be powered off or on at run-time according to the dynamic requirements of the applications following a repartitioning algorithm. Results show similar performance as a traditional directory with high associativity, and similar area requirements as recent state-of-the-art schemes. In addition, DWP-Directory achieves notable static and dynamic power consumption savings. This dissertation also deals with the scalability issues in terms of power found in processor caches. A significant fraction of the total power budget is consumed by on-chip caches which are usually deployed with a high associativity degree (even L1 caches are being implemented with eight ways) to enhance the system performance. On a cache access, each way in the corresponding set is accessed in parallel, which is costly in terms of energy. This thesis presents the PS-Cache architecture, an energy-efficient cache design that reduces the number of accessed ways without hurting the performance. The PS-Cache takes advantage of the private-shared knowledge of the referenced block to reduce energy by accessing only those ways holding the kind of block looked up. Results show significant dynamic power consumption savings. Finally, we propose an energy-efficient architectural design that can be effectively applied to any kind of set-associative cache memory, not only to processor caches. The proposed approach, called the Tag Filter (TF) Architecture, filters the ways accessed in the target cache set, and just a few ways are searched in the tag and data arrays. This allows the approach to reduce the dynamic energy consumption of caches without hurting their access time. For this purpose, the proposed architecture holds the X least significant bits of each tag in a small auxiliary X-bit-wide array. These bits are used to filter the ways where the least significant bits of the tag do not match with the bits in the X-bit array. Experimental results show that this filtering mechanism achieves energy consumption in set-associative caches similar to direct mapped ones. Experimental results show that the proposals presented in this thesis offer a good tradeoff among these three major design axes.