Design of Efficient TLB-based Data Classification Mechanisms in Chip Multiprocessors

Esteve García, Albert

Supervised by:

María Engracia Gómez Requena Supervisor
Alberto Ros Bardisa Supervisor
Antonio Robles Martínez Supervisor

University of defense: Universitat Politècnica de València

Date of defense: 10 July 2017

Examination committee:

Pedro Juan López Rodríguez Chair
José Ángel Gregorio Monasterio Secretary
Vijayanand Nagarajan Member

Type: Thesis

Teseo: 143153

Abstract

Most of the data referenced by sequential and parallel applications running on current chip multiprocessors are accessed by a single thread, i.e., they are private. Recent proposals leverage this observation to improve many aspects of chip multiprocessors, such as reducing coherence overhead or the access latency to distributed caches. The effectiveness of those proposals depends to a large extent on the amount of private data detected. However, the mechanisms proposed so far either do not consider thread migration or the private use of data within different application phases, or entail high overhead. As a result, a considerable amount of private data goes undetected. To increase the detection of private data, this thesis proposes a TLB-based mechanism that accounts for both thread migration and private application phases with low overhead. In the proposed TLB-based classification mechanisms, the classification status of a page is determined by the presence of its translation in other cores' TLBs. The classification schemes are analyzed in multilevel TLB hierarchies, for systems with both private and distributed shared last-level TLBs. This thesis introduces a page classification approach based on inspecting other cores' TLBs upon every TLB miss. In particular, the proposed classification approach is based on exchanging and counting tokens. Token counting on TLBs is a natural and efficient way of classifying memory pages. It does not require complex and undesirable persistent requests or arbitration, since when two or more TLBs race to access a page, tokens are appropriately distributed and the page is classified as shared. However, the ability of TLB-based mechanisms to classify private pages is strongly dependent on TLB size, as it relies on the presence of a page translation in the system TLBs. To overcome this, different TLB usage predictors (UP) have been proposed, which allow page classification to be unaffected by TLB size.
Specifically, this thesis introduces a predictor that obtains system-wide page usage information by either employing a shared last-level TLB structure (SUP) or cooperative TLBs working together (CUP).
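
The presence-based idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's token-based protocol or its usage predictors; it is a simplified model that assumes a page is classified as shared whenever another core's TLB holds its translation at the time of a TLB miss, and as private otherwise. The class names `TLB` and `System`, the fully associative LRU organization, and the tiny capacities are hypothetical choices made only for illustration.

```python
from collections import OrderedDict


class TLB:
    """Tiny fully associative TLB with LRU replacement (illustrative assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # page number -> "private" or "shared"

    def holds(self, page):
        return page in self.entries

    def insert(self, page, status):
        self.entries[page] = status
        self.entries.move_to_end(page)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


class System:
    """Hypothetical multicore model: one private TLB per core."""

    def __init__(self, num_cores, tlb_capacity):
        self.tlbs = [TLB(tlb_capacity) for _ in range(num_cores)]

    def access(self, core, page):
        tlb = self.tlbs[core]
        if tlb.holds(page):
            tlb.entries.move_to_end(page)  # refresh LRU position on a hit
            return tlb.entries[page]
        # TLB miss: inspect the other cores' TLBs for the same translation.
        sharers = [t for i, t in enumerate(self.tlbs) if i != core and t.holds(page)]
        status = "shared" if sharers else "private"
        for t in sharers:
            t.entries[page] = status  # existing holders learn the page is shared
        tlb.insert(page, status)
        return status


system = System(num_cores=2, tlb_capacity=2)
first = system.access(0, 0x10)      # no other TLB holds the page
second = system.access(1, 0x10)     # core 0 still holds it
third = system.access(0, 0x10)      # hit; core 0 was told the page is shared
for page in (0x20, 0x30, 0x40):     # 0x20 is evicted from core 0 by later misses
    system.access(0, page)
migrated = system.access(1, 0x20)   # core 0 no longer holds 0x20
print(first, second, third, migrated)
```

In this sketch, page `0x20` is classified as private again when core 1 touches it, because core 0's entry has already been evicted. With a larger `tlb_capacity` the entry would still be resident and the same access would be classified as shared, which is one way to see why a purely presence-based classification depends on TLB size.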