Optimizing signatures in hardware transactional memory systems

  1. Quislant del Barrio, Ricardo
Dirigida por:
  1. Oscar Plata González Director/a
  2. Eladio Damián Gutiérrez Carrasco Codirector/a
  3. Emilio López Zapata Codirector/a

Universidad de defensa: Universidad de Málaga

Fecha de defensa: 29 de octubre de 2012

Tribunal:
  1. Francisco Tirado Fernández Presidente/a
  2. María Inmaculada García Fernández Secretario/a
  3. Mikel Lujan Vocal
  4. Manuel Eugenio Acacio Sánchez Vocal
  5. José María Llaberia Griño Vocal

Tipo: Tesis

Teseo: 333050 DIALNET

Resumen

With the advent of chip multiprocessors, hardware manufactures have put a big burden on software development community. The majority of programmers are used to sequential programming, whereas writing high-performance multithreaded programs is currently mastered by a small group. Parallel programming is a complex task that requires an understanding of new hardware concepts, algorithms, and programming tools. Transactional Memory (TM) emerges as an alternative to the conventional multithreaded programming to ease the writing of concurrent programs. The programmer is provided with the transaction construct, which defines a group of computations that are executed atomically and in isolation. Transactional memory establishes an optimistic concurrency model where transactions run in parallel unless a conflict is detected. Conflict detection is performed transparently by the transactional system, and it is critical for performance. In case of hardware-implemented transactional memory, conflict detection is usually carried out by means of signatures, which keep track of the addresses that have been accessed by each transaction. Such signatures are implemented as hashing structures, Bloom filters specifically, that can yield false positives when checking for membership of an address. False positives can seriously degrade the performance of the system. In this thesis, we propose various signature optimizations for hardware transactional memory, focusing on the reduction of the false positive rate mainly. First, an alternative to Bloom filters is devised, called interval filter, which tracks addresses read and written by transactions in form of contiguous memory location chunks, so-called intervals, motivated by the fact that applications frequently exhibit locality access patterns. In the line of exploiting locality of reference, we propose locality-sensitive signatures that defines new maps for the hash functions of Bloom filters in order to reduce the number of bits inserted into the filter for those addresses nearby located. As a result, false conflicts are significantly reduced for transactions that exhibit spatial locality. We also propose two signature schemes two tackle the problem of asymmetry in transactional data sets: a multiset signature and a reconfigurable asymmetric signature. Transactions frequently show an uneven cardinality of their sets of read and written addresses, while read and write filters are usually implemented with the same size each. Multiset signatures merge both filters in a single one, so that read and write false positive rates equalize each other. Multiset signatures are in turn optimized by exploiting locality and other properties of data access patterns. As regards reconfigurable asymmetric signatures, they can be dynamically configured at runtime to devote different amount of hardware either to the read set or to the write set of transactions. Finally, we conducted a scalability study to show the response of our proposals, compared to the common schemes, in the presence of contention, large transactions and different number of cores. All experiments in this thesis have been performed by using state-of-the-art hardware simulators and benchmarks of the field of transactional memory. Specifically, we used Wisconsin GEMS, along with Simics for the transactional memory and chip multiprocessor simulators. We used the entire Stanford STAMP benchmark suite, which is specially designed for transactional memory research, together with EigenBench, a novel synthetic benchmark for studying orthogonal characteristics of TM systems. Finally, CACTI and Synopsys were also used for hardware area, power and time estimates.