Toward Energy-Efficient High-Performance Organizations of the Memory Hierarchy in Chip-Multiprocessors Architectures

  1. Villa, Francisco J.
  2. Acacio Sánchez, Manuel
  3. García Carrasco, José Manuel
Journal:
Journal of Computer Science and Technology

ISSN: 1666-6038

Year of publication: 2006

Issue title: Seventeenth Issue

Volume: 6

Number: 1

Pages: 1-7

Type: Article

Other publications in: Journal of Computer Science and Technology

Abstract

Chip-multiprocessor systems, or CMPs, have emerged as a high-performance organization for the increasing number of transistors available on a chip, and are projected to dominate the server and desktop computer markets. CMPs require innovative on-chip memory hierarchies designed specifically to address the problems that arise in this novel kind of architecture: higher memory bandwidth demand from more processing cores and the increasing latency of off-chip cache misses. Moreover, energy consumption is an even more pressing concern than in traditional multiprocessors, as CMPs are commonly used in embedded systems. This paper presents a survey of some recently proposed designs that address these issues.

References

  • [1] V. Agarwal, M.S. Hrishikesh, S.W. Keckler, and D. Burger. "Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures". In Proc. of 27th Int'l Symp. on Computer Architecture, pages 248-259, June 2000.
  • [2] M. Annavaram, E. Grochowski, and J. Shen. "Mitigating Amdahl's Law Through EPI Throttling". In Proc. of 32nd Int'l Symp. on Computer Architecture, pages 298-309, June 2005.
  • [3] L.A. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese. "Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing". In Proc. of 27th Int'l Symp. on Computer Architecture, pages 282-293, June 2000.
  • [4] B. Beckmann and D. Wood. "Managing Wire Delay in Large Chip-Multiprocessor Caches". In Proc. of 37th Int'l Symp. on Microarchitecture, pages 319-330, December 2004.
  • [5] Z. Chishti, M. D. Powell, and T. N. Vijaykumar. "Optimizing Replication, Communication, and Capacity Allocation in CMPs". In Proc. of 32nd Int'l Symp. on Computer Architecture, pages 357-368, May 2005.
  • [6] Intel Corporation. "Dual-Core Update to the Intel Itanium 2 Processor. Reference Manual", January 2006.
  • [7] J. Donald and M. Martonosi. "Temperature-Aware Design Issues for SMT and CMP Architectures". In Proc. of 2004 Workshop on Complexity Effective Design, June 2004.
  • [8] M. Ekman and P. Stenstrom. "Performance and Power Impact of Issue-width in Chip-Multiprocessor Cores". In Proc. of Int'l Conf. on Parallel Processing, pages 359-368, October 2003.
  • [9] E. Grochowski, R. Ronen, J. Shen, and H. Wang. "Best of Both Latency and Throughput". In Proc. of Int'l Conf. on Computer Design, pages 236-243, October 2004.
  • [10] L. Hammond, B. A. Hubbert, M. Siu, M. K. Prabhu, M. Chen, and K. Olukotun. "The Stanford Hydra CMP". IEEE Micro, 20(2): 71-84, March 2000.
  • [11] J. Huh, D. Burger, and S. W. Keckler. "Exploring the Design Space of Future CMPs". In Proc. of 2001 Int'l Conf. on Parallel Architectures and Compilation Techniques, pages 199-210, September 2001.
  • [12] J. Huh, C. Kim, H. Shafi, L. Zhang, D. Burger, and S. W. Keckler. "A NUCA Substrate for Flexible CMP Cache Sharing". In Proc. of 10th Int'l Conf. on Supercomputing, pages 31-40, June 2005.
  • [13] R. Iyer. "CQoS: A Framework for Enabling QoS in Shared Caches of CMP Platforms". In Proc. of Int'l Conf. on Supercomputing, pages 257-266, June 2004.
  • [14] N. P. Jouppi. "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers". In Proc. of 17th Int'l Symp. on Computer Architecture, pages 364-373, May 1990.
  • [15] I. Kadayif, M. Kandemir, and I. Kolcu. "Exploiting Processor Workload Heterogeneity for Reducing Energy Consumption in Chip Multiprocessors". In Proc. of Design, Automation and Test in Europe Conference and Exhibition, pages 882-887, February 2004.
  • [16] R. Kalla, B. Sinharoy, and J.M. Tendler. "IBM Power5 Chip: A Dual-Core Multithreaded Processor". IEEE Micro, 24(2): 40-47, March-April 2004.
  • [17] S. Kaxiras, G. Narlikar, A. D. Berenbaum, and Z. Hu. "Comparing Power Consumption of an SMT and a CMP DSP for Mobile Phone Workloads". In Proc. of Int'l Conf. on Compilers, Architectures and Synthesis for Embedded Systems, pages 211-220, November 2001.
  • [18] C. Kim, D. Burger, and S. W. Keckler. "An Adaptive, Non-Uniform Cache Structure for Wire-Dominated On-Chip Caches". In Proc. of 10th Int'l Conf. on Architectural Support for Programming Languages and Operating Systems, pages 211-222, October 2002.
  • [19] C. Kim, D. Burger, and S. W. Keckler. "Nonuniform Cache Architectures for Wire-Delay Dominated On-Chip Caches". IEEE Micro, 23(6):99-107, November/December 2003.
  • [20] S. Kim, D. Chandra, and Y. Solihin. "Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture". In Proc. of 10th Int'l Conf. on Parallel Architecture and Compilation Techniques, pages 111-122, September 2004.
  • [21] K. Krewell. "UltraSPARC IV Mirrors Predecessor". Microprocessor Report, pages 1-3, November 2003.
  • [22] K. Krewell. "Sun's Niagara pours on the cores". Microprocessor Report, 18(9):11-13, September 2004.
  • [23] R. Kumar, K. Farkas, N. Jouppi, P. Ranganathan, and D. Tullsen. "Single-ISA Heterogeneous Multi-core Architectures: The Potential for Processor Power Reduction". In Proc. of Int'l Symp. on Microarchitecture, pages 81-92, December 2003.
  • [24] S. Kumar, D. Jiang, R. Chandra, and J.P. Singh. "Evaluating Synchronization on Shared Address Space Multiprocessors: Methodology and Performance". In Proc. of Int'l Conf. on Measurement and Modeling of Computer Systems, pages 23-34, May 1999.
  • [25] B. C. Lee and D. Brooks. "Effects of Pipeline Complexity on SMT/CMP Power-Performance Efficiency". In Proc. of 2005 Workshop on Complexity Effective Design, June 2005.
  • [26] J. Li and J. F. Martinez. "Dynamic Power-Performance Adaptation of Parallel Computation on Chip Multiprocessors". In Proc. of 12th Int'l Symp. on High-Performance Computer Architecture, pages 77-87, February 2006.
  • [27] J. Li and J. F. Martinez. "Power-Performance Implications of Thread-level Parallelism in Chip Multiprocessors". In Proc. of Int'l Symp. on Performance Analysis of Systems and Software, pages 124-134, March 2005.
  • [28] Y. Li, B. Lee, D. Brooks, Z. Hu, and K. Skadron. "CMP Design Space Exploration Subject to Physical Constraints". In Proc. of 12th Int'l Symp. on High Performance Computer Architecture, pages 15-26, February 2006.
  • [29] C. Liu, A. Sivasubramaniam, and M. Kandemir. "Organizing the Last Line of Defense before Hitting the Memory Wall for CMPs". In Proc. of 10th Int'l Symp. on High Performance Computer Architecture, pages 176-185, February 2004.
  • [30] C. Liu, A. Sivasubramaniam, M. Kandemir, and M.J. Irwin. "Exploiting Barriers to Optimize Power Consumption of CMPs". In Proc. of 19th Int'l Parallel and Distributed Processing Symp., pages 5a-5b, April 2005.
  • [31] B. A. Nayfeh, L. Hammond, and K. Olukotun. "Evaluation of Design Alternatives for a Multiprocessor Microprocessor". In Proc. of 23rd Int'l Symp. on Computer Architecture, pages 66-77, June 1996.
  • [32] R. Sasanka, S.V. Adve, Y. Chen, and E. Debes. "The Energy Efficiency of CMP vs. SMT for Multimedia Workloads". In Proc. of Int'l Conf. on Supercomputing, pages 196-206, June 2004.
  • [33] A. Settle, D. Connors, E. Gilbert, and A. Gonzalez. "A Dynamically Reconfigurable Cache of Multithreaded Processors". Journal of Embedded Computing: Special Issue on Single-Chip Multi-Core Architectures, December 2005.
  • [34] E. Speight, H. Shafi, L. Zhang, and R. Rajamony. "Adaptive Mechanisms and Policies for Managing Cache Hierarchies in Chip Multiprocessors". In Proc. of 32nd Int'l Symp. on Computer Architecture, pages 346-356, May 2005.
  • [35] G. E. Suh, L. Rudolph, and S. Devadas. "Dynamic Cache Partitioning for CMP/SMT Systems". Journal of Supercomputing, 28(1): 7-26, April 2004.
  • [36] A. Yamawaki and M. Iwane. "Organization of Shared Memory with Synchronization for Multiprocessor-on-a-chip". In Proc. of 9th Int'l Conf. on Parallel and Distributed Systems, pages 83-90, December 2002.
  • [37] M. Zhang and K. Asanovic. "Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors". In Proc. of 32nd Int'l Symp. on Computer Architecture, pages 336-345, June 2005.