Interconnect-Aware Coherence Protocols for Chip Multiprocessors

Published: 01 May 2006

    Abstract

    Improvements in semiconductor technology have made it possible to include multiple processor cores on a single die. Chip Multi-Processors (CMPs) are an attractive choice for future billion-transistor architectures due to their low design complexity, high clock frequency, and high throughput. In a typical CMP architecture, the L2 cache is shared by multiple cores and data coherence is maintained among the private L1 caches. Coherence operations entail frequent communication over global on-chip wires. In future technologies, communication between different L1s will have a significant impact on overall processor performance and power consumption. On-chip wires can be designed to have different latency, bandwidth, and energy properties. Likewise, coherence protocol messages have different latency and bandwidth needs. We propose an interconnect composed of wires with varying latency, bandwidth, and energy characteristics, and advocate intelligently mapping coherence operations to the appropriate wires. In this paper, we present a comprehensive list of techniques that allow coherence protocols to exploit a heterogeneous interconnect, and we evaluate a subset of these techniques to show their performance and power-efficiency potential. Most of the proposed techniques can be implemented with minimal complexity overhead.
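To make the idea of mapping coherence messages to wires concrete, here is a minimal sketch in Python. It is not the paper's mechanism. The wire class names, the `Message` fields, the 24-bit channel width, and the selection rule are all illustrative assumptions. The sketch only shows the general shape of the decision: a message's latency and bandwidth needs determine which kind of wire carries it.

```python
from dataclasses import dataclass
from enum import Enum


class WireClass(Enum):
    # Hypothetical wire classes; names and roles are assumptions for illustration.
    FAST_NARROW = "fast_narrow"  # low latency, low bandwidth
    BASELINE = "baseline"        # balanced latency and bandwidth
    LOW_POWER = "low_power"      # low energy, higher latency


@dataclass(frozen=True)
class Message:
    kind: str        # e.g. "ack", "data_reply", "writeback" (illustrative labels)
    size_bits: int   # total message size in bits
    critical: bool   # True if a core is stalled waiting on this message


# Assumed width of the narrow low-latency channel (illustrative value only).
FAST_NARROW_WIDTH_BITS = 24


def choose_wires(msg: Message) -> WireClass:
    """Pick a wire class from the message's latency and bandwidth needs."""
    if not msg.critical:
        # Nothing is waiting on the message, so favor energy over latency.
        return WireClass.LOW_POWER
    if msg.size_bits <= FAST_NARROW_WIDTH_BITS:
        # Small and latency-critical: fits on the narrow fast wires.
        return WireClass.FAST_NARROW
    # Large and latency-critical: needs the bandwidth of the baseline wires.
    return WireClass.BASELINE
```

Under these assumptions, a short critical acknowledgment would travel on the fast narrow wires, a critical cache-line reply on the baseline wires, and a non-critical writeback on the low-power wires.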

    Published In

    ACM SIGARCH Computer Architecture News, Volume 34, Issue 2
    May 2006
    383 pages
    ISSN: 0163-5964
    DOI: 10.1145/1150019

    ISCA '06: Proceedings of the 33rd Annual International Symposium on Computer Architecture
    June 2006
    383 pages
    ISBN: 076952608X

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Article

    Article Metrics

    • Downloads (last 12 months): 15
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 11 Aug 2024.

    Cited By

    • (2022) Nanoscale Electrically Driven Light Source Based on Hybrid Semiconductor/Metal Nanoantenna. The Journal of Physical Chemistry Letters, 13:20 (4612-4620). DOI: 10.1021/acs.jpclett.2c00986. Online publication date: 19-May-2022
    • (2021) Microprocessor Architecture and Design in Post Exascale Computing Era. 2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP) (20-32). DOI: 10.1109/ICSP51882.2021.9408861. Online publication date: 9-Apr-2021
    • (2021) Breaking the von Neumann bottleneck: architecture-level processing-in-memory technology. Science China Information Sciences, 64:6. DOI: 10.1007/s11432-020-3227-1. Online publication date: 27-Apr-2021
    • (2017) An Energy-Efficient Directory Based Multicore Architecture with Wireless Routers to Minimize the Communication Latency. IEEE Transactions on Parallel and Distributed Systems, 28:2 (374-385). DOI: 10.1109/TPDS.2016.2571282. Online publication date: 1-Feb-2017
    • (2016) Scalability of Broadcast Performance in Wireless Network-on-Chip. IEEE Transactions on Parallel and Distributed Systems, 27:12 (3631-3645). DOI: 10.1109/TPDS.2016.2537332. Online publication date: 1-Dec-2016
    • (2014) Mutually Aware Prefetcher and On-Chip Network Designs for Multi-Cores. IEEE Transactions on Computers, 63:9 (2316-2329). DOI: 10.1109/TC.2013.99. Online publication date: 1-Sep-2014
    • (2014) Optical overlay NUCA: A high speed substrate for shared L2 caches. 2014 21st International Conference on High Performance Computing (HiPC) (1-10). DOI: 10.1109/HiPC.2014.7116711. Online publication date: Dec-2014
    • (2013) Automatic OpenCL work-group size selection for multicore CPUs. Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (387-398). DOI: 10.1109/PACT.2013.6618827. Online publication date: Oct-2013
    • (2013) Built-in fast gather control network for efficient support of coherence protocols. IET Computers & Digital Techniques, 7:2 (69-80). DOI: 10.1049/iet-cdt.2012.0056. Online publication date: Mar-2013
    • (2013) Design and formal verification of a hierarchical cache coherence protocol for NoC based multiprocessors. The Journal of Supercomputing, 65:2 (771-796). DOI: 10.1007/s11227-012-0865-8. Online publication date: 1-Aug-2013
