DOI: 10.1145/1995896.1995941
research-article

A composite and scalable cache coherence protocol for large scale CMPs

Published: 31 May 2011

Abstract

The number of on-chip cores in modern chip multiprocessors (CMPs) is growing fast with technology scaling. However, efficiently supporting cache coherence for large-scale CMPs remains a big challenge. Conventional snoopy and directory coherence protocols do not scale smoothly to many-core or thousand-core processors. Snoopy protocols incur a large power overhead because of the enormous amount of cache tag probing triggered by broadcasts. Directory protocols incur a performance penalty because of indirection, and a large storage overhead because of the directories they store.
This paper addresses the efficiency problem of supporting cache coherence for large-scale CMPs. By leveraging emerging optical on-chip interconnect (OP-I) technology, which provides high bandwidth density, low propagation delay, and natural support for multicast/broadcast in a hierarchical network organization, we propose a composite cache coherence (C3) protocol that benefits from direct cache-to-cache accesses, as in a snoopy protocol, and from a small amount of cache probing, as in a directory protocol. To complete coherence transactions quickly, C3 organizes accesses in a three-tier hierarchy that combines local broadcast prediction, filtering, and a coarse-grained directory. Compared with a directory-based protocol [18], our evaluations on a thousand-core CMP show that, on average for PARSEC applications, C3 improves performance by 21%, reduces the network latency of coherence messages by 41%, and saves 5.5% of network energy consumption.
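
The abstract names a coarse-grained directory as one of the three tiers but gives no implementation details. As a rough illustration of the general idea only, and not the C3 design itself, the Python sketch below tracks sharers with one bit per cluster of cores instead of one bit per core. The class name `CoarseDirectory`, the cluster size of 16, and all method names are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class CoarseDirectory:
    """Hypothetical coarse-grained directory: one sharer bit per cluster."""
    num_cores: int
    cluster_size: int  # assumed grouping; not a value taken from the paper
    # Maps a cache-line address to a bitmask of clusters that may hold it.
    sharers: dict = field(default_factory=dict)

    def cluster_of(self, core: int) -> int:
        return core // self.cluster_size

    def bits_per_entry(self) -> int:
        # Ceiling division: number of clusters, hence bits per entry.
        return -(-self.num_cores // self.cluster_size)

    def record_read(self, addr: int, core: int) -> None:
        # Mark the reader's whole cluster as a possible sharer.
        bit = 1 << self.cluster_of(core)
        self.sharers[addr] = self.sharers.get(addr, 0) | bit

    def clusters_to_probe(self, addr: int) -> list:
        mask = self.sharers.get(addr, 0)
        return [c for c in range(self.bits_per_entry()) if (mask >> c) & 1]

    def record_write(self, addr: int, core: int) -> list:
        # Every marked cluster must be probed (e.g. by a local broadcast),
        # because the directory does not know which cores inside it share.
        targets = self.clusters_to_probe(addr)
        # After the write, only the writer's cluster can hold the line.
        self.sharers[addr] = 1 << self.cluster_of(core)
        return targets


directory = CoarseDirectory(num_cores=1024, cluster_size=16)
for reader in (3, 17, 18):
    directory.record_read(0x40, reader)
invalidated = directory.record_write(0x40, 500)
```

With these assumed parameters, each directory entry needs 64 sharer bits rather than the 1,024 a full per-core bit vector would need. The cost is imprecision: a write must probe every core in each marked cluster, which is the kind of extra probing that filtering or cluster-local broadcast would then have to absorb.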

References

[1] M. E. Acacio, et al., "Owner Prediction for Accelerating Cache-to-Cache Transfer Misses in a cc-NUMA Architecture," In SC, 2002.
[2] M. E. Acacio, et al., "The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors," In PACT, pp. 155--164, 2002.
[3] A. Agarwal, et al., "An Evaluation of Directory Schemes for Cache Coherence," In ISCA, pp. 353--362, 1988.
[4] N. Agarwal, et al., "In-Network Coherence Filtering: Snoopy Coherence without Broadcasts," In MICRO, 2009.
[5] N. Agarwal, et al., "In-Network Snoop Ordering: Snoopy Coherence on Unordered Interconnects," In HPCA, 2009.
[6] J. Balfour and W. J. Dally, "Design tradeoffs for tiled cmp on-chip networks," In ICS, pp. 187--198, 2006.
[7] C. Batten, et al., "Building Manycore Processor-to-DRAM Networks with Monolithic Silicon Photonics," In High Performance Interconnects, pp. 21--30, 2008.
[8] S. Beamer, et al., "Re-Architecting DRAM Memory Systems with Monolithically Integrated Silicon Photonics," In ISCA, pp. 117--128, 2010.
[9] C. Bienia, et al., "The parsec benchmark suite: Characterization and architectural implications," In PACT, pp. 72--81, 2008.
[10] B. Black, et al., "Die stacking (3d) microarchitecture," In MICRO, pp. 469--479, 2006.
[11] S. Borkar, "Thousand core chips - a technology perspective," In DAC, pp. 746--749, 2007.
[12] CACTI, http://www.hpl.hp.com/research/cacti/
[13] L. M. Censier and P. Feautrier, "A New Solution to Coherence Problems in Multicache Systems," In IEEE Trans. on Computers, pp. 1112--1118, 1978.
[14] M. J. Cianchetti, et al., "Phastlane: A Rapid Transit Optical Routing Network," In ISCA, pp. 441--450, 2009.
[15] S. Chaudhry, et al., "Rock: A High-Performance Sparc CMT Processor," In IEEE Micro, 29(2):6--16, 2009.
[16] W. J. Dally and B. Towles, "Principles and practices of Interconnection Networks," Morgan Kaufmann, 2004.
[17] N. Eisley, et al., "In-network cache coherence," In MICRO, pp. 321--332, 2006.
[18] A. Gupta, et al., "Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes," In ICPP, pp. 312--321, 1990.
[19] J.-H. Ha and T. M. Pinkston, "A Hybrid Cache Coherence Protocol for a Decoupled Multi-Channel Optical Network: SPEED DMON," In ICPP, pp. 164--171, 1996.
[20] L. Hammond, et al., "A single-chip multiprocessor," In IEEE Computer, 30(9):79--85, 1997.
[21] Semiconductor Industry Association, "International Technology Roadmap for Semiconductors," http://www.itrs.net/Links/2009ITRS/Home2009.htm, 2009.
[22] A. Jaleel, et al., "High performance cache replacement using re-reference interval prediction (RRIP)," In ISCA, pp. 60--71, 2010.
[23] N. Enright-Jerger, et al., "Virtual Circuit Tree Multicasting: A Case for On-Chip Hardware Multicast Support," In ISCA, 2008.
[24] N. Enright-Jerger, et al., "Virtual Tree Coherence: Leveraging Regions and In-Network Multicast Trees for Scalable Cache Coherence," In MICRO, 2008.
[25] A. Joshi, et al., "Silicon-Photonic Clos Networks for Global On-Chip Communication," In NOCS, 2009.
[26] J. H. Kelm, et al., "WAYPOINT: scaling coherence to 1000-core architectures," In PACT, pp. 99--109, 2010.
[27] T. Kgil, et al., "Picoserver: Using 3d stacking technology to enable a compact energy efficient chip multiprocessor," In ASPLOS, pp. 117--128, 2006.
[28] J. Kim, et al., "Flattened butterfly topology for on-chip networks," In MICRO, 2007.
[29] N. Kirman, et al., "Leveraging optical technology in future bus-based chip multiprocessors," In MICRO, pp. 492--503, 2006.
[30] N. Kirman and J. Martinez, "An efficient all-optical on-chip interconnect based on oblivious routing," In ASPLOS, 2010.
[31] D. Lenoski, et al., "Design of Scalable Shared-Memory Multiprocessors: The DASH Approach," In COMPCON, pp. 62--67, 1990.
[32] Z. Li, et al., "Spectrum: A Hybrid Nanophotonic-Electric On-Chip Network," In DAC, pp. 575--580, 2009.
[33] M. M. K. Martin, et al., "Bandwidth Adaptive Routing," In HPCA, 2002.
[34] M. M. K. Martin, et al., "Token Coherence: Decoupling Performance and Correctness," In ISCA, 2003.
[35] M. Marty, et al., "Improving multiple-cmp systems using token coherence," In HPCA, 2005.
[36] D. Miller, "Rationale and Challenges for Optical Interconnects to Electronic Chips," In Proceedings of the IEEE, 88(6):728--749, 2000.
[37] G. Kurian, et al., "ATAC: A 1000-core cache-coherent processor with on-chip optical network," In PACT, pp. 447--488, 2010.
[38] A. Moshovos, et al., "JETTY: Filtering snoops for reduced energy consumption in SMP servers," In HPCA, pp. 85--96, 2001.
[39] "Noxim, An Open Network-on-Chip Simulator," http://noxim.sourceforge.net
[40] nVidia, "Quadro fx 3700m," http://www.nvidia.com/object/product_quadro_fx_3700_m_us.html.
[41] K. Olukotun, et al., "The case for a single-chip multiprocessor," In ASPLOS, pp. 2--11, 1996.
[42] Y. Pan, et al., "Firefly: Illuminating Future Network-on-Chip with Nanophotonics," In Int. Symp. on Computer Architecture (ISCA), pp. 429--440, 2009.
[43] Y. Pan, In Int. Symp. on High-Performance Computer Architecture (HPCA), 2010.
[44] PTLsim, http://www.ptlsim.org/
[45] PTM interconnect model, http://www.eas.asu.edu/~ptm/interconnect.html
[46] K. Strauss, et al., "Uncorq: Unconstrained snoop request delivery in embedded-ring multiprocessors," In Int. Symp. on Microarchitecture, pp. 327--342, 2007.
[47] A. N. Udipi, et al., "Towards Scalable, Energy-Efficient Bus-Based On-Chip Networks," In Int. Symp. on High-Performance Computer Architecture (HPCA), pp. 1--12, 2010.
[48] S. Vangal, et al., "An 80-tile 1.28tflops network-on-chip in 65nm cmos," In IEEE Int. Solid-State Circuits Conf., pp. 98--590, 2007.
[49] D. Vantrease, et al., "Corona: System implications of emerging nanophotonic technology," In Int. Symp. on Computer Architecture, pp. 153--164, 2008.
[50] D. Vantrease, et al., "Atomic Coherence: Leveraging Nanophotonics to Build Race-Free Cache Coherence Protocols," In Int. Symp. on High-Performance Computer Architecture (HPCA), 2011.
[51] J. Xue, et al., "An Intra-Chip Free-Space Optical Interconnect," In Int. Symp. on Computer Architecture (ISCA), 2010.
[52] J. Zebchuk, et al., "A Tagless Coherence Directory," In Int. Symp. on Microarchitecture (MICRO), pp. 423--434, 2009.

Published In

ICS '11: Proceedings of the international conference on Supercomputing
May 2011
398 pages
ISBN:9781450301022
DOI:10.1145/1995896
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cache coherence protocol
  2. cmp
  3. nanophotonics
  4. optical network
  5. thousand-core

Qualifiers

  • Research-article

Conference

ICS '11: International Conference on Supercomputing
May 31 - June 4, 2011
Tucson, Arizona, USA

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Cited By

  • (2024) Towards Efficient On-Chip Communication: A Survey on Silicon Nanophotonics and Optical Networks-on-Chip. Journal of Systems Architecture, 152 (103171). DOI: 10.1016/j.sysarc.2024.103171. Online publication date: Jul-2024
  • (2022) Accelerating Cache Coherence in Manycore Processor through Silicon Photonic Chiplet. Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design (1-9). DOI: 10.1145/3508352.3549338. Online publication date: 30-Oct-2022
  • (2020) CAMON: Low-Cost Silicon Photonic Chiplet for Manycore Processors. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 39:9 (1820-1833). DOI: 10.1109/TCAD.2019.2926495. Online publication date: Sep-2020
  • (2019) A Survey of On-Chip Optical Interconnects. ACM Computing Surveys, 51:6 (1-34). DOI: 10.1145/3267934. Online publication date: 28-Jan-2019
  • (2019) Efficient Heap Data Management on Software Managed Manycore Architectures. 2019 32nd International Conference on VLSI Design and 2019 18th International Conference on Embedded Systems (VLSID) (269-274). DOI: 10.1109/VLSID.2019.00065. Online publication date: Jan-2019
  • (2018) A Process-Variation-Tolerant Method for Nanophotonic On-Chip Network. ACM Journal on Emerging Technologies in Computing Systems, 14:2 (1-23). DOI: 10.1145/3208073. Online publication date: 11-Jul-2018
  • (2018) Scalable Path-Setup Scheme for All-Optical Dynamic Circuit Switched NoCs in Cache Coherent CMPs. ACM Journal on Emerging Technologies in Computing Systems, 14:1 (1-27). DOI: 10.1145/3154840. Online publication date: 8-Mar-2018
  • (2017) A Method for Fast Evaluation of Sharing Set Management Strategies in Cache Coherence Protocols. Architecture of Computing Systems - ARCS 2017 (111-123). DOI: 10.1007/978-3-319-54999-6_9. Online publication date: 4-Mar-2017
  • (2016) Software Coherence Management on Non-coherent Cache Multi-cores. Proceedings of the 2016 29th International Conference on VLSI Design and 2016 15th International Conference on Embedded Systems (VLSID) (397-402). DOI: 10.1109/VLSID.2016.70. Online publication date: 4-Jan-2016
  • (2016) Efficient pointer management of stack data for software managed multicores. 2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP) (67-74). DOI: 10.1109/ASAP.2016.7760774. Online publication date: Jul-2016
