DOI: 10.1109/MICRO.2006.27
Article

In-Network Cache Coherence

Published: 09 December 2006

Abstract

With the trend towards an increasing number of processor cores in future chip architectures, scalable directory-based protocols for maintaining cache coherence will be needed. However, directory-based protocols face well-known problems in delay and scalability. Most current protocol optimizations targeting these problems maintain a firm abstraction of the interconnection network fabric as a communication medium: protocol optimizations consist of end-to-end messages between requestor, directory and sharer nodes, while network optimizations separately target lowering communication latency for coherence messages. In this paper, we propose an implementation of the cache coherence protocol within the network, embedding directories within each router node that manage and steer requests towards nearby data copies, enabling in-transit optimization of memory access delay. Simulation results across a range of SPLASH-2 benchmarks demonstrate significant performance improvement and good system scalability, with up to 44.5% and 56% savings in average memory access latency for 16- and 64-node systems, respectively, when compared against the baseline directory cache coherence protocol. Detailed microarchitecture and implementation characterization affirms the low area and delay impact of in-network coherence.
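The abstract's core idea is that directory state lives in the routers, so a read request can be redirected toward a nearby copy while it is still in transit instead of always detouring through the home node. A toy hop-count model makes the contrast concrete. This is a minimal sketch under stated assumptions, not the paper's protocol. It assumes a 2D mesh with dimension-ordered (XY) routing and a hypothetical per-router `directory` map from block address to the coordinate of a node holding a copy. It counts hops only, ignoring contention, router pipeline delay, and invalidation traffic. All names (`Router`, `make_mesh`, `baseline_read_hops`, `in_network_read_hops`) are illustrative.

```python
from dataclasses import dataclass, field

# A mesh node is identified by its (x, y) coordinate.
Coord = tuple[int, int]


def manhattan(a: Coord, b: Coord) -> int:
    """Hop count between two nodes under minimal routing in a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def xy_next_hop(cur: Coord, dst: Coord) -> Coord:
    """Dimension-ordered (XY) routing: resolve the X offset first, then Y."""
    (x, y), (dx, dy) = cur, dst
    if x != dx:
        return (x + (1 if dx > x else -1), y)
    if y != dy:
        return (x, y + (1 if dy > y else -1))
    return cur


@dataclass
class Router:
    # Hypothetical per-router directory: block address -> coordinate of a
    # node believed to hold a valid copy of that block.
    directory: dict[int, Coord] = field(default_factory=dict)


def make_mesh(k: int) -> dict[Coord, Router]:
    """Build a k x k mesh with an empty directory at every router."""
    return {(x, y): Router() for x in range(k) for y in range(k)}


def baseline_read_hops(req: Coord, home: Coord, sharer: Coord) -> int:
    """Home-based directory read: requestor -> home -> sharer -> requestor."""
    return manhattan(req, home) + manhattan(home, sharer) + manhattan(sharer, req)


def in_network_read_hops(mesh: dict[Coord, Router], addr: int,
                         req: Coord, home: Coord) -> int:
    """Route a read toward the home node, but let any router on the way that
    knows of a copy redirect the request to it in transit. The data reply
    then returns directly from wherever the request was served."""
    cur, target, hops = req, home, 0
    while True:
        hint = mesh[cur].directory.get(addr)
        if target == home and hint is not None:
            target = hint  # steer toward the copy instead of the home node
        if cur == target:
            break
        cur = xy_next_hop(cur, target)
        hops += 1
    return hops + manhattan(target, req)


if __name__ == "__main__":
    addr, req, home, sharer = 0x40, (0, 0), (3, 0), (2, 1)
    mesh = make_mesh(4)
    mesh[home].directory[addr] = sharer      # the home node tracks the copy
    mesh[(2, 0)].directory[addr] = sharer    # a router en route also knows of it
    print(baseline_read_hops(req, home, sharer))        # 8
    print(in_network_read_hops(mesh, addr, req, home))  # 6
```

In this toy setup the redirected read costs 6 hops against 8 for the three-leg home-based path. If no router on the way holds a hint, the request reaches the home node and both models give the same count. The comparison illustrates the mechanism only; the latency savings quoted above come from the paper's detailed simulation, not from a model like this one.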

Cited By

  • (2023) Transient-Execution Attacks: A Computer Architect Perspective. ACM Computing Surveys 56(3), 1-38. DOI: 10.1145/3603619. Online publication date: 6-Oct-2023
  • (2019) Software-Defined Multimedia Streaming System Aided By Variable-Length Interval In-Network Caching. IEEE Transactions on Multimedia 21(2), 494-509. DOI: 10.1109/TMM.2018.2862349. Online publication date: 1-Feb-2019
  • (2016) LDAC. ACM Transactions on Architecture and Code Optimization 13(4), 1-28. DOI: 10.1145/2983632. Online publication date: 15-Nov-2016

Published In

MICRO 39: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
December 2006
493 pages
ISBN:0769527329

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

Micro-39

Acceptance Rates

MICRO 39 Paper Acceptance Rate 42 of 174 submissions, 24%;
Overall Acceptance Rate 484 of 2,242 submissions, 22%
