research-article

Distributed cooperative caching

Authors:

José González, and

Ramon Canal

PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques

October 2008

Pages 134 - 143

https://doi.org/10.1145/1454115.1454136

Published: 25 October 2008

Abstract

This paper presents Distributed Cooperative Caching, a scalable and energy-efficient scheme to manage chip multiprocessor (CMP) cache resources. The proposed configuration is based on the Cooperative Caching framework [3] but is intended for large-scale CMPs. Both the centralized and the distributed configurations combine the benefits of private and shared caches. In our proposal, the Coherence Engine has been redesigned so that it can be partitioned, which eliminates the size constraints imposed by duplicating all tags. At the same time, a global replacement mechanism has been added to improve the usage of cache space. Our framework uses several Distributed Coherence Engines spread across all the nodes to improve scalability. The distribution balances network traffic better over the entire chip, avoiding bottlenecks and increasing performance for a 32-core CMP by 21% over a traditional shared-memory configuration and by 57% over the Cooperative Caching scheme.
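The abstract does not spell out how addresses are assigned to the Distributed Coherence Engines. The Python sketch below shows one plausible arrangement, assuming simple address interleaving: each engine keeps sharer information only for the blocks it is home for, so a request consults a single engine rather than one central structure duplicating every private cache's tags. The 64-byte block size, the modulo mapping, and all names (`home_dce`, `DistributedCoherenceEngine`, `DistributedDirectory`) are hypothetical. The sketch also leaves out the global replacement mechanism and any capacity limit on the engines.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical parameter: 64-byte cache blocks (6 byte-offset bits).
BLOCK_OFFSET_BITS = 6


def block_address(address: int) -> int:
    """Strip the byte offset so every byte of one cache block maps together."""
    return address >> BLOCK_OFFSET_BITS


def home_dce(address: int, num_dces: int) -> int:
    """Return the index of the engine that tracks this address.

    Assumption (not stated in the abstract): blocks are interleaved across
    engines using the low-order bits of the block address.
    """
    return block_address(address) % num_dces


@dataclass
class DistributedCoherenceEngine:
    """Hypothetical engine: records which private caches hold its home blocks."""

    dce_id: int
    # block address -> ids of the cores whose private cache holds a copy
    sharers: dict[int, set[int]] = field(default_factory=dict)

    def record_fill(self, block: int, core: int) -> None:
        self.sharers.setdefault(block, set()).add(core)

    def record_evict(self, block: int, core: int) -> None:
        holders = self.sharers.get(block)
        if holders is None:
            return
        holders.discard(core)
        if not holders:  # drop the entry once no private cache holds the block
            del self.sharers[block]

    def lookup(self, block: int) -> set[int]:
        # Return a copy so callers cannot mutate the engine's state.
        return set(self.sharers.get(block, set()))


class DistributedDirectory:
    """Routes each request to exactly one engine instead of a central one."""

    def __init__(self, num_dces: int) -> None:
        self.dces = [DistributedCoherenceEngine(i) for i in range(num_dces)]

    def _engine(self, address: int) -> DistributedCoherenceEngine:
        return self.dces[home_dce(address, len(self.dces))]

    def fill(self, address: int, core: int) -> None:
        self._engine(address).record_fill(block_address(address), core)

    def evict(self, address: int, core: int) -> None:
        self._engine(address).record_evict(block_address(address), core)

    def who_has(self, address: int) -> set[int]:
        return self._engine(address).lookup(block_address(address))
```

For example, with 32 engines `home_dce(0x1F40, 32)` returns 29, so `DistributedDirectory(32).fill(0x1F40, core=3)` updates only engine 29, and a later `who_has(0x1F40)` on that directory returns `{3}`. Because each engine holds only a slice of the address space, adding nodes adds engines rather than enlarging one shared structure, which is the scalability argument the abstract makes.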

Furthermore, we have reduced the power consumption of the entire system by using a different tag allocation method and by reducing the number of tags compared on each request. For a 32-core CMP, the Distributed Cooperative Caching framework improves the power/performance relation (MIPS³/W) by an average of 3.66x over a traditional shared-memory configuration and 4.30x over Cooperative Caching.
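MIPS³/W divides the cube of performance by power. Since energy per instruction is proportional to W/MIPS and delay per instruction to 1/MIPS, the metric is the reciprocal of the energy-delay² product up to a constant. A minimal Python sketch of the metric and of a relative comparison between two configurations follows; the function names and the numbers in the usage note are hypothetical, not measurements from the paper.

```python
def mips3_per_watt(mips: float, watts: float) -> float:
    """Figure of merit: MIPS cubed divided by power in watts (higher is better)."""
    if mips <= 0 or watts <= 0:
        raise ValueError("MIPS and watts must both be positive")
    return mips ** 3 / watts


def relative_efficiency(mips_a: float, watts_a: float,
                        mips_b: float, watts_b: float) -> float:
    """How many times better configuration A is than configuration B in MIPS^3/W."""
    return mips3_per_watt(mips_a, watts_a) / mips3_per_watt(mips_b, watts_b)
```

With hypothetical figures, `relative_efficiency(1210, 50, 1000, 60)` is about 2.13: a 21% speedup alone contributes a factor of 1.21³ ≈ 1.77, and the drop from 60 W to 50 W contributes a further 1.2. The cubic exponent makes performance gains dominate the metric. The abstract reports averages over its workloads without saying how they were aggregated, so the 3.66x and 4.30x figures should not be decomposed this way.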

References

[1] M. Acacio, J. Gonzalez, J. Garcia, and J. Duato. A new scalable directory architecture for large-scale multiprocessors. In HPCA '01: 7th International Symposium on High-Performance Computer Architecture, pages 97–106, January 2001.

[2] B. Beckmann, M. Marty, and D. Wood. ASR: Adaptive selective replication for CMP caches. In MICRO-39: 39th Annual IEEE/ACM International Symposium on Microarchitecture, December 2006.

[3] J. Chang and G. S. Sohi. Cooperative caching for chip multiprocessors. In ISCA '06: 33rd Annual International Symposium on Computer Architecture, pages 264–276, June 2006.

[4] J. Chang and G. S. Sohi. Cooperative cache partitioning for chip multiprocessors. In ICS '07: 21st Annual International Conference on Supercomputing, pages 242–252, June 2007.

[5] Z. Chishti, M. Powell, and T. Vijaykumar. Distance associativity for high-performance energy-efficient non-uniform cache architectures. In MICRO-36: 36th Annual IEEE/ACM International Symposium on Microarchitecture, pages 55–66, December 2003.

[6] J. Davis, J. Laudon, and K. Olukotun. Maximizing CMP throughput with mediocre cores. In PACT '05: 14th International Conference on Parallel Architectures and Compilation Techniques, pages 51–62, September 2005.

[7] J. Dorsey, S. Searles, M. Ciraula, S. Johnson, N. Bujanos, D. Wu, M. Braganza, S. Meyers, E. Fang, and R. Kumar. An integrated quad-core Opteron processor. In ISSCC '07: IEEE International Solid-State Circuits Conference, pages 102–103, February 2007.

[8] P. Dubey. A platform 2015 workload model: Recognition, mining and synthesis moves computers to the era of tera. Intel White Paper, Intel Corporation, 2005.

[9] H. Dybdahl and P. Stenstrom. An adaptive shared/private NUCA cache partitioning scheme for chip multiprocessors. In HPCA '07: 13th International Symposium on High Performance Computer Architecture, pages 2–12, February 2007.

[10] J. Huh, C. Kim, H. Shafi, L. Zhang, D. Burger, and S. Keckler. A NUCA substrate for flexible CMP cache sharing. In ICS '05: 19th Annual International Conference on Supercomputing, pages 31–40, June 2005.

[11] J. S. Kim, M. B. Taylor, J. Miller, and D. Wentzlaff. Energy characterization of a tiled architecture processor with on-chip networks. In ISLPED '03: International Symposium on Low Power Electronics and Design, pages 424–427, August 2003.

[12] D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In ISCA '90: 17th Annual International Symposium on Computer Architecture, pages 148–159, May 1990.

[13] P. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, F. Larsson, A. Moestedt, and B. Werner. Simics: A full system simulation platform. Computer, 35(2):50–58, 2002.

[14] M. Martin, M. Hill, and D. Wood. Token coherence: decoupling performance and correctness. In ISCA '03: 30th Annual International Symposium on Computer Architecture, pages 182–193, June 2003.

[15] M. Martin, D. J. Sorin, B. Beckmann, M. Marty, M. Xu, A. Alameldeen, K. Moore, M. Hill, and D. Wood. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset. SIGARCH Computer Architecture News, 33(4):92–99, 2005.

[16] M. Monchiero, R. Canal, and A. Gonzalez. Power/performance/thermal design space exploration for multicore architectures. IEEE Transactions on Parallel and Distributed Systems, 19(5):666–681, May 2008.

[17] R. Mullins. Minimising dynamic power consumption in on-chip networks. In International Symposium on System-on-Chip, pages 1–4, November 2006.

[18] M. Qureshi and Y. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In MICRO-39: 39th Annual IEEE/ACM International Symposium on Microarchitecture, pages 423–432, December 2006.

[19] N. Sakran, M. Yuffe, M. Mehalel, J. Doweck, E. Knoll, and A. Kovacs. The implementation of the 65nm dual-core 64b Merom processor. In ISSCC '07: IEEE International Solid-State Circuits Conference, pages 106–590, February 2007.

[20] K. Strauss, X. Shen, and J. Torrellas. UnCorq: Unconstrained snoop request delivery in embedded-ring multiprocessors. In MICRO-40: 40th Annual IEEE/ACM International Symposium on Microarchitecture, December 2007.

[21] D. Tarjan, S. Thoziyoor, and N. Jouppi. CACTI 4.0. Technical report, HP Labs Palo Alto, June 2006.

[22] S. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, P. Iyer, A. Singh, T. Jacob, S. Jain, S. Venkataraman, Y. Hoskote, and N. Borkar. An 80-tile 1.28 TFLOPS network-on-chip in 65nm CMOS. In ISSCC '07: IEEE International Solid-State Circuits Conference, February 2007.

[23] H.-S. Wang, X. Zhu, L.-S. Peh, and S. Malik. Orion: a power-performance simulator for interconnection networks. In MICRO-35: 35th Annual IEEE/ACM International Symposium on Microarchitecture, pages 294–305, November 2002.

[24] M. Zhang and K. Asanovic. Victim replication: maximizing capacity while hiding wire delay in tiled chip multiprocessors. In ISCA '05: 32nd Annual International Symposium on Computer Architecture, pages 336–345, June 2005.

Cited By

H. Li, M. Sun, F. Xia, X. Xu, and M. Bilal. A Survey of Edge Caching: Key Issues and Challenges. Tsinghua Science and Technology, 29(3):818–842, June 2024. https://doi.org/10.26599/TST.2023.9010051

Y. Nedbailo. Avoiding common scalability pitfalls in shared-cache chip multiprocessor design. In 2019 International Conference on Engineering and Telecommunication (EnT), pages 1–5, November 2019. https://doi.org/10.1109/EnT47717.2019.9030579

J. Puche, S. Petit, J. Sahuquillo, and M. Gómez. FOS: a low-power cache organization for multicores. The Journal of Supercomputing, 2019 (online 24 April 2019). https://doi.org/10.1007/s11227-019-02858-x

Index Terms

Distributed cooperative caching
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
2. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Recommendations

Elastic cooperative caching: an autonomous dynamically adaptive memory hierarchy for chip multiprocessors
ISCA '10

Next generation tiled microarchitectures are going to be limited by off-chip misses and by on-chip network usage. Furthermore, these platforms will run a heterogeneous mix of applications with very different memory needs, leading to significant ...
Cooperative Caching for Chip Multiprocessors

This paper presents CMP Cooperative Caching, a unified framework to manage a CMP's aggregate on-chip cache resources. Cooperative caching combines the strengths of private and shared cache organizations by forming an aggregate "shared" cache through ...


Published In


PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques

October 2008

328 pages

ISBN:9781605582825

DOI:10.1145/1454115

General Chair: Andreas Moshovos (University of Toronto, Canada)

Program Chairs: David Tarditi (Microsoft, USA), Kunle Olukotun (Stanford University, USA)

Copyright © 2008 ACM.



Publisher

Association for Computing Machinery

New York, NY, United States





Conference

PACT '08: International Conference on Parallel Architectures and Compilation Techniques

October 25–29, 2008

Toronto, Ontario, Canada

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%


Bibliometrics

Total citations: 33

Total downloads: 737

Downloads (last 12 months): 14

Downloads (last 6 weeks): 5

Cited By (continued)

S. Dublish, V. Nagarajan, and N. Topham. Cooperative Caching for GPUs. ACM Transactions on Architecture and Code Optimization, 13(4):1–25, 2016 (online 12 December 2016). https://dl.acm.org/doi/10.1145/3001589

A. Mukkara, N. Beckmann, and D. Sanchez. Whirlpool. ACM SIGARCH Computer Architecture News, 44(2):113–127, 2016 (online 25 March 2016). https://dl.acm.org/doi/10.1145/2980024.2872363

A. Mukkara, N. Beckmann, and D. Sanchez. Whirlpool. ACM SIGOPS Operating Systems Review, 50(2):113–127, 2016 (online 25 March 2016). https://doi.org/10.1145/2954680.2872363

A. Mukkara, N. Beckmann, and D. Sanchez. Whirlpool. ACM SIGPLAN Notices, 51(4):113–127, 2016 (online 25 March 2016). https://dl.acm.org/doi/10.1145/2954679.2872363

A. Mukkara, N. Beckmann, and D. Sanchez. Whirlpool. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (eds. T. Conte and Y. Zhou), pages 113–127, 2016 (online 25 March 2016). https://dl.acm.org/doi/10.1145/2872362.2872363

G. Li, O. Temam, Z. Liu, S. Guo, and D. Wang. Cluster Cache Monitor. International Journal of Parallel Programming, 43(6):1054–1077, 2015 (online 1 December 2015). https://dl.acm.org/doi/10.1007/s10766-014-0339-0

J. Wang, E. Lo, M. Yiu, J. Tong, G. Wang, and X. Liu. Cache Design of SSD-Based Search Engine Architectures. ACM Transactions on Information Systems (TOIS), 32(4):1–26, 2014 (online 28 October 2014). https://dl.acm.org/doi/10.1145/2661629
