DOI: 10.1145/1454115.1454136
research-article

Distributed cooperative caching

Published: 25 October 2008

    Abstract

    This paper presents Distributed Cooperative Caching, a scalable and energy-efficient scheme for managing chip multiprocessor (CMP) cache resources. The proposed configuration is based on the Cooperative Caching framework [3] but is intended for large-scale CMPs. Both the centralized and the distributed configurations combine the benefits of private and shared caches. In our proposal, the Coherence Engine has been redesigned to allow it to be partitioned, which eliminates the size constraints imposed by duplicating all tags. At the same time, a global replacement mechanism has been added to improve the use of cache space. Our framework uses several Distributed Coherence Engines spread across all the nodes to improve scalability. The distribution balances network traffic better over the entire chip, avoiding bottlenecks and increasing performance for a 32-core CMP by 21% over a traditional shared memory configuration and by 57% over the Cooperative Caching scheme.
    Furthermore, we have reduced the power consumption of the entire system by using a different tag allocation method and by reducing the number of tags compared on each request. For a 32-core CMP, the Distributed Cooperative Caching framework improves the power/performance relation (MIPS^3/W) by an average of 3.66x over a traditional shared memory configuration and by 4.30x over Cooperative Caching.
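
    The power/performance figure of merit quoted in the abstract, MIPS^3/W, can be illustrated with a minimal sketch. The function names and all numeric inputs below are hypothetical and are not taken from the paper; the sketch only shows how the metric weights performance cubically against power, so a modest speedup combined with a modest power reduction can yield a large ratio.

```python
def mips3_per_watt(mips: float, watts: float) -> float:
    """Return the MIPS^3/W figure of merit for one design point."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return mips ** 3 / watts


def relative_improvement(perf_ratio: float, power_ratio: float) -> float:
    """Ratio of MIPS^3/W between two designs.

    perf_ratio  = MIPS(new) / MIPS(baseline)
    power_ratio = Watts(new) / Watts(baseline)
    """
    if power_ratio <= 0:
        raise ValueError("power ratio must be positive")
    return perf_ratio ** 3 / power_ratio


# Hypothetical design points (illustrative values only, not results from the paper).
baseline = mips3_per_watt(mips=1000.0, watts=50.0)
candidate = mips3_per_watt(mips=1200.0, watts=45.0)

# Both ways of computing the improvement agree.
print(candidate / baseline)
print(relative_improvement(perf_ratio=1.2, power_ratio=0.9))
```

    Because performance enters cubed, the metric favors designs that gain speed even at some cost in power, which is why it is often used for high-performance rather than battery-constrained design points.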

    References

    [1]
    M. Acacio, J. Gonzalez, J. Garcia, and J. Duato. A new scalable directory architecture for large-scale multiprocessors. In HPCA '01: 7th International Symposium on High-Performance Computer Architecture, pages 97--106, January 2001.
    [2]
    B. Beckmann, M. Marty, and D. Wood. ASR: Adaptive selective replication for CMP caches. In MICRO-39: 39th Annual IEEE/ACM International Symposium on Microarchitecture, December 2006.
    [3]
    J. Chang and G. S. Sohi. Cooperative caching for chip multiprocessors. In ISCA '06: 33rd Annual International Symposium on Computer Architecture, pages 264--276, June 2006.
    [4]
    J. Chang and G. S. Sohi. Cooperative cache partitioning for chip multiprocessors. In ICS '07: 21st Annual International Conference on Supercomputing, pages 242--252, June 2007.
    [5]
    Z. Chishti, M. Powell, and T. Vijaykumar. Distance associativity for high-performance energy-efficient non-uniform cache architectures. In MICRO-36: 36th Annual IEEE/ACM International Symposium on Microarchitecture, pages 55--66, December 2003.
    [6]
    J. Davis, J. Laudon, and K. Olukotun. Maximizing CMP throughput with mediocre cores. In PACT '05: 14th International Conference on Parallel Architectures and Compilation Techniques, pages 51--62, September 2005.
    [7]
    J. Dorsey, S. Searles, M. Ciraula, S. Johnson, N. Bujanos, D. Wu, M. Braganza, S. Meyers, E. Fang, and R. Kumar. An integrated quad-core Opteron processor. In ISSCC '07: IEEE International Solid-State Circuits Conference, pages 102--103, February 2007.
    [8]
    P. Dubey. A platform 2015 workload model: Recognition, mining and synthesis moves computers to the era of tera. Intel White Paper, Intel Corporation, 2005.
    [9]
    H. Dybdahl and P. Stenstrom. An adaptive shared/private NUCA cache partitioning scheme for chip multiprocessors. In HPCA '07: 13th International Symposium on High Performance Computer Architecture, pages 2--12, February 2007.
    [10]
    J. Huh, C. Kim, H. Shafi, L. Zhang, D. Burger, and S. Keckler. A NUCA substrate for flexible CMP cache sharing. In ICS '05: 19th Annual International Conference on Supercomputing, pages 31--40, June 2005.
    [11]
    J. S. Kim, M. B. Taylor, J. Miller, and D. Wentzlaff. Energy characterization of a tiled architecture processor with on-chip networks. In ISLPED '03: International Symposium on Low Power Electronics and Design, pages 424--427, August 2003.
    [12]
    D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In ISCA '90: 17th Annual International Symposium on Computer Architecture, pages 148--159, May 1990.
    [13]
    P. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, F. Larsson, A. Moestedt, and B. Werner. Simics: A full system simulation platform. Computer, 35(2):50--58, 2002.
    [14]
    M. Martin, M. Hill, and D. Wood. Token coherence: decoupling performance and correctness. In ISCA '03: 30th Annual International Symposium on Computer Architecture, pages 182--193, June 2003.
    [15]
    M. Martin, D. J. Sorin, B. Beckmann, M. Marty, M. Xu, A. Alameldeen, K. Moore, M. Hill, and D. Wood. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset. SIGARCH Comput. Archit. News, 33(4):92--99, 2005.
    [16]
    M. Monchiero, R. Canal, and A. Gonzalez. Power/performance/thermal design space exploration for multicore architectures. IEEE Transactions on Parallel and Distributed Systems, 19(5):666--681, May 2008.
    [17]
    R. Mullins. Minimising dynamic power consumption in on-chip networks. International Symposium on System-on-Chip, pages 1--4, November 2006.
    [18]
    M. Qureshi and Y. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In MICRO-39: 39th Annual IEEE/ACM International Symposium on Microarchitecture, pages 423--432, December 2006.
    [19]
    N. Sakran, M. Yuffe, M. Mehalel, J. Doweck, E. Knoll, and A. Kovacs. The implementation of the 65nm dual-core 64b Merom processor. In ISSCC '07: IEEE International Solid-State Circuits Conference, pages 106--590, February 2007.
    [20]
    K. Strauss, X. Shen, and J. Torrellas. Uncorq: Unconstrained snoop request delivery in embedded-ring multiprocessors. In MICRO-40: 40th Annual IEEE/ACM International Symposium on Microarchitecture, December 2007.
    [21]
    D. Tarjan, S. Thoziyoor, and N. Jouppi. CACTI 4.0. Technical report, HP Labs Palo Alto, June 2006.
    [22]
    S. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, P. Iyer, A. Singh, T. Jacob, S. Jain, S. Venkataraman, Y. Hoskote, and N. Borkar. An 80-tile 1.28TFLOPS network-on-chip in 65nm CMOS. In ISSCC '07: IEEE International Solid-State Circuits Conference, February 2007.
    [23]
    H.-S. Wang, X. Zhu, L.-S. Peh, and S. Malik. Orion: a power-performance simulator for interconnection networks. In MICRO-35: 35th Annual IEEE/ACM International Symposium on Microarchitecture, pages 294--305, November 2002.
    [24]
    M. Zhang and K. Asanovic. Victim replication: maximizing capacity while hiding wire delay in tiled chip multiprocessors. In ISCA '05: 32nd Annual International Symposium on Computer Architecture, pages 336--345, June 2005.

    Published In

    PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques
    October 2008
    328 pages
    ISBN: 9781605582825
    DOI: 10.1145/1454115
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. chip multiprocessors
    2. distributed cooperative caching
    3. energy efficiency
    4. memory hierarchy

    Qualifiers

    • Research-article

    Conference

    PACT '08
    Acceptance Rates

    Overall Acceptance Rate 121 of 471 submissions, 26%

    Cited By

    • (2024) A Survey of Edge Caching: Key Issues and Challenges. Tsinghua Science and Technology 29:3 (818-842). DOI: 10.26599/TST.2023.9010051. Online publication date: Jun-2024
    • (2019) Avoiding common scalability pitfalls in shared-cache chip multiprocessor design. 2019 International Conference on Engineering and Telecommunication (EnT) (1-5). DOI: 10.1109/EnT47717.2019.9030579. Online publication date: Nov-2019
    • (2019) FOS: a low-power cache organization for multicores. The Journal of Supercomputing. DOI: 10.1007/s11227-019-02858-x. Online publication date: 24-Apr-2019
    • (2016) Cooperative Caching for GPUs. ACM Transactions on Architecture and Code Optimization 13:4 (1-25). DOI: 10.1145/3001589. Online publication date: 12-Dec-2016
    • (2016) Whirlpool. ACM SIGARCH Computer Architecture News 44:2 (113-127). DOI: 10.1145/2980024.2872363. Online publication date: 25-Mar-2016
    • (2016) Whirlpool. ACM SIGOPS Operating Systems Review 50:2 (113-127). DOI: 10.1145/2954680.2872363. Online publication date: 25-Mar-2016
    • (2016) Whirlpool. ACM SIGPLAN Notices 51:4 (113-127). DOI: 10.1145/2954679.2872363. Online publication date: 25-Mar-2016
    • (2016) Whirlpool. Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (113-127). DOI: 10.1145/2872362.2872363. Online publication date: 25-Mar-2016
    • (2015) Cluster Cache Monitor. International Journal of Parallel Programming 43:6 (1054-1077). DOI: 10.1007/s10766-014-0339-0. Online publication date: 1-Dec-2015
    • (2014) Cache Design of SSD-Based Search Engine Architectures. ACM Transactions on Information Systems (TOIS) 32:4 (1-26). DOI: 10.1145/2661629. Online publication date: 28-Oct-2014
