DOI: 10.1145/1840845.1840929
research-article

TurboTag: lookup filtering to reduce coherence directory power

Published: 18 August 2010

Abstract

On-chip coherence directories of today's multi-core systems are not energy efficient. Coherence directories dissipate a significant fraction of their power on unnecessary lookups when running commercial server and scientific workloads. These workloads have large working sets that are beyond the reach of on-chip caches of modern processors. Limited to capturing a small part of the working set, private caches retain cache blocks only for a short period of time before replacing them with new blocks. Moreover, coherence enforcement is a known performance bottleneck of multi-threaded software, hence data sharing in optimized high-performance software is minimal. Consequently, the majority of the accesses to the coherence directory find no sharers in the directory because the data are not available in the on-chip private caches, effectively wasting power on the coherence checks. To improve energy efficiency for future many-core systems, we propose TurboTag, a filtering mechanism to eliminate needless directory lookups. We analyze full-system traces of server and scientific workloads and find that over 69% of accesses to the directory find no sharers and can be entirely avoided. Taking advantage of this behavior, TurboTag achieves a 58% reduction in the directory's dynamic power consumption.
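
The abstract does not spell out how TurboTag filters lookups, so the following is only a generic sketch of the idea it describes: a small membership filter consulted before the directory, so that a lookup for a block with no sharers can skip the directory access. It assumes a counting Bloom filter (the article's author tags list "bloom" and "filter", but the actual design may differ). The names `CountingBloomFilter` and `FilteredDirectory`, the filter size, and the hash scheme are all hypothetical choices made for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter: never a false negative, occasionally a false positive."""

    def __init__(self, num_counters=1024, num_hashes=3):
        # Sizes are illustrative assumptions, not values from the paper.
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes  # at most 8 with the 32-byte digest below

    def _indexes(self, block_addr):
        # Slice one SHA-256 digest into several counter indexes.
        # Real hardware would use cheap bit-select or XOR hashes instead.
        digest = hashlib.sha256(block_addr.to_bytes(8, "little")).digest()
        return [
            int.from_bytes(digest[4 * i:4 * i + 4], "little") % len(self.counters)
            for i in range(self.num_hashes)
        ]

    def add(self, block_addr):
        for i in self._indexes(block_addr):
            self.counters[i] += 1

    def remove(self, block_addr):
        for i in self._indexes(block_addr):
            self.counters[i] -= 1

    def may_contain(self, block_addr):
        return all(self.counters[i] > 0 for i in self._indexes(block_addr))


class FilteredDirectory:
    """Toy coherence directory with a filter in front of it (hypothetical model)."""

    def __init__(self):
        self.sharers = {}  # block address -> set of core ids holding the block
        self.filter = CountingBloomFilter()
        self.lookups_performed = 0  # lookups that reached the directory
        self.lookups_filtered = 0   # lookups answered by the filter alone

    def add_sharer(self, block_addr, core):
        if block_addr not in self.sharers:
            self.sharers[block_addr] = set()
            self.filter.add(block_addr)  # block is now tracked by the directory
        self.sharers[block_addr].add(core)

    def remove_sharer(self, block_addr, core):
        cores = self.sharers.get(block_addr)
        if cores is None or core not in cores:
            return
        cores.discard(core)
        if not cores:
            # Last sharer evicted the block: drop it from directory and filter.
            del self.sharers[block_addr]
            self.filter.remove(block_addr)

    def lookup(self, block_addr):
        if not self.filter.may_contain(block_addr):
            # Definitely no sharers: the directory access is skipped.
            self.lookups_filtered += 1
            return frozenset()
        # Possible sharers (or a false positive): consult the directory.
        self.lookups_performed += 1
        return frozenset(self.sharers.get(block_addr, ()))
```

Counters are used instead of single bits so that an entry can be removed when the last sharer evicts a block. A false positive in this sketch only costs an unnecessary directory lookup; it never hides a real sharer, which is the property any such filter would need to preserve coherence.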



    Published In

    ISLPED '10: Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
    August 2010
    458 pages
    ISBN:9781450301466
    DOI:10.1145/1840845
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • IEEE CAS

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. bloom
    2. coherence
    3. directory
    4. filter
    5. low power

    Qualifiers

    • Research-article

    Conference

    ISLPED'10

    Acceptance Rates

    Overall Acceptance Rate 398 of 1,159 submissions, 34%

    Cited By

    • (2022) Coherency Traffic Reduction in Manycore Systems. 2022 25th Euromicro Conference on Digital System Design (DSD), pp. 262-267. DOI: 10.1109/DSD57027.2022.00043. Online publication date: Aug-2022
    • (2018) Energy-efficient hybrid coherence protocol for multicore processors. Cluster Computing 21:3, pp. 1521-1541. DOI: 10.5555/3287988.3288003. Online publication date: 1-Sep-2018
    • (2018) Energy-efficient hybrid coherence protocol for multicore processors. Cluster Computing 21:3, pp. 1521-1541. DOI: 10.1007/s10586-018-1947-z. Online publication date: 16-Feb-2018
    • (2017) ReDirect. ACM Transactions on Architecture and Code Optimization 14:4, pp. 1-23. DOI: 10.1145/3162015. Online publication date: 20-Dec-2017
    • (2017) Generating Fine-Grain Multithreaded Applications Using a Multigrain Approach. ACM Transactions on Architecture and Code Optimization 14:4, pp. 1-26. DOI: 10.1145/3155288. Online publication date: 13-Dec-2017
    • (2017) ECS. ACM Transactions on Architecture and Code Optimization 14:4, pp. 1-29. DOI: 10.1145/3151083. Online publication date: 13-Dec-2017
    • (2017) Cooperative Multi-Agent Reinforcement Learning-Based Co-optimization of Cores, Caches, and On-chip Network. ACM Transactions on Architecture and Code Optimization 14:4, pp. 1-25. DOI: 10.1145/3132170. Online publication date: 14-Nov-2017
    • (2017) Near-Ideal Networks-on-Chip for Servers. 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 277-288. DOI: 10.1109/HPCA.2017.16. Online publication date: Feb-2017
    • (2017) An adaptive cache coherence protocol. Journal of Parallel and Distributed Computing 102:C, pp. 163-174. DOI: 10.1016/j.jpdc.2016.12.020. Online publication date: 1-Apr-2017
    • (2016) Software Assisted Hardware Cache Coherence for Heterogeneous Processors. Proceedings of the Second International Symposium on Memory Systems, pp. 279-288. DOI: 10.1145/2989081.2989092. Online publication date: 3-Oct-2016
