DOI: 10.1145/605397.605401

Temporally silent stores

Published: 01 October 2002

    Abstract

    Recent work has shown that silent stores (stores that write a value matching the one already stored at the memory location) occur frequently and can be exploited to reduce memory traffic and improve performance. This paper extends the definition of silent stores to encompass sets of stores that change the value at a memory location only temporarily and subsequently return a previous value of interest to that location. The stores that cause the value to revert are called temporally silent stores. We redefine multiprocessor sharing to account for temporal silence and show that, in the limit, up to 45% of communication misses in scientific and commercial applications can be eliminated by exploiting values that change only temporarily. We describe a practical mechanism that detects temporally silent stores and removes the coherence traffic they cause in conventional multiprocessors, and we find that a simple extension to the MESI protocol eliminates up to 42% of communication misses. Further, we examine application and operating system code to provide insight into the temporal silence phenomenon, and we characterize temporal silence by examining value frequencies and dynamic instruction distances between temporally silent pairs. These studies indicate that the operating system is heavily involved in temporal silence in both commercial and scientific workloads and that, while detectable synchronization primitives contribute substantially, significant opportunity exists outside these references.
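
    The distinction between silent and temporally silent stores can be illustrated with a small trace classifier. This is only a sketch of the definitions in the abstract, not the paper's detection mechanism or its MESI extension. The function name `classify_stores`, the trace format, and the choice of "previous value of interest" (here assumed to be the value each address held at the start of the trace, that is, the value other processors last observed) are all hypothetical.

```python
def classify_stores(initial, trace):
    """Label each store in `trace` as silent, temporally silent, or non-silent.

    initial: dict mapping address -> value at the start of the trace.
             ASSUMPTION: this is the "previous value of interest", i.e. the
             value other processors are presumed to have last observed.
    trace:   list of (address, value) stores in program order.
    Addresses missing from `initial` are assumed to start at 0.
    """
    current = dict(initial)    # value currently held at each address
    reference = dict(initial)  # value of interest at each address
    labels = []
    for addr, value in trace:
        old = current.setdefault(addr, 0)
        ref = reference.setdefault(addr, 0)
        if value == old:
            # Writes the value already stored: a silent store.
            labels.append("silent")
        elif value == ref:
            # Changes the stored value, but reverts it to the reference.
            labels.append("temporally silent")
        else:
            labels.append("non-silent")
        current[addr] = value
    return labels


if __name__ == "__main__":
    LOCK = 0x10  # hypothetical lock word: 0 = free, 1 = held
    stores = [
        (LOCK, 1),  # acquire: 0 -> 1
        (LOCK, 0),  # release: 1 -> 0, reverts to the initial value
        (LOCK, 0),  # redundant release: value unchanged
    ]
    for store, label in zip(stores, classify_stores({LOCK: 0}, stores)):
        print(hex(store[0]), store[1], label)
```

    The lock example mirrors the abstract's observation that synchronization primitives contribute to temporal silence: an acquire followed by a release leaves the lock word holding the value it started with, so the release is the store that reverts the location.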



    Published In

    ASPLOS X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
    October 2002
    318 pages
    ISBN: 1581135742
    DOI: 10.1145/605397
    • ACM SIGOPS Operating Systems Review, Volume 36, Issue 5
      December 2002
      296 pages
      ISSN: 0163-5980
      DOI: 10.1145/635508
    • ACM SIGARCH Computer Architecture News, Volume 30, Issue 5
      Special Issue: Proceedings of the 10th annual conference on Architectural Support for Programming Languages and Operating Systems
      December 2002
      296 pages
      ISSN: 0163-5964
      DOI: 10.1145/635506
    • ACM SIGPLAN Notices, Volume 37, Issue 10
      October 2002
      296 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/605432

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Conference

    ASPLOS02

    Acceptance Rates

    ASPLOS X Paper Acceptance Rate 24 of 175 submissions, 14%;
    Overall Acceptance Rate 535 of 2,713 submissions, 20%
