DOI: 10.1145/2451116.2451172
research-article

TSO_ATOMICITY: efficient hardware primitive for TSO-preserving region optimizations

Published: 16 March 2013

    Abstract

    Program optimizations based on data dependences may not preserve the memory consistency of the programs. Previous work leverages a hardware ATOMICITY primitive that restricts thread interleaving in order to preserve sequential consistency in region optimizations. However, the ATOMICITY primitive is overly restrictive on thread interleaving when optimizing real-world applications developed for the popular Total-Store-Ordering (TSO) memory consistency model, which is weaker than sequential consistency. In this paper, we present a novel hardware TSO_ATOMICITY primitive, which places fewer restrictions on thread interleaving than the ATOMICITY primitive and therefore permits more efficient program execution, yet still preserves TSO memory consistency in all region optimizations. Furthermore, the TSO_ATOMICITY primitive requires architecture support similar to that of the ATOMICITY primitive and can be implemented with only slight changes to the existing ATOMICITY primitive implementation. Our experimental results show that, in a state-of-the-art dynamic binary optimization system on a large set of workloads, the ATOMICITY primitive improves performance by only 4% on average. The TSO_ATOMICITY primitive reduces the overhead associated with the ATOMICITY primitive and improves performance by 12% on average.
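
The abstract's statement that TSO is weaker than sequential consistency can be made concrete with the classic store-buffering litmus test. The sketch below is a toy model, not the paper's mechanism: it enumerates every interleaving of two threads, once with stores reaching memory immediately (sequential consistency) and once with a per-thread FIFO store buffer (a simplified TSO-like machine without fences or atomic instructions). The names `explore`, `SB`, and `outcomes` are hypothetical and chosen only for illustration.

```python
def explore(program, buffered):
    """Return every reachable final register assignment for `program`.

    program:  one instruction list per thread; each instruction is
              ("st", addr, value) or ("ld", addr, reg_name).
    buffered: False models sequential consistency (stores hit memory at once);
              True models a simplified TSO machine (FIFO store buffer per thread).
    """
    n = len(program)
    results, seen = set(), set()

    def step(pcs, mem, bufs, regs):
        state = (pcs, tuple(sorted(mem.items())), bufs,
                 tuple(sorted(regs.items())))
        if state in seen:
            return
        seen.add(state)
        finished = all(pcs[t] == len(program[t]) for t in range(n))
        if finished and not any(bufs):
            results.add(tuple(sorted(regs.items())))
            return
        for t in range(n):
            # Choice 1: drain thread t's oldest buffered store to memory.
            if bufs[t]:
                addr, val = bufs[t][0]
                new_mem = dict(mem)
                new_mem[addr] = val
                new_bufs = bufs[:t] + (bufs[t][1:],) + bufs[t + 1:]
                step(pcs, new_mem, new_bufs, regs)
            # Choice 2: execute thread t's next instruction.
            if pcs[t] < len(program[t]):
                op, addr, arg = program[t][pcs[t]]
                new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if op == "st" and buffered:
                    # The store becomes visible to other threads only when drained.
                    new_bufs = bufs[:t] + (bufs[t] + ((addr, arg),),) + bufs[t + 1:]
                    step(new_pcs, mem, new_bufs, regs)
                elif op == "st":
                    new_mem = dict(mem)
                    new_mem[addr] = arg
                    step(new_pcs, new_mem, bufs, regs)
                else:
                    # A load reads the newest matching entry in the thread's
                    # own buffer, falling back to memory (initially 0).
                    val = mem.get(addr, 0)
                    for a, v in bufs[t]:
                        if a == addr:
                            val = v
                    new_regs = dict(regs)
                    new_regs[arg] = val
                    step(new_pcs, mem, bufs, new_regs)

    step((0,) * n, {}, ((),) * n, {})
    return results


# Store-buffering litmus test: each thread writes one flag, then reads the other.
SB = [
    [("st", "x", 1), ("ld", "y", "r0")],
    [("st", "y", 1), ("ld", "x", "r1")],
]


def outcomes(buffered):
    return {(dict(r)["r0"], dict(r)["r1"]) for r in explore(SB, buffered)}


sc = outcomes(buffered=False)
tso = outcomes(buffered=True)
print(sorted(sc))   # [(0, 1), (1, 0), (1, 1)]
print(sorted(tso))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The only difference between the two result sets is `(0, 0)`, which requires both loads to complete while both stores are still sitting in their store buffers. In this toy model, that one extra outcome is the sense in which TSO admits more executions than sequential consistency.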


    Cited By

    • (2019) Multi-objective Exploration for Practical Optimization Decisions in Binary Translation. ACM Transactions on Embedded Computing Systems 18:5s (1-19). DOI: 10.1145/3358185. Online publication date: 7-Oct-2019
    • (2019) CoSpec. Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (399-412). DOI: 10.1145/3352460.3358279. Online publication date: 12-Oct-2019

      Published In

      ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
      March 2013
      574 pages
      ISBN: 9781450318709
      DOI: 10.1145/2451116
      • ACM SIGARCH Computer Architecture News, Volume 41, Issue 1
        ASPLOS '13
        March 2013
        540 pages
        ISSN: 0163-5964
        DOI: 10.1145/2490301
      • ACM SIGPLAN Notices, Volume 48, Issue 4
        ASPLOS '13
        April 2013
        540 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/2499368
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. atomicity
      2. dynamic optimization
      3. memory consistency

      Qualifiers

      • Research-article

      Conference

      ASPLOS '13

      Acceptance Rates

      Overall Acceptance Rate 535 of 2,713 submissions, 20%
