research-article

TSO_ATOMICITY: efficient hardware primitive for TSO-preserving region optimizations

Published: 16 March 2013

    Abstract

    Program optimizations based on data dependences may not preserve a program's memory consistency. Previous work leverages a hardware ATOMICITY primitive that restricts thread interleaving in order to preserve sequential consistency in region optimizations. However, the ATOMICITY primitive is overly restrictive on thread interleaving when optimizing real-world applications developed for the popular Total-Store-Ordering (TSO) memory consistency model, which is weaker than sequential consistency. In this paper, we present a novel hardware TSO_ATOMICITY primitive. It places fewer restrictions on thread interleaving than the ATOMICITY primitive, which permits more efficient program execution, yet it still preserves TSO memory consistency in all region optimizations. Furthermore, the TSO_ATOMICITY primitive requires architecture support similar to that of the ATOMICITY primitive and can be implemented with only a slight change to the existing ATOMICITY primitive implementation. Our experimental results show that, in a state-of-the-art dynamic binary optimization system on a large set of workloads, the ATOMICITY primitive improves performance by only 4% on average. The TSO_ATOMICITY primitive reduces the overhead associated with the ATOMICITY primitive and improves performance by 12% on average.
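
The claim that TSO is weaker than sequential consistency can be illustrated with the classic store-buffering litmus test. The sketch below is not taken from the paper and does not model the ATOMICITY or TSO_ATOMICITY primitives. It is a toy enumerator, assuming a simplified TSO model in which each thread has a FIFO store buffer that drains to memory at arbitrary points. The names `SB` and `outcomes` are hypothetical.

```python
# Each thread is a list of operations:
#   ("st", address, value)  store a value
#   ("ld", address, reg)    load into a named register
# Store-buffering litmus test: both locations start at 0.
SB = [
    [("st", "x", 1), ("ld", "y", "r0")],
    [("st", "y", 1), ("ld", "x", "r1")],
]


def outcomes(threads, tso):
    """Return all reachable final register tuples (sorted by register name).

    tso=False: stores update memory immediately (sequential consistency).
    tso=True:  stores enter a per-thread FIFO buffer and drain later
               (a simplified TSO model, assumed for illustration).
    """
    results = set()

    def step(pcs, mem, bufs, regs):
        progressed = False
        for t, prog in enumerate(threads):
            # Option 1: thread t executes its next instruction.
            if pcs[t] < len(prog):
                progressed = True
                op, addr, arg = prog[pcs[t]]
                new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if op == "st" and tso:
                    new_buf = bufs[t] + ((addr, arg),)
                    new_bufs = bufs[:t] + (new_buf,) + bufs[t + 1:]
                    step(new_pcs, mem, new_bufs, regs)
                elif op == "st":
                    step(new_pcs, {**mem, addr: arg}, bufs, regs)
                else:
                    # A load sees the newest matching entry in its own
                    # store buffer, otherwise the value in memory.
                    pending = [v for a, v in bufs[t] if a == addr]
                    val = pending[-1] if pending else mem[addr]
                    step(new_pcs, mem, bufs, {**regs, arg: val})
            # Option 2: the oldest buffered store of thread t drains.
            if bufs[t]:
                progressed = True
                addr, val = bufs[t][0]
                new_bufs = bufs[:t] + (bufs[t][1:],) + bufs[t + 1:]
                step(pcs, {**mem, addr: val}, new_bufs, regs)
        if not progressed:
            # All instructions executed and all buffers empty.
            results.add(tuple(regs[r] for r in sorted(regs)))

    init_mem = {addr: 0 for prog in threads for _, addr, _ in prog}
    n = len(threads)
    step((0,) * n, init_mem, ((),) * n, {})
    return results


if __name__ == "__main__":
    print("SC :", sorted(outcomes(SB, tso=False)))
    print("TSO:", sorted(outcomes(SB, tso=True)))
```

Under this model the outcome `(r0, r1) = (0, 0)` is reachable only with `tso=True`, because each load can execute while the other thread's store is still sitting in its buffer.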

    Cited By

    • (2019) CoSpec. Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 399-412. DOI: 10.1145/3352460.3358279. Online publication date: 12-Oct-2019
    • (2019) Multi-objective Exploration for Practical Optimization Decisions in Binary Translation. ACM Transactions on Embedded Computing Systems 18:5s, pp. 1-19. DOI: 10.1145/3358185. Online publication date: 7-Oct-2019


      Information

      Published In

      ACM SIGPLAN Notices, Volume 48, Issue 4
      ASPLOS '13
      April 2013
      540 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2499368

      ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
      March 2013
      574 pages
      ISBN: 9781450318709
      DOI: 10.1145/2451116
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 16 March 2013
      Published in SIGPLAN Volume 48, Issue 4

      Author Tags

      1. atomicity
      2. dynamic optimization
      3. memory consistency

      Qualifiers

      • Research-article

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 5
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 10 Aug 2024
