Memory Consistency Models for Shared-Memory Multiprocessors
December 1995
1995 Technical Report
Publisher: Stanford University, 408 Panama Mall, Suite 217, Stanford, CA, United States
Published: 01 December 1995
Abstract

The memory consistency model for a shared-memory multiprocessor specifies the behavior of memory with respect to read and write operations from multiple processors. As such, the memory model influences many aspects of system design, including the design of programming languages, compilers, and the underlying hardware. Relaxed models that impose fewer memory ordering constraints offer the potential for higher performance by allowing hardware and software to overlap and reorder memory operations. However, fewer ordering guarantees can compromise programmability and portability. Many of the previously proposed models either fail to provide reasonable programming semantics or are biased toward programming ease at the cost of sacrificing performance. Furthermore, the lack of consensus on an acceptable model hinders software portability across different systems. This dissertation focuses on providing a balanced solution that directly addresses the trade-off between programming ease and performance. To address programmability, we propose an alternative method for specifying memory behavior that presents a higher level abstraction to the programmer. We show that with only a few types of information supplied by the programmer, an implementation can exploit the full range of optimizations enabled by previous models. Furthermore, the same information enables automatic and efficient portability across a wide range of implementations. To expose the optimizations enabled by a model, we have developed a formal framework for specifying the low-level ordering constraints that must be enforced by an implementation. Based on these specifications, we present a wide range of architecture and compiler implementation techniques for efficiently supporting a given model. Finally, we evaluate the performance benefits of exploiting relaxed models based on detailed simulations of realistic parallel applications. 
Our results show that the optimizations enabled by relaxed models are extremely effective in hiding virtually the full latency of writes in architectures with blocking reads (i.e., processor stalls on reads), with gains as high as 80%. Architectures with non-blocking reads can further exploit relaxed models to hide a substantial fraction of the read latency as well, leading to a larger overall performance benefit. Furthermore, these optimizations complement gains from other latency hiding techniques such as prefetching and multiple contexts. We believe that the combined benefits in hardware and software will make relaxed models universal in future multiprocessors, as is already evidenced by their adoption in several commercial systems.

Cited By

  1. Flur S, Sarkar S, Pulte C, Nienhuis K, Maranget L, Gray K, Sezgin A, Batty M and Sewell P (2017). Mixed-size concurrency: ARM, POWER, C/C++11, and SC, ACM SIGPLAN Notices, 52:1, (429-442), Online publication date: 11-May-2017.
  2. Flur S, Sarkar S, Pulte C, Nienhuis K, Maranget L, Gray K, Sezgin A, Batty M and Sewell P Mixed-size concurrency: ARM, POWER, C/C++11, and SC Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, (429-442)
  3. McPherson A, Nagarajan V, Sarkar S and Cintra M (2015). Fence placement for legacy data-race-free programs via synchronization read detection, ACM SIGPLAN Notices, 50:8, (249-250), Online publication date: 18-Dec-2015.
  4. Cleary J, Callanan O, Purcell M and Gregg D (2013). Fast asymmetric thread synchronization, ACM Transactions on Architecture and Code Optimization, 9:4, (1-22), Online publication date: 1-Jan-2013.
  5. Wernli E, Lungu M and Nierstrasz O Incremental dynamic updates with first-class contexts Proceedings of the 50th international conference on Objects, Models, Components, Patterns, (304-319)
  6. Sarkar S, Sewell P, Alglave J, Maranget L and Williams D (2011). Understanding POWER multiprocessors, ACM SIGPLAN Notices, 46:6, (175-186), Online publication date: 4-Jun-2011.
  7. Sarkar S, Sewell P, Alglave J, Maranget L and Williams D Understanding POWER multiprocessors Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, (175-186)
  8. Sewell P, Sarkar S, Owens S, Nardelli F and Myreen M (2010). x86-TSO, Communications of the ACM, 53:7, (89-97), Online publication date: 1-Jul-2010.
  9. McKenney P and Walpole J (2008). Introducing technology into the Linux kernel, ACM SIGOPS Operating Systems Review, 42:5, (4-17), Online publication date: 1-Jul-2008.
  10. Cantin J, Lipasti M and Smith J (2005). The Complexity of Verifying Memory Coherence and Consistency, IEEE Transactions on Parallel and Distributed Systems, 16:7, (663-671), Online publication date: 1-Jul-2005.
  11. Cantin J, Lipasti M and Smith J The complexity of verifying memory coherence Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures, (254-255)
  12. Yang Y, Gopalakrishnan G and Lindstrom G Specifying Java thread semantics using a uniform memory model Proceedings of the 2002 joint ACM-ISCOPE conference on Java Grande, (192-201)
  13. Adve S and Gharachorloo K (1996). Shared Memory Consistency Models, Computer, 29:12, (66-76), Online publication date: 1-Dec-1996.
Contributors
  • Hewlett-Packard Inc.
