
An evaluation of memory consistency models for shared-memory systems with ILP processors

Authors:

Parthasarathy Ranganathan,

Sarita V. Adve,

Tracy Harton

ACM SIGPLAN Notices, Volume 31, Issue 9

Pages 12 - 23

https://doi.org/10.1145/248209.237142

Published: 01 September 1996

Abstract

Relaxed consistency models have been shown to significantly outperform sequential consistency for single-issue, statically scheduled processors with blocking reads. However, current microprocessors aggressively exploit instruction-level parallelism (ILP) using methods such as multiple issue, dynamic scheduling, and non-blocking reads. Researchers have conjectured that two techniques, hardware-controlled non-binding prefetching and speculative loads, have the potential to equalize the hardware performance of memory consistency models on such processors.

This paper performs the first detailed quantitative comparison of several implementations of sequential consistency and release consistency optimized for aggressive ILP processors. Our results indicate that hardware prefetching and speculative loads dramatically improve the performance of sequential consistency. However, the gap between sequential consistency and release consistency depends on the cache write policy and the complexity of the cache-coherence protocol implementation. In most cases, release consistency significantly outperforms sequential consistency, but for two applications, the use of a write-back primary cache and a more complex cache-coherence protocol nearly equalizes the performance of the two models.

We also observe that the existing techniques, which require on-chip hardware modifications, enhance the performance of release consistency only to a small extent. We propose two new software techniques, fuzzy acquires and selective acquires, to achieve more overlap than allowed by the previous implementations of release consistency. To enhance methods for overlapping acquires, we also propose a technique to eliminate control dependences caused by an acquire loop, using a small amount of off-chip hardware called the synchronization buffer.

References

[1] S. V. Adve, A. L. Cox, S. Dwarkadas, and W. Zwaenepoel. Replacing Locks by Higher-Level Primitives. Technical Report TR94-237, Computer Science, Rice University, 1994.

[2] S. V. Adve and M. D. Hill. A Unified Formalization of Four Shared-Memory Models. IEEE Trans. on Parallel and Distributed Systems, 4(6):613-624, June 1993.

[3] A. Agarwal et al. The MIT Alewife Machine: Architecture and Performance. In Proc. of the 22nd ISCA, pages 2-13, 1995.

[4] R. Alverson et al. The Tera Computer System. In Proc. of the Intl. Conf. on Supercomputing, pages 1-6, 1990.

[5] B. N. Bershad, M. J. Zekauskas, and W. A. Sawdon. The Midway Distributed Shared Memory System. Compcon, 1992.

[6] R. G. Covington et al. The Efficient Simulation of Parallel Computer Systems. Intl. Journal of Computer Simulation, 1:31-58, January 1991.

[7] F. Dahlgren and P. Stenstrom. Effectiveness of Hardware-Based Stride and Sequential Prefetching in Shared-Memory Multiprocessors. In Proc. of the 1st Intl. Symp. on High Performance Computer Architecture, 1995.

[8] K. Gharachorloo et al. Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors. In Proc. of the 17th ISCA, pages 15-26, May 1990.

[9] K. Gharachorloo, A. Gupta, and J. Hennessy. Performance Evaluation of Memory Consistency Models for Shared-Memory Multiprocessors. In Proc. of ASPLOS IV, pages 245-257, 1991.

[10] K. Gharachorloo, A. Gupta, and J. Hennessy. Two Techniques to Enhance the Performance of Memory Consistency Models. In Proc. of the Intl. Conf. on Parallel Processing, pages I-355-I-364, 1991.

[11] K. Gharachorloo, A. Gupta, and J. Hennessy. Hiding Memory Latency Using Dynamic Scheduling in Shared-Memory Multiprocessors. In Proc. of the 19th ISCA, pages 22-33, 1992.

[12] J. R. Goodman, M. K. Vernon, and P. J. Woest. Efficient Synchronization Primitives for Large-Scale Cache-Coherent Multiprocessors. In Proc. of ASPLOS III, pages 64-75, 1989.

[13] A. Gupta et al. Comparative Evaluation of Latency Reducing and Tolerating Techniques. In Proc. of the 18th ISCA, pages 254-263, May 1991.

[14] R. Gupta. The Fuzzy Barrier: A Mechanism for High Speed Synchronization of Processors. In Proc. of ASPLOS III, pages 54-63, April 1989.

[15] D. Hunt. Advanced Features of the 64-bit PA-8000. Hewlett Packard, 1996.

[16] Intel Corporation. Pentium® Pro Family Developer's Manual.

[17] D. Kroft. Lockup-Free Instruction Fetch/Prefetch Cache Organization. In Proc. of the 8th ISCA, pages 81-87, 1981.

[18] L. Lamport. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs. IEEE Trans. on Computers, C-28(9):690-691, September 1979.

[19] MIPS Technologies, Inc. R10000 Microprocessor User's Manual, Version 1.1, January 1996.

[20] T. Mowry and A. Gupta. Tolerating Latency Through Software-Controlled Prefetching. JPDC, pages 87-106, June 1991.

[21] V. S. Pai, P. Ranganathan, and S. V. Adve. The Impact of Instruction-Level Parallelism on Multiprocessor Performance and Simulation Methodology. Technical report, Rice University, July 1996.

[22] U. Rajagopalan. The Effects of Interconnection Networks on the Performance of Shared-Memory Multiprocessors. Master's thesis, Rice University, January 1995.

[23] M. Rosenblum et al. The Impact of Architectural Trends on Operating System Performance. In Proc. of the 15th Symp. on Operating Systems Principles, pages 285-298, 1995.

[24] J. P. Singh, W.-D. Weber, and A. Gupta. SPLASH: Stanford Parallel Applications for Shared-Memory. Computer Architecture News, 20(1):5-44, March 1992.

[25] SPARC International. The SPARC Architecture Manual, Version 9, 1993.

[26] S. C. Woo et al. The SPLASH-2 Programs: Characterization and Methodological Considerations. In Proc. of the 22nd ISCA, pages 24-36, 1995.

[27] R. N. Zucker and J.-L. Baer. A Performance Study of Memory Consistency Models. In Proc. of the 19th ISCA, pages 2-12, 1992.

Cited By

Oskin M (2008). The revolution inside the box. Communications of the ACM 51:7 (70-78). DOI 10.1145/1364782.1364799. Online publication date: 1-Jul-2008.
https://dl.acm.org/doi/10.1145/1364782.1364799
Nanri T, Sato H, Shimasaki M (2005). Cost estimation of coherence protocols of software managed cache on distributed shared memory system. High Performance Computing (335-342). DOI 10.1007/BFb0024228. Online publication date: 9-Jun-2005.
https://doi.org/10.1007/BFb0024228
Ros A, Kaxiras S (2020). Speculative Enforcement of Store Atomicity. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (555-567). DOI 10.1109/MICRO50266.2020.00053. Online publication date: Oct-2020.
https://doi.org/10.1109/MICRO50266.2020.00053

Index Terms

An evaluation of memory consistency models for shared-memory systems with ILP processors
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data
2. Hardware
  1. Hardware test
    1. Test-pattern generation and fault simulation
  2. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Recommendations

An evaluation of memory consistency models for shared-memory systems with ILP processors
ASPLOS VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems

Relaxed consistency models have been shown to significantly outperform sequential consistency for single-issue, statically scheduled processors with blocking reads. However, current microprocessors aggressively exploit instruction-level parallelism (ILP)...
The interaction of software prefetching with ILP processors in shared-memory systems
ISCA '97: Proceedings of the 24th annual international symposium on Computer architecture

Current microprocessors aggressively exploit instruction-level parallelism (ILP) through techniques such as multiple issue, dynamic scheduling, and non-blocking reads. Recent work has shown that memory latency remains a significant performance ...

Information

Published In

ACM SIGPLAN Notices Volume 31, Issue 9

Sept. 1996

273 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/248209

Chairmen: Bill Dally (Massachusetts Institute of Technology), Susan Eggers (Univ. of Washington, Seattle)

ASPLOS VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
October 1996
290 pages
ISBN:0897917677
DOI:10.1145/237090
Chairmen: Bill Dally (Massachusetts Institute of Technology), Susan Eggers (Univ. of Washington, Seattle)

Copyright © 1996 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 1996

Published in SIGPLAN Volume 31, Issue 9

Qualifiers

Article

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 76

Total Downloads: 929

Downloads (last 12 months): 64

Downloads (last 6 weeks): 14

Reflects downloads up to 27 Jul 2024

Citations

Shin S, Tuck J, Solihin Y (2017). Hiding the Long Latency of Persist Barriers Using Speculative Execution. ACM SIGARCH Computer Architecture News 45:2 (175-186). DOI 10.1145/3140659.3080240. Online publication date: 24-Jun-2017.
https://dl.acm.org/doi/10.1145/3140659.3080240
Akazue M, Halvey M, Baillie L (2017). Using Thermal Stimuli to Enhance Photo-Sharing in Social Media. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1:2 (1-21). DOI 10.1145/3090050. Online publication date: 30-Jun-2017.
https://dl.acm.org/doi/10.1145/3090050
Shin S, Tuck J, Solihin Y (2017). Hiding the Long Latency of Persist Barriers Using Speculative Execution. Proceedings of the 44th Annual International Symposium on Computer Architecture (175-186). DOI 10.1145/3079856.3080240. Online publication date: 24-Jun-2017.
https://dl.acm.org/doi/10.1145/3079856.3080240
Wang S, Huang S, Liu T, Ma J, Chen Z, Veijalainen J (2016). Ranking-Oriented Collaborative Filtering. ACM Transactions on Information Systems 35:2 (1-28). DOI 10.1145/2960408. Online publication date: 21-Sep-2016.
https://dl.acm.org/doi/10.1145/2960408
Singh A, Aga S, Narayanasamy S, Prvulovic M (2015). Efficiently enforcing strong memory ordering in GPUs. Proceedings of the 48th International Symposium on Microarchitecture (699-712). DOI 10.1145/2830772.2830778. Online publication date: 5-Dec-2015.
https://dl.acm.org/doi/10.1145/2830772.2830778
Terechko A, Hoogerbrugge J, Alkadi G, Guntur S, Lahiri A, Duranton M, Wüst C, Christie P, Nackaerts A, Kumar A (2012). Balancing Programmability and Silicon Efficiency of Heterogeneous Multicore Architectures. ACM Transactions on Embedded Computing Systems 11S:1 (1-32). DOI 10.1145/2180887.2180890. Online publication date: 1-Jun-2012.
https://dl.acm.org/doi/10.1145/2180887.2180890
Cohen S, Kimelfeld B, Sagiv Y (2009). Incorporating constraints in probabilistic XML. ACM Transactions on Database Systems 34:3 (1-45). DOI 10.1145/1567274.1567280. Online publication date: 3-Sep-2009.
https://dl.acm.org/doi/10.1145/1567274.1567280