DOI: 10.1145/1993498.1993522
research-article

A case for an SC-preserving compiler

Published: 04 June 2011

Abstract

The most intuitive memory consistency model for shared-memory multi-threaded programming is sequential consistency (SC). However, current concurrent programming languages support a relaxed model, as such relaxations are deemed necessary for enabling important optimizations. This paper demonstrates that an SC-preserving compiler, one that ensures that every SC behavior of a compiler-generated binary is an SC behavior of the source program, retains most of the performance benefits of an optimizing compiler. The key observation is that a large class of optimizations crucial for performance are either already SC-preserving or can be modified to preserve SC while retaining much of their effectiveness. An SC-preserving compiler, obtained by restricting the optimization phases in LLVM, a state-of-the-art C/C++ compiler, incurs an average slowdown of 3.8% and a maximum slowdown of 34% on a set of 30 programs from the SPLASH-2, PARSEC, and SPEC CINT2006 benchmark suites.
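
The SC-preservation condition stated above can be checked by brute force on very small programs. The following is a minimal sketch, not taken from the paper: it enumerates every sequentially consistent interleaving of a hypothetical two-thread program and compares the outcome sets before and after common-subexpression elimination. The instruction encoding, the helper names (`run`, `sc_outcomes`), and the example programs are all assumptions made for illustration.

```python
from itertools import combinations

def run(schedule, threads, init):
    """Execute one interleaving; `schedule` lists the thread index for each step."""
    mem = dict(init)              # shared memory
    regs = {}                     # thread-local registers (names are unique here)
    pcs = [0] * len(threads)      # per-thread program counters
    for tid in schedule:
        op, dst, src = threads[tid][pcs[tid]]
        pcs[tid] += 1
        if op == "load":          # register dst <- shared variable src
            regs[dst] = mem[src]
        elif op == "store":       # shared variable dst <- constant src
            mem[dst] = src
        elif op == "copy":        # register dst <- register src, no memory access
            regs[dst] = regs[src]
    return regs

def sc_outcomes(reader, writer, init, observed):
    """Collect the observed register values over all SC interleavings of two threads."""
    total = len(reader) + len(writer)
    outcomes = set()
    # Each choice of step positions for the reader fixes one interleaving.
    for slots in combinations(range(total), len(reader)):
        schedule = [0 if i in slots else 1 for i in range(total)]
        regs = run(schedule, [reader, writer], init)
        outcomes.add(tuple(regs[r] for r in observed))
    return outcomes

# Hypothetical example programs (assumptions, not from the paper).
INIT = {"X": 0, "Y": 0}
WRITER = [("store", "X", 1), ("store", "Y", 1)]
OBSERVED = ("t", "u", "v")

# Source: the second load of X is separated from the first by a load of Y.
source = [("load", "t", "X"), ("load", "u", "Y"), ("load", "v", "X")]
# After CSE: the second load of X is replaced by a copy of the first result.
after_cse = [("load", "t", "X"), ("load", "u", "Y"), ("copy", "v", "t")]

src_out = sc_outcomes(source, WRITER, INIT, OBSERVED)
cse_out = sc_outcomes(after_cse, WRITER, INIT, OBSERVED)

# Outcomes of the transformed program that the source program can never produce.
new_behaviors = cse_out - src_out
print(sorted(new_behaviors))      # [(0, 1, 0)]

# Back-to-back loads of X: eliminating the second one adds no new behavior.
adjacent = [("load", "t", "X"), ("load", "v", "X"), ("load", "u", "Y")]
adjacent_cse = [("load", "t", "X"), ("copy", "v", "t"), ("load", "u", "Y")]
adj_preserving = (sc_outcomes(adjacent_cse, WRITER, INIT, OBSERVED)
                  <= sc_outcomes(adjacent, WRITER, INIT, OBSERVED))
print(adj_preserving)             # True
```

In the extra outcome the reader observes the new value of Y together with a stale value of X, which no interleaving of the source program allows, so this particular use of CSE is not SC-preserving in the model. The second pair of programs suggests how a restricted form of the same optimization can stay within the source program's behaviors.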
While the performance overhead of preserving SC in the compiler is much less than previously assumed, it might still be unacceptable for certain applications. We believe there are several avenues for improving performance without giving up SC-preservation. In this vein, we observe that the overhead of our SC-preserving compiler arises mainly from its inability to aggressively perform a class of optimizations we identify as eager-load optimizations. This class includes common-subexpression elimination, constant propagation, global value numbering, and common cases of loop-invariant code motion. We propose a notion of interference checks in order to enable eager-load optimizations while preserving SC. Interference checks expose to the compiler a commonly used hardware speculation mechanism that can efficiently detect whether a particular variable has changed its value since last read.
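
To make the idea of an interference check more concrete, here is a minimal single-threaded sketch that models it in software. It is an illustration under stated assumptions rather than the mechanism proposed in the paper: the class `MonitoredMemory`, its methods, and the recovery strategy (simply redoing the load) are hypothetical, and the hardware mechanism the abstract refers to would do the tracking itself.

```python
class MonitoredMemory:
    """Toy shared memory that records writes to eagerly loaded variables."""

    def __init__(self, init):
        self.cells = dict(init)
        self.monitored = set()    # variables with an outstanding eager load
        self.written = set()      # monitored variables written since that load

    def load(self, var):
        return self.cells[var]

    def eager_load(self, var):
        # Start monitoring var and return its current value.
        self.monitored.add(var)
        self.written.discard(var)
        return self.cells[var]

    def store(self, var, value):
        self.cells[var] = value
        if var in self.monitored:
            # Conservative: any write counts, even one that stores the same value.
            self.written.add(var)

    def interference_check(self, var):
        # Report whether var was written since its eager load, then stop monitoring.
        hit = var in self.written
        self.monitored.discard(var)
        self.written.discard(var)
        return hit


def reader(mem, other_thread_steps):
    """Source: t = X; u = Y; v = X, with the second load of X optimized away."""
    t = mem.eager_load("X")
    other_thread_steps(mem)       # stands in for stores by a concurrent thread
    u = mem.load("Y")
    if mem.interference_check("X"):
        v = mem.load("X")         # recovery path: redo the load
    else:
        v = t                     # fast path: reuse the earlier value
    return t, u, v


def no_writes(mem):
    pass

def writer(mem):
    mem.store("X", 1)
    mem.store("Y", 1)

quiet = reader(MonitoredMemory({"X": 0, "Y": 0}), no_writes)
racy = reader(MonitoredMemory({"X": 0, "Y": 0}), writer)
print(quiet, racy)                # (0, 0, 0) (0, 1, 1)
```

When no write intervenes, the reader reuses the eagerly loaded value; when the hypothetical writer runs in between, the check fires and the reload yields an outcome the unoptimized program could also produce. This model is deliberately simplified and does not address writes that arrive between the check and the use of the reused value.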


Published In

PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2011
668 pages
ISBN: 9781450306638
DOI: 10.1145/1993498
General Chair: Mary Hall
Program Chair: David Padua

ACM SIGPLAN Notices, Volume 46, Issue 6
PLDI '11
June 2011
652 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1993316

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. interference checks
  2. memory consistency models
  3. SC preservation
  4. sequential consistency

Qualifiers

  • Research-article

Conference

PLDI '11

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%
