research-article

Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory

Authors:

Mojtaba Mehrara,

Scott Mahlke

ACM SIGPLAN Notices, Volume 44, Issue 6

Pages 166 - 176

https://doi.org/10.1145/1543135.1542495

Published: 15 June 2009

Abstract

Multicore designs have emerged as the mainstream design paradigm for the microprocessor industry. Unfortunately, providing multiple cores does not directly translate into performance for most applications. The industry has already fallen short of the decades-old performance trend of doubling performance every 18 months. An attractive approach for exploiting multiple cores is to rely on tools, both compilers and runtime optimizers, to automatically extract threads from sequential applications. However, despite decades of research on automatic parallelization, most techniques are only effective in the scientific and data parallel domains where array-dominated codes can be precisely analyzed by the compiler. Thread-level speculation offers the opportunity to expand parallelization to general-purpose programs, but at the cost of expensive hardware support. In this paper, we focus on providing low-overhead software support for exploiting speculative parallelism. We propose STMlite, a light-weight software transactional memory model that is customized to facilitate profile-guided automatic loop parallelization. STMlite eliminates a considerable amount of checking and locking overhead in conventional software transactional memory models by decoupling the commit phase from main transaction execution. Further, strong atomicity requirements for generic transactional memories are unnecessary within a stylized automatic parallelization framework. STMlite enables sequential applications to extract meaningful performance gains on commodity multicore hardware.

Index Terms

Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation
    2. General programming languages
      1. Language features
        Concurrent programming structures

Published In

ACM SIGPLAN Notices Volume 44, Issue 6

PLDI '09

June 2009

478 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/1543135

PLDI '09: Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2009
492 pages
ISBN:9781605583921
DOI:10.1145/1542476
General Chair: Michael Hind, IBM Research, USA
Program Chair: Amer Diwan, University of Colorado at Boulder, USA

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 June 2009

Published in SIGPLAN Volume 44, Issue 6

Qualifiers

Research-article
