Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory

Published: 15 June 2009

Abstract

Multicore designs have emerged as the mainstream design paradigm for the microprocessor industry. Unfortunately, providing multiple cores does not directly translate into performance for most applications. The industry has already fallen short of the decades-old trend of doubling performance every 18 months. An attractive approach for exploiting multiple cores is to rely on tools, both compilers and runtime optimizers, to automatically extract threads from sequential applications. However, despite decades of research on automatic parallelization, most techniques are only effective in the scientific and data-parallel domains, where array-dominated codes can be precisely analyzed by the compiler. Thread-level speculation offers the opportunity to expand parallelization to general-purpose programs, but at the cost of expensive hardware support. In this paper, we focus on providing low-overhead software support for exploiting speculative parallelism. We propose STMlite, a light-weight software transactional memory model that is customized to facilitate profile-guided automatic loop parallelization. STMlite eliminates a considerable amount of checking and locking overhead in conventional software transactional memory models by decoupling the commit phase from main transaction execution. Further, strong atomicity requirements for generic transactional memories are unnecessary within a stylized automatic parallelization framework. STMlite enables sequential applications to extract meaningful performance gains on commodity multicore hardware.
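
The idea of separating the commit phase from transaction execution can be illustrated with a small sketch. This is not STMlite's implementation, which the abstract does not detail. It is a single-threaded simulation under stated assumptions: the names `Transaction` and `run_speculative_loop`, the `batch` parameter, and the use of exact address sets for conflict detection are all hypothetical. Each loop iteration runs speculatively with buffered writes and no locking or validation, and a separate in-order commit step checks for conflicts and re-executes iterations that read stale data.

```python
class Transaction:
    """Speculative execution state for one loop iteration (hypothetical)."""

    def __init__(self, memory, start_version):
        self.memory = memory                # shared dict: address -> value
        self.start_version = start_version  # number of commits seen at start
        self.read_set = set()               # addresses read from shared memory
        self.write_buffer = {}              # writes held back until commit

    def read(self, addr):
        # Reads of our own buffered writes never touch shared memory.
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        self.read_set.add(addr)
        return self.memory[addr]

    def write(self, addr, value):
        self.write_buffer[addr] = value


def run_speculative_loop(body, memory, n_iters, batch=4):
    """Run body(tx, i) for i in range(n_iters), preserving sequential results.

    Returns the number of aborted (re-executed) iterations.
    """
    committed_writes = []  # committed_writes[v] = addresses written by commit v
    aborts = 0
    for first in range(0, n_iters, batch):
        # Execution phase: every iteration in the batch runs against the same
        # memory state, with no locking and no validation.
        start = len(committed_writes)
        pending = []
        for i in range(first, min(first + batch, n_iters)):
            tx = Transaction(memory, start)
            body(tx, i)
            pending.append((i, tx))

        # Commit phase: validate and publish in original iteration order.
        for i, tx in pending:
            # Conflict: a commit made after this transaction started wrote
            # an address that the transaction read.
            while any(tx.read_set & w
                      for w in committed_writes[tx.start_version:]):
                aborts += 1
                tx = Transaction(memory, len(committed_writes))
                body(tx, i)  # re-execute against up-to-date memory
            memory.update(tx.write_buffer)
            committed_writes.append(set(tx.write_buffer))
    return aborts
```

In this sketch, a loop whose iterations touch disjoint addresses commits without aborts, while a loop carrying a dependence through a shared address (for example, a running sum) still produces the sequential result but pays for re-execution. A real system would run the execution phase on separate threads and would likely use a cheaper conflict check than exact set intersection.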




      Published In

      ACM SIGPLAN Notices, Volume 44, Issue 6
      PLDI '09
      June 2009
      478 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/1543135

      PLDI '09: Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation
      June 2009
      492 pages
      ISBN: 9781605583921
      DOI: 10.1145/1542476

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. automatic parallelization
      2. loop level parallelism
      3. profile-guided optimization
      4. software transactional memory
      5. thread-level speculation

      Qualifiers

      • Research-article


      Cited By

      • (2024) PIM-STM: Software Transactional Memory for Processing-In-Memory Systems. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pages 897-911. DOI: 10.1145/3620665.3640428. Online publication date: 27-Apr-2024
      • (2023) Block-STM. Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pages 232-244. DOI: 10.1145/3572848.3577524. Online publication date: 25-Feb-2023
      • (2023) Performance Evaluation on Parallel Speculation-Based Construction of a Binary Search Tree. International Journal of Networked and Distributed Computing 11:2, pages 88-111. DOI: 10.1007/s44227-023-00013-w. Online publication date: 8-Nov-2023
      • (2021) Quantifying the Semantic Gap Between Serial and Parallel Programming. 2021 IEEE International Symposium on Workload Characterization (IISWC), pages 151-162. DOI: 10.1109/IISWC53511.2021.00024. Online publication date: Nov-2021
      • (2016) Design of a Method-Level Speculation framework for boosting irregular JVM applications. Journal of Parallel and Distributed Computing 87:C, pages 13-25. DOI: 10.1016/j.jpdc.2015.09.005. Online publication date: 1-Jan-2016
      • (2015) Generalized Task Parallelism. ACM Transactions on Architecture and Code Optimization 12:1, pages 1-25. DOI: 10.1145/2723164. Online publication date: 2-Apr-2015
      • (2015) Simultaneous Inspection: Hiding the Overhead of Inspector-Executor Style Dynamic Parallelization. Languages and Compilers for Parallel Computing, pages 101-115. DOI: 10.1007/978-3-319-17473-0_7. Online publication date: 1-May-2015
      • (2014) Speculative Program Parallelization with Scalable and Decentralized Runtime Verification. Runtime Verification, pages 124-139. DOI: 10.1007/978-3-319-11164-3_11. Online publication date: 2014
      • (2013) Parallelizing the Sparse Matrix Transposition: Reducing the Programmer Effort Using Transactional Memory. Procedia Computer Science 18, pages 501-510. DOI: 10.1016/j.procs.2013.05.214. Online publication date: 2013
      • (2013) A Software-Based Method-Level Speculation Framework for the Java Platform. Languages and Compilers for Parallel Computing, pages 205-219. DOI: 10.1007/978-3-642-37658-0_14. Online publication date: 2013
