What Scalable Programs Need from Transactional Memory

Authors:

Keshav Pingali

ACM SIGARCH Computer Architecture News, Volume 45, Issue 1

Pages 105 - 118

https://doi.org/10.1145/3093337.3037750

Published: 04 April 2017

Abstract

Transactional memory (TM) has been the focus of numerous studies, and it is supported in processors such as the IBM Blue Gene/Q and Intel Haswell. Many studies have used the STAMP benchmark suite to evaluate their designs. However, the speedups obtained for the STAMP benchmarks on all TM systems we know of are quite limited; for example, with 64 threads on the IBM Blue Gene/Q, we observe a median speedup of 1.4X using the Blue Gene/Q hardware transactional memory (HTM), and a median speedup of 4.1X using a software transactional memory (STM).

What limits the performance of these benchmarks on TMs? In this paper, we argue that the problem lies with the programming model and data structures used to write them. To make this point, we articulate two principles that we believe must be embodied in any scalable program and argue that STAMP programs violate both of them. By modifying the STAMP programs to satisfy both principles, we produce a new set of programs that we call the Stampede suite. Its median speedup on the Blue Gene/Q is 8.0X when using an STM. The two principles also permit us to simplify the TM design. Using this new STM with the Stampede benchmarks, we obtain a median speedup of 17.7X with 64 threads on the Blue Gene/Q and 13.2X with 32 threads on an Intel Westmere system.

These results suggest that HTM and STM designs will benefit if more attention is paid to the division of labor between application programs, systems software, and hardware.


Index Terms

What Scalable Programs Need from Transactional Memory
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
    2. General programming languages
      1. Language types
        Parallel programming languages

Published In

ACM SIGARCH Computer Architecture News Volume 45, Issue 1

ASPLOS '17

March 2017

812 pages

ISSN:0163-5964

DOI:10.1145/3093337

Editor:
Babak Falsafi
Interim

ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
April 2017
856 pages
ISBN:9781450344654
DOI:10.1145/3037697
General Chairs: Yunji Chen (Institute of Computing Technology, CAS, China), Olivier Temam (Google, USA)
Program Chair: John Carter (IBM, USA)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation

Cited By

Poudel P, Sharma G (2019). Adaptive Versioning in Transactional Memories. Stabilization, Safety, and Security of Distributed Systems, pages 277-295. Online publication date: 14-Nov-2019.
https://doi.org/10.1007/978-3-030-34992-9_22
Poudel P, Sharma G (2021). Adaptive Versioning in Transactional Memory Systems. Algorithms, 14:6 (171). Online publication date: 31-May-2021.
https://doi.org/10.3390/a14060171
Chen D, Gibbons P, Mowry T (2020). TardisTM. Proceedings of the Eleventh International Workshop on Programming Models and Applications for Multicores and Manycores, pages 1-10. Online publication date: 22-Feb-2020.
https://dl.acm.org/doi/10.1145/3380536.3380538
Wu Z, Lu K, Wang R, Zhang W (2020). A survey on optimizations towards best-effort hardware transactional memory. CCF Transactions on High Performance Computing. Online publication date: 15-Sep-2020.
https://doi.org/10.1007/s42514-020-00049-2
Li Z, Liu L, Deng Y, Wang J, Liu Z, Yin S, Wei S (2019). FPGA-Accelerated Optimistic Concurrency Control for Transactional Memory. Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, pages 911-923. Online publication date: 12-Oct-2019.
https://dl.acm.org/doi/10.1145/3352460.3358270
Kim J, Mathew A, Kashyap S, Ramanathan M, Min C, Bahar I, Herlihy M, Witchel E, Lebeck A (2019). MV-RLU. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 779-792. Online publication date: 4-Apr-2019.
https://dl.acm.org/doi/10.1145/3297858.3304040
Liu Y, Kato S, Edahiro M (2019). Optimization of the Load Balancing Policy for Tiled Many-Core Processors. IEEE Access, 7, pages 10176-10188. Online publication date: 2019.
https://doi.org/10.1109/ACCESS.2018.2883415
Sutra P, Marlier P, Schiavoni V, Trahay F (2018). Boosting Transactional Memory with Stricter Serializability. Coordination Models and Languages, pages 231-251. Online publication date: 27-May-2018.
https://doi.org/10.1007/978-3-319-92408-3_11
Fahmy S, Senousy Z, Amin A (2017). Providing QoS in contention management for software transactional memory. 2017 13th International Computer Engineering Conference (ICENCO), pages 231-236. Online publication date: Dec-2017.
https://doi.org/10.1109/ICENCO.2017.8289793
