DOI: 10.1145/2581122.2544139
tutorial

Software Transactional Memory for GPU Architectures

Published: 15 February 2014

Abstract

Modern GPUs have shown promising results in accelerating computation-intensive and numerical workloads with limited dynamic data sharing. However, many real-world applications exhibit a substantial amount of data sharing among concurrently executing threads. Data sharing often requires a mutual exclusion mechanism to ensure data integrity in a multithreaded environment. Although modern GPUs provide atomic primitives that can be used to construct fine-grained locks, lock-based synchronization requires significant programming effort to achieve functional correctness. The massive multithreading and SIMT execution paradigm of GPUs make GPU locks even more challenging.
To let applications with dynamic data sharing benefit from GPU acceleration, we propose a novel software transactional memory system for GPU architectures (GPU-STM). The major challenges are ensuring good scalability with respect to the massive multithreading of GPUs, and preventing livelocks caused by the SIMT execution paradigm of GPUs. To address these two challenges, we propose (1) a hierarchical validation technique and (2) an encounter-time lock-sorting mechanism, respectively. We build our GPU-STM prototype on a commercially available GPU platform and runtime. Our evaluation on a real system shows that GPU-STM outperforms coarse-grained locks on GPUs by up to 20x.
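
The abstract does not spell out how encounter-time lock-sorting works, so the following is only a minimal host-side C++ sketch of the general idea, not the GPU-STM implementation. It assumes each transaction records the lock it needs in a sorted log when it first touches a location, and at commit time acquires those locks in ascending order with a try-lock built on an atomic compare-and-swap. The names LockTable, LockLog, acquire_all, and release_all are hypothetical and do not come from the paper.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lock table: one try-lock per stripe of shared memory.
struct LockTable {
    std::vector<std::atomic<int>> locks;

    explicit LockTable(std::size_t n) : locks(n) {
        for (auto& l : locks) l.store(0);  // 0 = free, 1 = held
    }

    // Non-blocking acquire built on an atomic compare-and-swap.
    bool try_lock(std::size_t i) {
        int expected = 0;
        return locks[i].compare_exchange_strong(expected, 1);
    }

    void unlock(std::size_t i) { locks[i].store(0); }
};

// Hypothetical per-transaction lock log, kept sorted and duplicate-free.
struct LockLog {
    std::vector<std::size_t> ids;

    // Called at encounter time, i.e. when the transaction first
    // touches a location guarded by lock `id`.
    void record(std::size_t id) {
        auto pos = std::lower_bound(ids.begin(), ids.end(), id);
        if (pos == ids.end() || *pos != id) ids.insert(pos, id);
    }
};

// Acquire every logged lock in ascending order. On the first failure,
// release the locks already taken and report false so that the caller
// can abort and retry the transaction.
bool acquire_all(LockTable& table, const LockLog& log) {
    assert(std::is_sorted(log.ids.begin(), log.ids.end()));
    for (std::size_t k = 0; k < log.ids.size(); ++k) {
        if (!table.try_lock(log.ids[k])) {
            while (k > 0) table.unlock(log.ids[--k]);
            return false;
        }
    }
    return true;
}

void release_all(LockTable& table, const LockLog& log) {
    for (std::size_t id : log.ids) table.unlock(id);
}
```

Because every transaction requests locks in the same global order, two transactions with overlapping lock sets first contend on their lowest common lock. The one that wins it cannot be blocked by the other on any later common lock, so the pair cannot keep aborting each other in lockstep. A caller would typically record locks during the transaction body, call acquire_all at commit, and retry the transaction if it returns false.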

Published In

CGO '14: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
February 2014
328 pages
ISBN:9781450326704
DOI:10.1145/2581122
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. General-Purpose GPU Computing
  2. Parallel Programming
  3. Software Transactional Memory

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

CGO '14

Acceptance Rates

CGO '14 Paper Acceptance Rate 29 of 100 submissions, 29%;
Overall Acceptance Rate 312 of 1,061 submissions, 29%

Cited By

  • (2024) Using Hardware-Transactional-Memory Support to Implement Speculative Task Execution. Journal of Parallel and Distributed Computing. DOI: 10.1016/j.jpdc.2024.104939 (104939). Online publication date: Jun-2024
  • (2023) Boosting Performance and QoS for Concurrent GPU B+trees by Combining-Based Synchronization. Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming. DOI: 10.1145/3572848.3577474 (1-13). Online publication date: 25-Feb-2023
  • (2022) High performance GPU concurrent B+tree. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. DOI: 10.1145/3503221.3508419 (443-444). Online publication date: 2-Apr-2022
  • (2020) Thread-Level Locking for SIMT Architectures. IEEE Transactions on Parallel and Distributed Systems. DOI: 10.1109/TPDS.2019.2955705, 31:5 (1121-1136). Online publication date: 1-May-2020
  • (2019) Efficient GPU NVRAM Persistence with Helper Warps. Proceedings of the 56th Annual Design Automation Conference 2019. DOI: 10.1145/3316781.3317810 (1-6). Online publication date: 2-Jun-2019
  • (2019) Fast Fine-Grained Global Synchronization on GPUs. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. DOI: 10.1145/3297858.3304055 (793-806). Online publication date: 4-Apr-2019
  • (2019) Engineering a high-performance GPU B-Tree. Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming. DOI: 10.1145/3293883.3295706 (145-157). Online publication date: 16-Feb-2019
  • (2019) HeTM: Transactional Memory for Heterogeneous Systems. 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT). DOI: 10.1109/PACT.2019.00026 (232-244). Online publication date: Sep-2019
  • (2019) Efficient Inspected Critical Sections in Data-Parallel GPU Codes. Languages and Compilers for Parallel Computing. DOI: 10.1007/978-3-030-35225-7_15 (223-239). Online publication date: 15-Nov-2019
  • (2019) CUDA-DTM: Distributed Transactional Memory for GPU Clusters. Networked Systems. DOI: 10.1007/978-3-030-31277-0_12 (183-199). Online publication date: 14-Sep-2019
