Accelerating GPU Hardware Transactional Memory with Snapshot Isolation

Authors:

Samuel Irving

ACM SIGARCH Computer Architecture News, Volume 45, Issue 2

Pages 282 - 294

https://doi.org/10.1145/3140659.3080204

Published: 24 June 2017

Abstract

Snapshot Isolation (SI) is an established model in the database community, which permits write-read conflicts to pass and aborts transactions only on write-write conflicts. With the Write Skew anomaly correctly eliminated, SI can reduce the occurrence of aborts, save the work done by transactions, and greatly benefit long transactions involving complex data structures.
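The commit rule described above can be made concrete with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: `Store`, `Txn`, and the three `demo_*` functions are hypothetical names, versions are ordered by a single global commit counter, and conflicts are resolved first-committer-wins. It shows a read-write overlap committing on both sides, a write-write overlap aborting the later committer, and Write Skew slipping through plain SI.

```python
class Store:
    """Toy multi-versioned store (assumed model): key -> [(commit_ts, value)], oldest first."""

    def __init__(self, initial):
        self.clock = 0
        self.versions = {k: [(0, v)] for k, v in initial.items()}

    def read_at(self, key, ts):
        # Newest version committed at or before the snapshot timestamp.
        for commit_ts, value in reversed(self.versions[key]):
            if commit_ts <= ts:
                return value
        raise KeyError(key)

    def last_commit_ts(self, key):
        return self.versions[key][-1][0]


class Txn:
    def __init__(self, store):
        self.store = store
        self.start_ts = store.clock  # snapshot is fixed at transaction begin
        self.writes = {}

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        return self.store.read_at(key, self.start_ts)

    def write(self, key, value):
        self.writes[key] = value  # buffered until commit

    def commit(self):
        # SI rule: abort only if a key we wrote was committed by someone else
        # after our snapshot (write-write conflict). Reads are never validated.
        for key in self.writes:
            if self.store.last_commit_ts(key) > self.start_ts:
                return False
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions[key].append((self.store.clock, value))
        return True


def demo_read_write_passes():
    s = Store({"x": 1, "y": 1})
    t1, t2 = Txn(s), Txn(s)
    t1.read("x")
    t1.write("y", 10)
    t2.write("x", 20)  # overwrites a location t1 has read
    return t2.commit(), t1.commit()


def demo_write_write_aborts():
    s = Store({"x": 1})
    t1, t2 = Txn(s), Txn(s)
    t1.write("x", 2)
    t2.write("x", 3)
    return t1.commit(), t2.commit()


def demo_write_skew():
    # Intended invariant: x + y >= 1. Each transaction checks it on its own snapshot.
    s = Store({"x": 1, "y": 1})
    t1, t2 = Txn(s), Txn(s)
    if t1.read("x") + t1.read("y") >= 2:
        t1.write("x", 0)
    if t2.read("x") + t2.read("y") >= 2:
        t2.write("y", 0)
    outcomes = (t1.commit(), t2.commit())
    check = Txn(s)
    return outcomes, check.read("x") + check.read("y")
```

In this model `demo_write_skew()` commits both transactions because their write sets are disjoint, yet the final state breaks the invariant that each one checked. That is the anomaly the abstract says must be eliminated before SI can be used safely.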

GPUs are evolving into general-purpose computing devices with growing support for irregular workloads, including transactional memory. Using snapshot isolation in transactional memory has been shown to benefit performance substantially. In this paper, we propose a multi-versioned memory subsystem for hardware-based transactional memory on the GPU, together with a method for eliminating the Write Skew anomaly on the fly, and incorporate Snapshot Isolation into this system.
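The abstract does not say how the multi-versioned memory or the Write Skew elimination works, so the following is a generic illustration rather than the paper's mechanism. It assumes a software model in which each address keeps a list of timestamped versions, and it uses a deliberately conservative, well-known fix: an updating transaction also validates its read set at commit. All names (`VersionedMemory`, `CheckedTxn`, `run_skew_example`) are hypothetical.

```python
class VersionedMemory:
    """Toy multi-versioned memory (assumed model): addr -> [(commit_ts, value)], oldest first."""

    def __init__(self, initial):
        self.clock = 0
        self.versions = {addr: [(0, val)] for addr, val in initial.items()}

    def read_at(self, addr, ts):
        # Newest version committed at or before the snapshot timestamp.
        for commit_ts, value in reversed(self.versions[addr]):
            if commit_ts <= ts:
                return value
        raise KeyError(addr)

    def newest_ts(self, addr):
        return self.versions[addr][-1][0]

    def publish(self, writes):
        # Install all buffered writes as one new version.
        self.clock += 1
        for addr, value in writes.items():
            self.versions[addr].append((self.clock, value))


class CheckedTxn:
    def __init__(self, mem):
        self.mem = mem
        self.start_ts = mem.clock
        self.reads = set()
        self.writes = {}

    def read(self, addr):
        if addr in self.writes:
            return self.writes[addr]
        self.reads.add(addr)
        return self.mem.read_at(addr, self.start_ts)

    def write(self, addr, value):
        self.writes[addr] = value

    def commit(self):
        if not self.writes:
            return True  # read-only: its snapshot is consistent, so it never aborts
        # Conservative fix (assumption): updating transactions validate
        # everything they read as well as everything they wrote.
        for addr in self.reads | set(self.writes):
            if self.mem.newest_ts(addr) > self.start_ts:
                return False
        self.mem.publish(self.writes)
        return True


def run_skew_example():
    # Intended invariant: x + y >= 1.
    mem = VersionedMemory({"x": 1, "y": 1})
    t1, t2 = CheckedTxn(mem), CheckedTxn(mem)
    if t1.read("x") + t1.read("y") >= 2:
        t1.write("x", 0)
    if t2.read("x") + t2.read("y") >= 2:
        t2.write("y", 0)
    outcomes = (t1.commit(), t2.commit())
    check = CheckedTxn(mem)
    return outcomes, check.read("x") + check.read("y")
```

With read-set validation the second transaction aborts and the invariant holds. The cost is that updating transactions now abort on read-write conflicts too, which gives back part of the benefit SI is meant to provide. A finer-grained scheme would abort only when a genuinely dangerous dependency pattern is detected.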

The results show that snapshot isolation can effectively boost the performance of dynamically sized data structures such as linked lists, binary trees and red-black trees, sometimes by as much as 4.5x, which results in improved overall performance of benchmarks utilizing these data structures.

Index Terms

Accelerating GPU Hardware Transactional Memory with Snapshot Isolation
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Single instruction, multiple data
2. Computing methodologies
  1. Concurrent computing methodologies

Information & Contributors

Information

Published In

ACM SIGARCH Computer Architecture News Volume 45, Issue 2

ISCA'17

May 2017

715 pages

ISSN:0163-5964

DOI:10.1145/3140659

Editor:
Babak Falsafi
Interim

ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture
June 2017
736 pages
ISBN:9781450348928
DOI:10.1145/3079856

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 June 2017

Published in SIGARCH Volume 45, Issue 2

Qualifiers

Tutorial
Research
Refereed limited

Funding Sources

National Science Foundation

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 16
Total Downloads: 850
Downloads (Last 12 months): 115
Downloads (Last 6 weeks): 19

Reflects downloads up to 06 Oct 2024

Citations

Cited By

Salamanca J, Baldassin A (2024). Using Hardware-Transactional-Memory Support to Implement Speculative Task Execution. Journal of Parallel and Distributed Computing (104939). Online publication date: Jun-2024.
https://doi.org/10.1016/j.jpdc.2024.104939
Nunes D, Castro D, Romano P (2023). CSMV: A highly scalable multi-versioned software transactional memory for GPUs. Journal of Parallel and Distributed Computing 180 (104701). Online publication date: Oct-2023.
https://doi.org/10.1016/j.jpdc.2023.04.002
Nunes D, Castro D, Romano P (2022). CSMV: A Highly Scalable Multi-Versioned Software Transactional Memory for GPUs. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (526-536). Online publication date: May-2022.
https://doi.org/10.1109/IPDPS53621.2022.00057
Chen W, Hsiu P, Kuo T (2019). Enabling Failure-resilient Intermittently-powered Systems Without Runtime Checkpointing. Proceedings of the 56th Annual Design Automation Conference 2019 (1-6). Online publication date: 2-Jun-2019.
https://dl.acm.org/doi/10.1145/3316781.3317816
Irving S, Chen S, Peng L, Busch C, Herlihy M, Michael C (2019). CUDA-DTM: Distributed Transactional Memory for GPU Clusters. Networked Systems (183-199). Online publication date: 14-Sep-2019.
https://doi.org/10.1007/978-3-030-31277-0_12
Zhang W, Zhao C, Peng L, Lin Y, Zhang F, Lu Y, Dehnavi M, Kulkarni M, Krishnamoorthy S (2023). Boosting Performance and QoS for Concurrent GPU B+trees by Combining-Based Synchronization. Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (1-13). Online publication date: 25-Feb-2023.
https://dl.acm.org/doi/10.1145/3572848.3577474
Gao L, Wang J, Zhang W (2022). Adaptive Contention Management for Fine-Grained Synchronization on Commodity GPUs. ACM Transactions on Architecture and Code Optimization 19:4 (1-21). Online publication date: 11-Jul-2022.
https://dl.acm.org/doi/10.1145/3547301
Miller d, Nelson J, Hassan A, Palmieri R, Wassermann B, Malka M, Chidambaram V, Raz D (2021). KVCG. Proceedings of the 14th ACM International Conference on Systems and Storage (1-12). Online publication date: 14-Jun-2021.
https://dl.acm.org/doi/10.1145/3456727.3463779
Chen S, Liu L, Zhang W, Peng L (2020). Architectural Support for NVRAM Persistence in GPUs. IEEE Transactions on Parallel and Distributed Systems 31:5 (1107-1120). Online publication date: 1-May-2020.
https://doi.org/10.1109/TPDS.2019.2960233
Nelson J, Miller d, Palmieri R (2020). Don't forget about synchronization! Guidelines for using locks on graphics processing units. Concurrency and Computation: Practice and Experience 34:2. Online publication date: 13-Apr-2020.
https://doi.org/10.1002/cpe.5757