research-article

Hardware acceleration of transactional memory on commodity systems

Published: 05 March 2011

Abstract

The adoption of transactional memory (TM) is hindered by the high overhead of software transactional memory (STM) and by the intrusive design changes that previously proposed TM hardware requires. We propose that hardware to accelerate STM can reside outside an unmodified commodity processor core, substantially reducing implementation costs. This paper introduces Transactional Memory Acceleration using Commodity Cores (TMACC), a hardware-accelerated TM system that does not modify the processor, caches, or coherence protocol.
We present a complete hardware implementation of TMACC using a rapid prototyping platform. Using this hardware, we implement two distinct conflict detection schemes, both accelerated with Bloom filters on an FPGA. These schemes employ novel techniques for tolerating the latency of fine-grained asynchronous communication with an out-of-core accelerator. We then conduct experiments to explore the feasibility of accelerating TM without modifying existing system hardware. We show that, for all but short transactions, it is not necessary to modify the processor to obtain substantial improvements in TM performance. In applications with moderate-length transactions, TMACC outperforms an STM by an average of 69%, and its maximum speedup comes within 8% of an upper bound on TM acceleration. Overall, we demonstrate that hardware can substantially improve the performance of an STM running on unmodified commodity processors.
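The abstract states that both conflict detection schemes are accelerated with Bloom filters. As a rough illustration of that general technique only, the Python sketch below summarizes each transaction's read set in a Bloom-filter signature and, at commit time, tests the committing transaction's write addresses against the other transactions' signatures. The signature width, the hash family and its coefficients, and the names `Signature`, `Transaction`, and `conflicting_readers` are hypothetical assumptions, not details of TMACC. The sketch runs entirely in software and does not model how an out-of-core accelerator communicates with the processor cores.

```python
"""Sketch: Bloom-filter read signatures for commit-time conflict detection.

Assumptions (hypothetical, not taken from TMACC): 1024-bit signatures, four
hash functions from a ((a*x + b) mod p) mod m family, and a check in which a
committing transaction's write addresses are tested against other
transactions' read signatures.
"""
from __future__ import annotations

SIG_BITS = 1024            # hypothetical signature width m
PRIME = (1 << 61) - 1      # Mersenne prime p for the hash family
# Hypothetical fixed (a, b) coefficients, one pair per hash function.
HASH_PARAMS = [
    (0x9E3779B97F4A7C15 % PRIME, 12345),
    (0xC2B2AE3D27D4EB4F % PRIME, 67890),
    (0x165667B19E3779F9 % PRIME, 13579),
    (0x27D4EB2F165667C5 % PRIME, 24680),
]


class Signature:
    """Bloom filter over word addresses, stored as one Python int bit vector."""

    def __init__(self) -> None:
        self.bits = 0

    def _positions(self, addr: int) -> list[int]:
        # One bit index per hash function.
        return [((a * addr + b) % PRIME) % SIG_BITS for a, b in HASH_PARAMS]

    def insert(self, addr: int) -> None:
        for pos in self._positions(addr):
            self.bits |= 1 << pos

    def may_contain(self, addr: int) -> bool:
        # False means "definitely not read"; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(addr))


class Transaction:
    def __init__(self, tid: int) -> None:
        self.tid = tid
        self.read_sig = Signature()        # compact summary of the read set
        self.write_log: set[int] = set()   # exact write addresses

    def on_read(self, addr: int) -> None:
        self.read_sig.insert(addr)

    def on_write(self, addr: int) -> None:
        self.write_log.add(addr)


def conflicting_readers(committer: Transaction, active: list[Transaction]) -> list[int]:
    """Ids of other active transactions that may have read an address the committer writes."""
    return [
        t.tid
        for t in active
        if t is not committer
        and any(t.read_sig.may_contain(addr) for addr in committer.write_log)
    ]


if __name__ == "__main__":
    t0, t1, t2 = Transaction(0), Transaction(1), Transaction(2)
    t1.on_read(0x1000)
    t1.on_read(0x1008)
    t2.on_read(0x2000)
    t0.on_write(0x1008)
    # t1 is always reported; t2 would appear only on a false positive.
    print(conflicting_readers(t0, [t0, t1, t2]))
```

A Bloom filter never misses an address that was inserted, so a real read-after-write conflict is always reported. The cost is occasional false positives, which in a TM system cause unnecessary aborts rather than incorrect execution. For a filter of m bits with k hash functions holding n addresses, the false-positive probability is approximately (1 - e^(-kn/m))^k, so wider signatures or smaller read sets keep spurious aborts rare.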

Published In

ACM SIGPLAN Notices, Volume 46, Issue 3
ASPLOS '11
March 2011
407 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1961296
  • ASPLOS XVI: Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
    March 2011
    432 pages
    ISBN: 9781450302661
    DOI: 10.1145/1950365
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in SIGPLAN Volume 46, Issue 3

Author Tags

  1. fpga
  2. hardware acceleration
  3. transactional memory

Qualifiers

  • Research-article

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 6
  • Downloads (last 6 weeks): 2
Reflects downloads up to 23 Dec 2024

Citations

  • (2017) Hardware transactional memory architecture with adaptive version management for multi-processor FPGA platforms. Journal of Systems Architecture: the EUROMICRO Journal, 73:C, pp. 42-52. DOI: 10.1016/j.sysarc.2016.12.006. Online publication date: 1-Feb-2017.
  • (2016) A Hardware Approach to Detect, Expose and Tolerate High Level Data Races. 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), pp. 159-167. DOI: 10.1109/PDP.2016.57. Online publication date: Feb-2016.
  • (2016) Asymmetric Allocation in a Shared Flexible Signature Module for Multicore Processors. The Computer Journal, 59:10, pp. 1453-1469. DOI: 10.1093/comjnl/bxw010. Online publication date: 17-Mar-2016.
  • (2020) Chronos: Efficient Speculative Parallelism for Accelerators. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 1247-1262. DOI: 10.1145/3373376.3378454. Online publication date: 9-Mar-2020.
  • (2020) GraphPulse: An Event-Driven Hardware Accelerator for Asynchronous Graph Processing. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 908-921. DOI: 10.1109/MICRO50266.2020.00078. Online publication date: Oct-2020.
  • (2019) FPGA-Accelerated Optimistic Concurrency Control for Transactional Memory. Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 911-923. DOI: 10.1145/3352460.3358270. Online publication date: 12-Oct-2019.
  • (2018) High-Performance GPU Transactional Memory via Eager Conflict Detection. 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 235-246. DOI: 10.1109/HPCA.2018.00029. Online publication date: Feb-2018.
  • (2017) FPGA-Accelerated Transactional Execution of Graph Workloads. Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 227-236. DOI: 10.1145/3020078.3021743. Online publication date: 22-Feb-2017.
  • (2012) A case of system-level hardware/software co-design and co-verification of a commodity multi-processor system with custom hardware. Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, pp. 513-520. DOI: 10.1145/2380445.2380524. Online publication date: 7-Oct-2012.
  • (2012) Sandboxing transactional memory. Proceedings of the 21st international conference on Parallel architectures and compilation techniques, pp. 171-180. DOI: 10.1145/2370816.2370843. Online publication date: 19-Sep-2012.