research-article
Open access

An integrated pseudo-associativity and relaxed-order approach to hardware transactional memory

Published: 20 January 2013

Abstract

Our experimental study and analysis reveal that the bottlenecks of existing hardware transactional memory (HTM) systems are largely rooted in two sources: the extra data movement incurred by version management, and the inefficient scheduling of conflicting transactions in conflict management. Both are most severe for high-contention, coarse-grained applications. To address these problems, we propose an integrated Pseudo-Associativity and Relaxed-Order approach to hardware Transactional Memory, called PARO-TM. It uses the extra pseudo-associative space in the data cache to hold the new value of each transactional modification, and maintains the mapping between the old and new versions through an implicit pseudo-associative hash (i.e., by inverting a specific bit of the set index). PARO-TM can thus branch the speculative version off the old version on demand at each transactional modification, without a dedicated hardware component to hold the uncommitted data, and it automatically accesses the proper version when the transaction commits or aborts. Moreover, PARO-TM adds multi-version support to a chained directory so that conflicting transactions are scheduled in a relaxed-order manner, further reducing their overheads. We compare PARO-TM with the state-of-the-art LogTM-SE, TCC, DynTM, and SUV-TM systems and find that PARO-TM consistently outperforms these four representative HTMs. The performance advantage is far more pronounced for the high-contention, coarse-grained applications in the STAMP benchmark suite, for which PARO-TM is motivated and designed.
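
The version mapping described in the abstract (finding the new version by inverting one bit of the set index) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the PARO-TM hardware design: the line size, the number of sets, the choice of the top index bit as the inverted bit, and all names (`ToyVersionedCache`, `partner`, `tx_write`, and so on) are hypothetical, and the commit and abort handling is one plausible reading of "automatically access the proper version".

```python
LINE_BITS = 6                    # assumption: 64-byte cache lines
SET_BITS = 7                     # assumption: 128 sets
FLIP_MASK = 1 << (SET_BITS - 1)  # assumption: the inverted bit is the top index bit


def set_index(addr):
    """Home set index of an address in a conventional cache."""
    return (addr >> LINE_BITS) & ((1 << SET_BITS) - 1)


def partner(index):
    """Pseudo-associative partner set: the same index with one bit inverted."""
    return index ^ FLIP_MASK


class ToyVersionedCache:
    """Toy model: the old version stays put, the new one goes to the partner set."""

    def __init__(self):
        self.slots = {}         # (set index, line number) -> value
        self.committed_at = {}  # line number -> set holding the committed version
        self.pending = {}       # line number -> set holding the speculative version

    def install(self, addr, value):
        """Place a non-transactional (committed) value in its home set."""
        line = addr >> LINE_BITS
        self.slots[(set_index(addr), line)] = value
        self.committed_at[line] = set_index(addr)

    def tx_write(self, addr, value):
        """Transactional store: write the new version into the partner set."""
        line = addr >> LINE_BITS
        spec = partner(self.committed_at[line])
        self.slots[(spec, line)] = value
        self.pending[line] = spec

    def read(self, addr, in_tx=False):
        """The writing transaction sees its new version; everyone else sees the old."""
        line = addr >> LINE_BITS
        if in_tx and line in self.pending:
            return self.slots[(self.pending[line], line)]
        return self.slots[(self.committed_at[line], line)]

    def commit(self):
        """Make the speculative copies current; no data is copied between sets."""
        for line, spec in self.pending.items():
            del self.slots[(self.committed_at[line], line)]
            self.committed_at[line] = spec
        self.pending.clear()

    def abort(self):
        """Discard the speculative copies; the old versions were never touched."""
        for line, spec in self.pending.items():
            del self.slots[(spec, line)]
        self.pending.clear()
```

In this model neither commit nor abort moves data: each only changes which of the two partner sets is treated as holding the current version, which is the property the abstract attributes to the pseudo-associative mapping. Because `partner` is its own inverse, the old version can be located from the new one with the same one-bit inversion.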

Published In

ACM Transactions on Architecture and Code Optimization, Volume 9, Issue 4
Special Issue on High-Performance Embedded Architectures and Compilers
January 2013
876 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/2400682
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 January 2013
Accepted: 01 November 2012
Revised: 01 September 2012
Received: 01 June 2012
Published in TACO Volume 9, Issue 4

Author Tags

  1. chained directory
  2. chip multi-processor
  3. hardware transactional memory
  4. pseudo-associative cache

Qualifiers

  • Research-article
  • Research
  • Refereed
