research-article
Open access

A Filtering Mechanism to Reduce Network Bandwidth Utilization of Transaction Execution

Published: 04 January 2016
    Abstract

    Hardware Transactional Memory (HTM) relies heavily on the on-chip network for intertransaction communication. However, the network bandwidth utilization of transactions has been largely neglected in HTM designs. In this work, we propose a cost model to analyze network bandwidth in transaction execution. The cost model identifies a set of key factors that can be optimized through system design to reduce the communication cost of HTM. Based on the model and a network traffic characterization of a representative HTM design, we identify a large source of superfluous traffic: requests that fail because of transaction conflicts. Across a spectrum of workloads, 39% of the transactional requests fail due to conflicts, which renders 58% of the transactional network traffic futile. To combat this pathology, we propose a novel in-network filtering mechanism. The on-chip router is augmented to predict conflicts among transactions and proactively filter out requests that are likely to fail. Experimental results show that the proposed mechanism reduces total network traffic by 24% on average for a set of high-contention TM applications, thereby reducing energy consumption by an average of 24%. Meanwhile, contention in the coherence directory is reduced by 68% on average. These improvements are achieved with only 5% area added to a conventional on-chip router design.
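
    The abstract does not say how the router predicts conflicts, so the following is only a toy sketch of the general idea of filtering requests that are likely to fail. It assumes a hypothetical predictor that keeps one small saturating counter per cache-block address, counts requests that were forwarded and then failed, and drops a request locally once the counter reaches a threshold. The class name, the `route` method, the threshold, and the decay rule are all illustrative assumptions and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ConflictFilter:
    """Hypothetical per-router predictor (not the paper's design)."""

    threshold: int = 2  # assumed: filter once this many recent failures are seen
    max_count: int = 3  # assumed: counter saturates like a 2-bit counter
    counters: dict = field(default_factory=dict)  # block address -> failure count

    def _store(self, block_addr: int, count: int) -> None:
        # Keep the table sparse: blocks with a zero count are not stored.
        if count > 0:
            self.counters[block_addr] = count
        else:
            self.counters.pop(block_addr, None)

    def route(self, block_addr: int, conflict_at_destination: bool) -> str:
        """Decide whether a transactional request is forwarded or filtered.

        `conflict_at_destination` stands in for the outcome the request
        would see if it were forwarded (True means it would fail).
        """
        count = self.counters.get(block_addr, 0)
        if count >= self.threshold:
            # Predicted to fail: drop the request here and decay the counter
            # so that the block is eventually retried.
            self._store(block_addr, count - 1)
            return "filtered"
        if conflict_at_destination:
            # Forwarded but failed: strengthen the prediction.
            self._store(block_addr, min(count + 1, self.max_count))
        else:
            # Forwarded and succeeded: clear the prediction.
            self._store(block_addr, 0)
        return "forwarded"


if __name__ == "__main__":
    router_filter = ConflictFilter()
    # Five requests to the same contended block, all of which would fail.
    outcomes = [router_filter.route(0x40, True) for _ in range(5)]
    print(outcomes)
```

    In this sketch, every filtered request is one that never crosses the network or reaches the coherence directory, which is the kind of saving the abstract attributes to in-network filtering. The decay step is one simple way to avoid filtering a block forever after the conflict has cleared.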

      Published In

      ACM Transactions on Architecture and Code Optimization  Volume 12, Issue 4
      January 2016
      848 pages
      ISSN:1544-3566
      EISSN:1544-3973
      DOI:10.1145/2836331
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 04 January 2016
      Accepted: 01 October 2015
      Revised: 01 October 2015
      Received: 01 April 2015
      Published in TACO Volume 12, Issue 4

      Author Tags

      1. Transactional memory
      2. communication cost modeling
      3. energy efficiency
      4. network traffic
      5. on-chip network

      Qualifiers

      • Research-article
      • Research
      • Refereed