DOI: 10.1145/2451116.2451119
research-article

DeNovoND: efficient hardware support for disciplined non-determinism

Published: 16 March 2013

Abstract

Recent work has shown that disciplined shared-memory programming models that provide deterministic-by-default semantics can simplify both parallel software and hardware. Specifically, the DeNovo hardware system has shown that the software guarantees of such models (e.g., data-race-freedom and explicit side-effects) can enable simpler, higher-performance, and more energy-efficient hardware than the current state-of-the-art for deterministic programs. Many applications, however, contain non-deterministic parts, for example code that uses lock synchronization. For commercial hardware to exploit the benefits of DeNovo, it is therefore necessary to extend DeNovo to support non-deterministic applications.
This paper proposes DeNovoND, a system that supports lock-based, disciplined non-determinism, with the simplicity, performance, and energy benefits of DeNovo. We use a combination of distributed queue-based locks and access signatures to implement simple memory consistency semantics for safe non-determinism, with a coherence protocol that does not require transient states, invalidation traffic, or directories, and does not incur false sharing. The resulting system is simpler, shows comparable or better execution time, and has 33% less network traffic on average (translating directly into energy savings) relative to a state-of-the-art invalidation-based protocol for 8 applications designed for lock synchronization.
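To make the role of access signatures more concrete, the following is a minimal software sketch, not the paper's hardware design. It assumes that a signature is a Bloom-filter-style bit vector of addresses written inside critical sections, that the signature travels with the lock, and that an acquiring core self-invalidates any cached word the signature may contain. The names `AccessSignature`, `Lock`, `Core`, and `demo`, the filter size, the hash function, and the write-through memory are all hypothetical choices for illustration; lock queuing and mutual exclusion are not modeled.

```python
import hashlib


class AccessSignature:
    """Bloom-filter-style set of addresses (sizes are illustrative assumptions)."""

    def __init__(self, num_bits=256, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit vector stored as a Python int

    def _positions(self, addr):
        # Derive num_hashes bit positions from the address.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.num_bits

    def insert(self, addr):
        for p in self._positions(addr):
            self.bits |= 1 << p

    def may_contain(self, addr):
        # False positives are possible; false negatives are not.
        return all((self.bits >> p) & 1 for p in self._positions(addr))

    def union(self, other):
        self.bits |= other.bits


class Lock:
    """Carries the signature of writes made by earlier holders (assumption)."""

    def __init__(self):
        self.signature = AccessSignature()


class Core:
    def __init__(self, memory):
        self.memory = memory      # shared backing store (a dict)
        self.cache = {}           # private cache: addr -> value
        self.written = AccessSignature()

    def load(self, addr):
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def store(self, addr, value):
        self.cache[addr] = value
        self.memory[addr] = value  # write-through, for simplicity only
        self.written.insert(addr)

    def acquire(self, lock):
        # Self-invalidate cached words that earlier holders may have written;
        # no invalidation message is sent by the writer.
        for addr in [a for a in self.cache if lock.signature.may_contain(a)]:
            del self.cache[addr]

    def release(self, lock):
        lock.signature.union(self.written)
        self.written = AccessSignature()


def demo():
    memory = {0x10: 1}
    writer, reader, lock = Core(memory), Core(memory), Lock()
    reader.load(0x10)              # reader caches the old value
    writer.acquire(lock)
    writer.store(0x10, 2)
    writer.release(lock)
    stale = reader.cache[0x10]     # still the old value before acquiring
    reader.acquire(lock)           # signature hit drops the stale copy
    fresh = reader.load(0x10)      # refetched from memory
    return stale, fresh
```

In this sketch a false positive in the signature only causes an unnecessary refetch, never a stale read, which is why an imprecise but compact filter is an acceptable trade-off here.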



    Published In

    ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
    March 2013
    574 pages
    ISBN: 9781450318709
    DOI: 10.1145/2451116
    • ACM SIGARCH Computer Architecture News, Volume 41, Issue 1
      ASPLOS '13
      March 2013
      540 pages
      ISSN: 0163-5964
      DOI: 10.1145/2490301
    • ACM SIGPLAN Notices, Volume 48, Issue 4
      ASPLOS '13
      April 2013
      540 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2499368

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. cache coherence
    2. disciplined parallelism
    3. memory consistency
    4. non-determinism
    5. shared memory

    Qualifiers

    • Research-article

    Conference

    ASPLOS '13

    Acceptance Rates

    Overall Acceptance Rate 535 of 2,713 submissions, 20%
