research-article

DeNovoSync: Efficient Support for Arbitrary Synchronization without Writer-Initiated Invalidations

Published: 14 March 2015

Abstract

Current shared-memory hardware is complex and inefficient. Prior work on the DeNovo coherence protocol showed that disciplined shared-memory programming models can enable more complexity-, performance-, and energy-efficient hardware than the state-of-the-art MESI protocol. DeNovo, however, severely restricted the synchronization constructs an application can support. This paper proposes DeNovoSync, a technique to support arbitrary synchronization in DeNovo. The key challenge is that DeNovo exploits race-freedom to use reader-initiated local self-invalidations (instead of conventional writer-initiated remote cache invalidations) to ensure coherence. Synchronization accesses are inherently racy and not directly amenable to self-invalidations. DeNovoSync addresses this challenge using a novel combination of registration of all synchronization reads with a judicious hardware backoff to limit unnecessary registrations. For a wide variety of synchronization constructs and applications, compared to MESI, DeNovoSync shows comparable or up to 22% lower execution time and up to 58% lower network traffic, enabling DeNovo's advantages for a much broader class of software than previously possible.
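The interplay between read registration and backoff described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the paper's protocol: the `Registry` class, the `spin` function, the `FLAG` address, the cycle-based round-robin scheduling, and the rule "back off for a fixed number of cycles after losing registration" are all hypothetical simplifications introduced here. The model keeps a single registered core per synchronization word, so a synchronization read by another core takes the registration away instead of joining a sharer list that a writer would later have to invalidate.

```python
FLAG = 0x40  # hypothetical address of the synchronization variable


class Registry:
    """Toy registry: at most one registered core per synchronization word."""

    def __init__(self):
        self.owner = {}      # address -> id of the currently registered core
        self.transfers = 0   # registration hand-offs, a rough proxy for traffic

    def sync_read(self, core, addr):
        """Return True on a hit (core already registered), else register it."""
        if self.owner.get(addr) == core:
            return True
        self.owner[addr] = core
        self.transfers += 1
        return False


def spin(num_cores, cycles, backoff):
    """Let every core spin-read FLAG; return the number of registration transfers."""
    reg = Registry()
    next_try = [0] * num_cores  # earliest cycle at which each core may read again
    for t in range(cycles):
        for core in range(num_cores):
            if t < next_try[core]:
                continue  # core is still backing off
            if not reg.sync_read(core, FLAG):
                # Assumed policy: a miss delays this core's next read.
                next_try[core] = t + 1 + backoff
    return reg.transfers
```

Comparing `spin(4, 100, 0)` with `spin(4, 100, 7)` shows the intended effect in this model: with no backoff, every read by a spinning core steals the registration from the previous reader, while a nonzero backoff spaces the reads out and reduces the number of transfers. A single spinning core (`spin(1, 100, 0)`) registers once and then hits locally.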

      Published In

      ACM SIGPLAN Notices, Volume 50, Issue 4
      ASPLOS '15
      April 2015
      676 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2775054
      Editor: Andy Gill

      ASPLOS '15: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems
      March 2015
      720 pages
      ISBN: 9781450328357
      DOI: 10.1145/2694344
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. cache coherence
      2. consistency
      3. shared memory
      4. synchronization

      Qualifiers

      • Research-article
