Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article

Internally deterministic parallel algorithms can be fast

Published: 25 February 2012 Publication History
  • Get Citation Alerts
  • Abstract

    The virtues of deterministic parallelism have been argued for decades and many forms of deterministic parallelism have been described and analyzed. Here we are concerned with one of the strongest forms, requiring that for any input there is a unique dependence graph representing a trace of the computation annotated with every operation and value. This has been referred to as internal determinism, and implies a sequential semantics---i.e., considering any sequential traversal of the dependence graph is sufficient for analyzing the correctness of the code. In addition to returning deterministic results, internal determinism has many advantages including ease of reasoning about the code, ease of verifying correctness, ease of debugging, ease of defining invariants, ease of defining good coverage for testing, and ease of formally, informally and experimentally reasoning about performance. On the other hand one needs to consider the possible downsides of determinism, which might include making algorithms (i) more complicated, unnatural or special purpose and/or (ii) slower or less scalable.
    In this paper we study the effectiveness of this strong form of determinism through a broad set of benchmark problems. Our main contribution is to demonstrate that for this wide body of problems, there exist efficient internally deterministic algorithms, and moreover that these algorithms are natural to reason about and not complicated to code. We leverage an approach to determinism suggested by Steele (1990), which is to use nested parallelism with commutative operations. Our algorithms apply several diverse programming paradigms that fit within the model including (i) a strict functional style (no shared state among concurrent operations), (ii) an approach we refer to as deterministic reservations, and (iii) the use of commutative, linearizable operations on data structures. We describe algorithms for the benchmark problems that use these deterministic approaches and present performance results on a 32-core machine. Perhaps surprisingly, for all problems, our internally deterministic algorithms achieve good speedup and good performance even relative to prior nondeterministic solutions.

    References

    [1]
    U. Acar, G. E. Blelloch, and R. Blumofe. The data locality of work stealing. Theory of Computing Systems, 35(3), 2002. Springer.
    [2]
    S. V. Adve and M. D. Hill. Weak ordering--a new definition. In ACM ISCA, 1990.
    [3]
    T. Bergan, O. Anderson, J. Devietti, L. Ceze, and D. Grossman. Core-Det: A compiler and runtime system for deterministic multithreaded execution. In ACM ASPLOS, 2010.
    [4]
    T. Bergan, N. Hunt, L. Ceze, and S. D. Gribble. Deterministic process groups in dOS. In Usenix OSDI, 2010.
    [5]
    E. D. Berger, T. Yang, T. Liu, and G. Novark. Grace: Safe multithreaded programming for C/C++. In ACM OOPSLA, 2009.
    [6]
    G. E. Blelloch. Programming parallel algorithms. CACM, 39(3), 1996.
    [7]
    G. E. Blelloch and D. Golovin. Strongly history-independent hashing with applications. In IEEE FOCS, 2007.
    [8]
    G. E. Blelloch and J. Greiner. A provable time and space efficient implementation of NESL. In ACM ICFP, 1996.
    [9]
    G. E. Blelloch, P. B. Gibbons, and H. V. Simhadri. Low-depth cache oblivious algorithms. In ACM SPAA, 2010.
    [10]
    R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: An efficient multithreaded runtime system. J. Parallel and Distributed Computing, 37(1), 1996. Elsevier.
    [11]
    R. L. Bocchino, V. S. Adve, S. V. Adve, and M. Snir. Parallel programming must be deterministic by default. In Usenix HotPar, 2009.
    [12]
    R. L. Bocchino, S. Heumann, N. Honarmand, S. V. Adve, V. S. Adve, A. Welc, and T. Shpeisman. Safe nondeterminism in a deterministic-by-default parallel language. In ACM POPL, 2011.
    [13]
    P. B. Callahan and S. R. Kosaraju. A decomposition of multidimensional point sets with applications to k-nearest-neighbors and n-body potential fields. J. ACM, 42(1), 1995.
    [14]
    D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-MAT: A recursive model for graph mining. In SIAM SDM, 2004.
    [15]
    G.-I. Cheng, M. Feng, C. E. Leiserson, K. H. Randall, and A. F. Stark. Detecting data races in Cilk programs that use locks. In ACM SPAA, 1998.
    [16]
    B. Choi, R. Komuravelli, V. Lu, H. Sung, R. L. Bocchino, S. V. Adve, and J. C. Hart. Parallel SAH k-D tree construction. In ACM High Performance Graphics, 2010.
    [17]
    T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001.
    [18]
    M. de Berg, O. Cheong, M. van Kreveld, and M. Overmars. Computational Geometry: Algorithms and Applications. Springer-Verlag, 2008.
    [19]
    J. Devietti, B. Lucia, L. Ceze, and M. Oskin. DMP: Deterministic shared memory multiprocessing. In ACM ASPLOS, 2009.
    [20]
    J. Devietti, J. Nelson, T. Bergan, L. Ceze, and D. Grossman. RCDC: A relaxed consistency deterministic computer. In ACM ASPLOS, 2011.
    [21]
    E. W. Dijkstra. Cooperating sequential processes. Technical Report EWD 123, Dept. of Mathematics, Technological U., Eindhoven, 1965.
    [22]
    K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy. Memory consistency and event ordering in scalable shared-memory multiprocessors. In ACM ISCA, 1990.
    [23]
    P. B. Gibbons. A more practical PRAM model. In ACM SPAA, 1989.
    [24]
    R. H. Halstead. Multilisp: A language for concurrent symbolic computation. ACM TOPLAS, 7(4), 1985.
    [25]
    M. A. Hassaan, M. Burtscher, and K. Pingali. Ordered vs. unordered: A comparison of parallelism and work-efficiency in irregular algorithms. In ACM PPoPP, 2011.
    [26]
    M. Herlihy and E. Koskinen. Transactional boosting: A methodology for highly-concurrent transactional objects. In ACM PPoPP, 2008.
    [27]
    M. P. Herlihy and J. M.Wing. Linearizability: A correctness condition for concurrent objects. ACM TOPLAS, 12(3), 1990.
    [28]
    D. Hower, P. Dudnik, M. Hill, and D. Wood. Calvin: Deterministic or not? Free will to choose. In IEEE HPCA, 2011.
    [29]
    J. Karkkainen and P. Sanders. Simple linear work suffix array construction. In EATCS ICALP, 2003.
    [30]
    M. Kulkarni, D. Nguyen, D. Prountzos, X. Sui, and K. Pingali. Exploiting the commutativity lattice. In ACM PLDI, 2011.
    [31]
    C. E. Leiserson. The Cilk++ concurrency platform. J. Supercomputing, 51(3), 2010. Springer.
    [32]
    C. E. Leiserson and T. B. Schardl. A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers). In ACM SPAA, 2010.
    [33]
    C. E. Leiserson, T. B. Schardl, and J. Sukha. Deterministic parallel random-number generation for dynamic-multithreading platforms. In ACM PPoPP, 2012.
    [34]
    J. D. MacDonald and K. S. Booth. Heuristics for ray tracing using space subdivision. The Visual Computer, 6(3), 1990. Springer.
    [35]
    R. H. B. Netzer and B. P. Miller. What are race conditions? ACM LOPLAS, 1(1), 1992.
    [36]
    M. Olszewski, J. Ansel, and S. Amarasinghe. Kendo: Efficient deterministic multithreading in software. In ACM ASPLOS, 2009.
    [37]
    S. S. Patil. Closure properties of interconnections of determinate systems. In J. B. Dennis, editor, Record of the Project MAC conference on concurrent systems and parallel computation. ACM, 1970.
    [38]
    K. Pingali, D. Nguyen, M. Kulkarni, M. Burtscher, M. A. Hassaan, R. Kaleem, T.-H. Lee, A. Lenharth, R. Manevich, M. Mendez-Lojo, D. Prountzos, and X. Sui. The tao of parallelism in algorithms. In ACM PLDI, 2011.
    [39]
    P. Prabhu, S. Ghosh, Y. Zhang, N. P. Johnson, and D. I. August. Commutative set: A language extension for implicit parallel programming. In ACM PLDI, 2011.
    [40]
    M. C. Rinard and P. C. Diniz. Commutativity analysis: A new analysis technique for parallelizing compilers. ACM TOPLAS, 19(6), 1997.
    [41]
    J. Singler, P. Sanders, and F. Putze. MCSTL: The multi-core standard template library. In Euro-Par, 2007.
    [42]
    G. L. Steele Jr. Making asynchronous parallelism safe for the world. In ACM POPL, 1990.
    [43]
    J. G. Steffan, C. B. Colohan, A. Zhai, and T. C. Mowry. A scalable approach to thread-level speculation. In ACM ISCA, 2000.
    [44]
    W. E. Weihl. Commutativity-based concurrency control for abstract data types. IEEE Trans. Computers, 37(12), 1988.
    [45]
    J. Yu and S. Narayanasamy. A case for an interleaving constrained shared-memory multi-processor. In ACM ISCA, 2009.

    Cited By

    View all
    • (2024)When Is Parallelism Fearless and Zero-Cost with Rust?Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659966(27-40)Online publication date: 17-Jun-2024
    • (2023)Provably-Efficient and Internally-Deterministic Parallel Union-FindProceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3558481.3591082(261-271)Online publication date: 17-Jun-2023
    • (2022)Entanglement detection with near-zero costProceedings of the ACM on Programming Languages10.1145/35476466:ICFP(679-710)Online publication date: 31-Aug-2022
    • Show More Cited By

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 47, Issue 8
    PPOPP '12
    August 2012
    334 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2370036
    Issue’s Table of Contents
    • cover image ACM Conferences
      PPoPP '12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
      February 2012
      352 pages
      ISBN:9781450311601
      DOI:10.1145/2145816
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 February 2012
    Published in SIGPLAN Volume 47, Issue 8

    Check for updates

    Author Tags

    1. commutative operations
    2. deterministic parallelism
    3. geometry algorithms
    4. graph algorithms
    5. parallel algorithms
    6. parallel programming
    7. sorting
    8. string processing

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)57
    • Downloads (Last 6 weeks)4
    Reflects downloads up to 10 Aug 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)When Is Parallelism Fearless and Zero-Cost with Rust?Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659966(27-40)Online publication date: 17-Jun-2024
    • (2023)Provably-Efficient and Internally-Deterministic Parallel Union-FindProceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3558481.3591082(261-271)Online publication date: 17-Jun-2023
    • (2022)Entanglement detection with near-zero costProceedings of the ACM on Programming Languages10.1145/35476466:ICFP(679-710)Online publication date: 31-Aug-2022
    • (2021)Atomic power in forksProceedings of the Thirty-Second Annual ACM-SIAM Symposium on Discrete Algorithms10.5555/3458064.3458192(2141-2153)Online publication date: 10-Jan-2021
    • (2021)ConnectItProceedings of the VLDB Endowment10.14778/3436905.343692314:4(653-667)Online publication date: 22-Feb-2021
    • (2021)Provably space-efficient parallel functional programmingProceedings of the ACM on Programming Languages10.1145/34342995:POPL(1-33)Online publication date: 4-Jan-2021
    • (2019)Disentanglement in nested-parallel programsProceedings of the ACM on Programming Languages10.1145/33711154:POPL(1-32)Online publication date: 20-Dec-2019
    • (2019)TapirACM Transactions on Parallel Computing10.1145/33656556:4(1-33)Online publication date: 17-Dec-2019
    • (2019)Symbolic computation with monotone operatorsACM Communications in Computer Algebra10.1145/3338637.333864652:4(139-141)Online publication date: 30-May-2019
    • (2019)Tight Analysis of Parallel Randomized Greedy MISACM Transactions on Algorithms10.1145/332616516:1(1-13)Online publication date: 5-Dec-2019
    • Show More Cited By

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media