Internally deterministic parallel algorithms can be fast

Authors: Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Julian Shun

ACM SIGPLAN Notices, Volume 47, Issue 8

Pages 181 - 192

https://doi.org/10.1145/2370036.2145840

Published: 25 February 2012

Abstract

The virtues of deterministic parallelism have been argued for decades, and many forms of deterministic parallelism have been described and analyzed. Here we are concerned with one of the strongest forms, requiring that for any input there is a unique dependence graph representing a trace of the computation annotated with every operation and value. This has been referred to as internal determinism, and implies a sequential semantics; i.e., considering any sequential traversal of the dependence graph is sufficient for analyzing the correctness of the code. In addition to returning deterministic results, internal determinism has many advantages, including ease of reasoning about the code, ease of verifying correctness, ease of debugging, ease of defining invariants, ease of defining good coverage for testing, and ease of formally, informally, and experimentally reasoning about performance. On the other hand, one needs to consider the possible downsides of determinism, which might include making algorithms (i) more complicated, unnatural, or special purpose and/or (ii) slower or less scalable.

In this paper we study the effectiveness of this strong form of determinism through a broad set of benchmark problems. Our main contribution is to demonstrate that for this wide body of problems, there exist efficient internally deterministic algorithms, and moreover that these algorithms are natural to reason about and not complicated to code. We leverage an approach to determinism suggested by Steele (1990), which is to use nested parallelism with commutative operations. Our algorithms apply several diverse programming paradigms that fit within the model including (i) a strict functional style (no shared state among concurrent operations), (ii) an approach we refer to as deterministic reservations, and (iii) the use of commutative, linearizable operations on data structures. We describe algorithms for the benchmark problems that use these deterministic approaches and present performance results on a 32-core machine. Perhaps surprisingly, for all problems, our internally deterministic algorithms achieve good speedup and good performance even relative to prior nondeterministic solutions.
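The abstract names deterministic reservations without spelling out the mechanism. As a rough illustration only, the sketch below assumes a reserve-then-commit reading: each iterate reserves the shared items it needs with a priority write that keeps the minimum index (a commutative operation), and commits only if it holds every item it reserved. The idea is applied to greedy maximal matching. The function names, the `prefix_size` and `seed` parameters, the round structure, and the choice of matching as the example problem are assumptions made for illustration, not the authors' code. The rounds are simulated sequentially, and the order inside each phase is shuffled to mimic arbitrary scheduling.

```python
import random


def sequential_greedy_matching(edges):
    """Reference result: scan edges in index order, keep an edge if both ends are free."""
    matched = set()
    result = []
    for i, (u, v) in enumerate(edges):
        if u != v and u not in matched and v not in matched:
            matched.update((u, v))
            result.append(i)
    return result


def reservation_matching(edges, prefix_size=4, seed=0):
    """Round-based reserve/commit sketch (hypothetical helper, simulated sequentially).

    Each round works on the first `prefix_size` remaining edges. The order in
    which edges are handled inside a phase is shuffled on purpose: the outcome
    should not depend on it.
    """
    rng = random.Random(seed)
    matched = set()
    result = []
    remaining = list(range(len(edges)))  # edge indices, kept in priority order

    while remaining:
        active, rest = remaining[:prefix_size], remaining[prefix_size:]

        # Reserve phase: a priority write (min) on each endpoint.
        # min is commutative, so the visiting order does not matter.
        rng.shuffle(active)
        reserved = {}
        live = []
        for i in active:
            u, v = edges[i]
            if u == v or u in matched or v in matched:
                continue  # this edge can never be added; drop it
            reserved[u] = min(reserved.get(u, i), i)
            reserved[v] = min(reserved.get(v, i), i)
            live.append(i)

        # Commit phase: an edge succeeds only if it holds both reservations.
        # Commits read `reserved` only, so their order does not matter either.
        rng.shuffle(live)
        retry = []
        for i in live:
            u, v = edges[i]
            if reserved[u] == i and reserved[v] == i:
                matched.update((u, v))
                result.append(i)
            else:
                retry.append(i)  # lost a reservation; try again next round

        # Losers go back to the front, still in priority order.
        remaining = sorted(retry) + rest

    return sorted(result)
```

Under these assumptions, `reservation_matching` returns the same edge indices as `sequential_greedy_matching` for any `seed` and any positive `prefix_size`, because the lowest-indexed live edge always wins both of its reservations and no edge can commit ahead of a conflicting edge with a smaller index. A real parallel version would replace the two loops with parallel loops and the `min` update with an atomic priority write.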

References

[1] U. Acar, G. E. Blelloch, and R. Blumofe. The data locality of work stealing. Theory of Computing Systems, 35(3), 2002. Springer.

[2] S. V. Adve and M. D. Hill. Weak ordering--a new definition. In ACM ISCA, 1990.

[3] T. Bergan, O. Anderson, J. Devietti, L. Ceze, and D. Grossman. CoreDet: A compiler and runtime system for deterministic multithreaded execution. In ACM ASPLOS, 2010.

[4] T. Bergan, N. Hunt, L. Ceze, and S. D. Gribble. Deterministic process groups in dOS. In Usenix OSDI, 2010.

[5] E. D. Berger, T. Yang, T. Liu, and G. Novark. Grace: Safe multithreaded programming for C/C++. In ACM OOPSLA, 2009.

[6] G. E. Blelloch. Programming parallel algorithms. CACM, 39(3), 1996.

[7] G. E. Blelloch and D. Golovin. Strongly history-independent hashing with applications. In IEEE FOCS, 2007.

[8] G. E. Blelloch and J. Greiner. A provable time and space efficient implementation of NESL. In ACM ICFP, 1996.

[9] G. E. Blelloch, P. B. Gibbons, and H. V. Simhadri. Low-depth cache oblivious algorithms. In ACM SPAA, 2010.

[10] R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: An efficient multithreaded runtime system. J. Parallel and Distributed Computing, 37(1), 1996. Elsevier.

[11] R. L. Bocchino, V. S. Adve, S. V. Adve, and M. Snir. Parallel programming must be deterministic by default. In Usenix HotPar, 2009.

[12] R. L. Bocchino, S. Heumann, N. Honarmand, S. V. Adve, V. S. Adve, A. Welc, and T. Shpeisman. Safe nondeterminism in a deterministic-by-default parallel language. In ACM POPL, 2011.

[13] P. B. Callahan and S. R. Kosaraju. A decomposition of multidimensional point sets with applications to k-nearest-neighbors and n-body potential fields. J. ACM, 42(1), 1995.

[14] D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-MAT: A recursive model for graph mining. In SIAM SDM, 2004.

[15] G.-I. Cheng, M. Feng, C. E. Leiserson, K. H. Randall, and A. F. Stark. Detecting data races in Cilk programs that use locks. In ACM SPAA, 1998.

[16] B. Choi, R. Komuravelli, V. Lu, H. Sung, R. L. Bocchino, S. V. Adve, and J. C. Hart. Parallel SAH k-D tree construction. In ACM High Performance Graphics, 2010.

[17] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001.

[18] M. de Berg, O. Cheong, M. van Kreveld, and M. Overmars. Computational Geometry: Algorithms and Applications. Springer-Verlag, 2008.

[19] J. Devietti, B. Lucia, L. Ceze, and M. Oskin. DMP: Deterministic shared memory multiprocessing. In ACM ASPLOS, 2009.

[20] J. Devietti, J. Nelson, T. Bergan, L. Ceze, and D. Grossman. RCDC: A relaxed consistency deterministic computer. In ACM ASPLOS, 2011.

[21] E. W. Dijkstra. Cooperating sequential processes. Technical Report EWD 123, Dept. of Mathematics, Technological U., Eindhoven, 1965.

[22] K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy. Memory consistency and event ordering in scalable shared-memory multiprocessors. In ACM ISCA, 1990.

[23] P. B. Gibbons. A more practical PRAM model. In ACM SPAA, 1989.

[24] R. H. Halstead. Multilisp: A language for concurrent symbolic computation. ACM TOPLAS, 7(4), 1985.

[25] M. A. Hassaan, M. Burtscher, and K. Pingali. Ordered vs. unordered: A comparison of parallelism and work-efficiency in irregular algorithms. In ACM PPoPP, 2011.

[26] M. Herlihy and E. Koskinen. Transactional boosting: A methodology for highly-concurrent transactional objects. In ACM PPoPP, 2008.

[27] M. P. Herlihy and J. M. Wing. Linearizability: A correctness condition for concurrent objects. ACM TOPLAS, 12(3), 1990.

[28] D. Hower, P. Dudnik, M. Hill, and D. Wood. Calvin: Deterministic or not? Free will to choose. In IEEE HPCA, 2011.

[29] J. Karkkainen and P. Sanders. Simple linear work suffix array construction. In EATCS ICALP, 2003.

[30] M. Kulkarni, D. Nguyen, D. Prountzos, X. Sui, and K. Pingali. Exploiting the commutativity lattice. In ACM PLDI, 2011.

[31] C. E. Leiserson. The Cilk++ concurrency platform. J. Supercomputing, 51(3), 2010. Springer.

[32] C. E. Leiserson and T. B. Schardl. A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers). In ACM SPAA, 2010.

[33] C. E. Leiserson, T. B. Schardl, and J. Sukha. Deterministic parallel random-number generation for dynamic-multithreading platforms. In ACM PPoPP, 2012.

[34] J. D. MacDonald and K. S. Booth. Heuristics for ray tracing using space subdivision. The Visual Computer, 6(3), 1990. Springer.

[35] R. H. B. Netzer and B. P. Miller. What are race conditions? ACM LOPLAS, 1(1), 1992.

[36] M. Olszewski, J. Ansel, and S. Amarasinghe. Kendo: Efficient deterministic multithreading in software. In ACM ASPLOS, 2009.

[37] S. S. Patil. Closure properties of interconnections of determinate systems. In J. B. Dennis, editor, Record of the Project MAC conference on concurrent systems and parallel computation. ACM, 1970.

[38] K. Pingali, D. Nguyen, M. Kulkarni, M. Burtscher, M. A. Hassaan, R. Kaleem, T.-H. Lee, A. Lenharth, R. Manevich, M. Mendez-Lojo, D. Prountzos, and X. Sui. The tao of parallelism in algorithms. In ACM PLDI, 2011.

[39] P. Prabhu, S. Ghosh, Y. Zhang, N. P. Johnson, and D. I. August. Commutative set: A language extension for implicit parallel programming. In ACM PLDI, 2011.

[40] M. C. Rinard and P. C. Diniz. Commutativity analysis: A new analysis technique for parallelizing compilers. ACM TOPLAS, 19(6), 1997.

[41] J. Singler, P. Sanders, and F. Putze. MCSTL: The multi-core standard template library. In Euro-Par, 2007.

[42] G. L. Steele Jr. Making asynchronous parallelism safe for the world. In ACM POPL, 1990.

[43] J. G. Steffan, C. B. Colohan, A. Zhai, and T. C. Mowry. A scalable approach to thread-level speculation. In ACM ISCA, 2000.

[44] W. E. Weihl. Commutativity-based concurrency control for abstract data types. IEEE Trans. Computers, 37(12), 1988.

[45] J. Yu and S. Narayanasamy. A case for an interleaving constrained shared-memory multi-processor. In ACM ISCA, 2009.

Cited By

Abdi J, Posluns G, Zhang G, Wang B, Jeffrey M, Agrawal K, Petrank E (2024). When Is Parallelism Fearless and Zero-Cost with Rust? Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, pages 27-40. DOI: 10.1145/3626183.3659966. Online publication date: 17-Jun-2024.
https://dl.acm.org/doi/10.1145/3626183.3659966
Fedorov A, Hashemi D, Nadiradze G, Alistarh D, Agrawal K, Shun J (2023). Provably-Efficient and Internally-Deterministic Parallel Union-Find. Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, pages 261-271. DOI: 10.1145/3558481.3591082. Online publication date: 17-Jun-2023.
https://dl.acm.org/doi/10.1145/3558481.3591082
Westrick S, Arora J, Acar U (2022). Entanglement detection with near-zero cost. Proceedings of the ACM on Programming Languages 6:ICFP, pages 679-710. DOI: 10.1145/3547646. Online publication date: 31-Aug-2022.
https://dl.acm.org/doi/10.1145/3547646

Index Terms

Internally deterministic parallel algorithms can be fast
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Parallel programming languages

Published In

ACM SIGPLAN Notices Volume 47, Issue 8

PPOPP '12

August 2012

334 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/2370036

PPoPP '12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
February 2012
352 pages
ISBN:9781450311601
DOI:10.1145/2145816
General Chair: J. Ramanujam (Louisiana State University, USA)
Program Chair: P. Sadayappan (The Ohio State University, USA)

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 February 2012

Published in SIGPLAN Volume 47, Issue 8

Qualifiers

Research-article

Bibliometrics

Total Citations: 134
Total Downloads: 800
Downloads (Last 12 months): 57
Downloads (Last 6 weeks): 4

Reflects downloads up to 10 Aug 2024

Citations

Goodrich M, Jacob R, Sitchinava N, Marx D (2021). Atomic power in forks. Proceedings of the Thirty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2141-2153. DOI: 10.5555/3458064.3458192. Online publication date: 10-Jan-2021.
https://dl.acm.org/doi/10.5555/3458064.3458192
Dhulipala L, Hong C, Shun J (2021). ConnectIt. Proceedings of the VLDB Endowment 14:4, pages 653-667. DOI: 10.14778/3436905.3436923. Online publication date: 22-Feb-2021.
https://dl.acm.org/doi/10.14778/3436905.3436923
Arora J, Westrick S, Acar U (2021). Provably space-efficient parallel functional programming. Proceedings of the ACM on Programming Languages 5:POPL, pages 1-33. DOI: 10.1145/3434299. Online publication date: 4-Jan-2021.
https://dl.acm.org/doi/10.1145/3434299
Westrick S, Yadav R, Fluet M, Acar U (2019). Disentanglement in nested-parallel programs. Proceedings of the ACM on Programming Languages 4:POPL, pages 1-32. DOI: 10.1145/3371115. Online publication date: 20-Dec-2019.
https://dl.acm.org/doi/10.1145/3371115
Schardl T, Moses W, Leiserson C (2019). Tapir. ACM Transactions on Parallel Computing 6:4, pages 1-33. DOI: 10.1145/3365655. Online publication date: 17-Dec-2019.
https://dl.acm.org/doi/10.1145/3365655
Lauster F, Luke D, Tam M (2019). Symbolic computation with monotone operators. ACM Communications in Computer Algebra 52:4, pages 139-141. DOI: 10.1145/3338637.3338646. Online publication date: 30-May-2019.
https://dl.acm.org/doi/10.1145/3338637.3338646
Fischer M, Noever A (2019). Tight Analysis of Parallel Randomized Greedy MIS. ACM Transactions on Algorithms 16:1, pages 1-13. DOI: 10.1145/3326165. Online publication date: 5-Dec-2019.
https://dl.acm.org/doi/10.1145/3326165
