Research article · Open access
DOI: 10.1145/3490148.3538574

Many Sequential Iterative Algorithms Can Be Parallel and (Nearly) Work-efficient

Published: 11 July 2022

Abstract

Some recent papers have shown that many sequential iterative algorithms can be directly parallelized by identifying the dependences between the input objects. This approach yields many simple and practical parallel algorithms, but achieving work-efficiency and high parallelism remains challenging. Work-efficiency means that the number of operations is asymptotically the same as in the best sequential solution. This can be hard for problems where the number of dependences between objects is asymptotically larger than the optimal sequential work, so we cannot even afford the cost of generating them. To achieve high parallelism, we want to process as many objects as possible in parallel; the goal is O(D) span for a problem with deepest dependence length D, a property we refer to as round-efficiency. This paper presents work-efficient and round-efficient algorithms for a variety of classic problems and proposes general approaches to achieve them.
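
The following minimal, self-contained sketch (illustrative only, not taken from the paper) makes the notions of rank and dependence depth D concrete for LIS: the rank of an element is the length of the longest increasing subsequence ending at it, elements of equal rank are mutually independent, and the deepest dependence chain D equals the LIS length. The quadratic rank computation below is for illustration only; the paper's algorithms avoid touching all dependences.

    # Illustrative sketch only: compute LIS ranks and group elements by rank.
    # Elements in the same group have no dependences among themselves, so a
    # round-efficient algorithm could settle each group in one parallel round,
    # giving O(D) rounds where D is the LIS length.

    def lis_ranks(a):
        n = len(a)
        rank = [1] * n
        for j in range(n):             # O(n^2) reference computation, not work-efficient
            for i in range(j):
                if a[i] < a[j]:        # j depends on every earlier, smaller element i
                    rank[j] = max(rank[j], rank[i] + 1)
        return rank

    a = [3, 1, 4, 1, 5, 9, 2, 6]
    rank = lis_ranks(a)
    D = max(rank)                      # dependence depth = LIS length (4 here)
    rounds = [[j for j in range(len(a)) if rank[j] == r] for r in range(1, D + 1)]
    print("D =", D)
    print("rounds:", rounds)           # [[0, 1, 3], [2, 6], [4], [5, 7]]
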
To efficiently parallelize many sequential iterative algorithms, we propose the phase-parallel framework. The framework assigns a rank to each object and processes the objects in order of their ranks; all objects with the same rank can be processed in parallel. To enable work-efficiency and high parallelism, we use two types of general techniques. Type 1 algorithms use range queries to extract all objects with the same rank, which avoids evaluating all the dependences. We discuss activity selection and Dijkstra's algorithm using the Type 1 framework. Type 2 algorithms wake up an object when the last object it depends on is finished. We discuss activity selection, longest increasing subsequence (LIS), greedy maximal independent set (MIS), and many other algorithms using the Type 2 framework.
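
As a concrete illustration of the Type 2 wake-up idea, the sketch below runs a sequential simulation of greedy MIS under a fixed priority order (again illustrative only, not the paper's parallel implementation). Each vertex keeps a counter of unfinished earlier-priority neighbors; when the counter reaches zero the vertex wakes up, decides whether it joins the MIS, and notifies its later-priority neighbors. All vertices woken in the same round are independent and could be processed in parallel. For simplicity, a vertex here waits for all earlier-priority neighbors, a coarser dependence structure than the one the paper analyzes.

    from collections import defaultdict

    def greedy_mis_rounds(n, edges, priority):
        # Build adjacency lists.
        adj = defaultdict(list)
        for u, v in edges:
            adj[u].append(v)
            adj[v].append(u)

        # deps[v] = number of earlier-priority neighbors not yet finished.
        deps = {v: sum(1 for u in adj[v] if priority[u] < priority[v]) for v in range(n)}
        in_mis = {}
        frontier = [v for v in range(n) if deps[v] == 0]   # rank-1 objects
        rounds = []
        while frontier:
            rounds.append(frontier)
            next_frontier = []
            for v in frontier:                             # conceptually a parallel loop
                # v joins the MIS iff no earlier-priority neighbor already joined.
                in_mis[v] = not any(in_mis.get(u, False)
                                    for u in adj[v] if priority[u] < priority[v])
                for u in adj[v]:                           # wake up later-priority neighbors
                    if priority[u] > priority[v]:
                        deps[u] -= 1
                        if deps[u] == 0:
                            next_frontier.append(u)
            frontier = next_frontier
        return rounds, sorted(v for v in range(n) if in_mis[v])

    # Path graph 0-1-2-3 with identity priorities:
    # rounds = [[0], [1], [2], [3]], MIS = [0, 2]
    print(greedy_mis_rounds(4, [(0, 1), (1, 2), (2, 3)], {v: v for v in range(4)}))
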
All of our algorithms are (nearly) work-efficient and round-efficient, and some of them (e.g., LIS) are the first to achieve both. Many of them improve the previous best bounds. Moreover, we implement many of them for experimental studies. On inputs with reasonable dependence depth, our algorithms achieve high parallelism and significantly outperform their sequential counterparts.

Cited By

  • (2024) Analysis and Construction of Hardware Accelerators for Calculating the Shortest Path in Real-Time Robot Route Planning. Electronics, 13(11):2167. https://doi.org/10.3390/electronics13112167. Online publication date: 2 June 2024.

Published In

SPAA '22: Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures
July 2022
464 pages
ISBN:9781450391467
DOI:10.1145/3490148
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. activity selection
  2. independence system
  3. longest increasing subsequence
  4. maximal independent set
  5. parallel algorithms
  6. parallel programming
  7. phase-parallel framework
  8. sequential iterative algorithms

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation

Conference

SPAA '22

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%
