A work-efficient algorithm for parallel unordered depth-first search

Authors: Arthur Charguéraud, Mike Rainey

SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

Article No.: 67, Pages 1 - 12

https://doi.org/10.1145/2807591.2807651

Published: 15 November 2015

Abstract

Advances in processing power and memory technology have made multicore computers an important platform for high-performance graph-search (or graph-traversal) algorithms. Since the introduction of multicore processors, much progress has been made in improving parallel breadth-first search. However, less attention has been given to algorithms for unordered or loosely ordered traversals.

We present a parallel algorithm for unordered depth-first search on graphs. We prove that the algorithm is work-efficient in a realistic algorithmic model that accounts for important scheduling costs. This work-efficiency result applies to all graphs, including those with high diameter and those with high out-degree vertices. The algorithmic techniques behind this result include a new data structure for representing the frontier of vertices in depth-first search, a new amortization technique for controlling excess parallelism, and an adaptation of the lazy-splitting technique to depth-first search.

We validate the theoretical results with an implementation and experiments. The experiments show that the algorithm performs well on a range of graphs and that it can lead to significant improvements over comparable algorithms.
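To make the idea of an unordered depth-first traversal with a splittable frontier more concrete, the following is a minimal sketch and not the authors' algorithm. It simulates several workers in a single thread, round-robin. Each worker keeps its frontier as a plain Python list used as a stack, and an idle worker takes half of the largest frontier. The names `split_frontier`, `unordered_dfs`, and `num_workers` are hypothetical, and the plain list stands in for the paper's frontier data structure, which the abstract does not describe. The sketch also ignores the amortization and lazy-splitting techniques mentioned above.

```python
from typing import Dict, List, Set

Graph = Dict[int, List[int]]  # adjacency lists; an assumed representation


def split_frontier(frontier: List[int]) -> List[int]:
    """Remove and return the older half (bottom of the stack) of a frontier."""
    half = len(frontier) // 2
    stolen = frontier[:half]
    del frontier[:half]
    return stolen


def unordered_dfs(graph: Graph, source: int, num_workers: int = 4) -> Set[int]:
    """Visit every vertex reachable from source, in no particular order.

    Workers are simulated sequentially. A real parallel version would need
    an atomic test-and-set on the visited flags and a concurrent scheduler.
    """
    visited: Set[int] = {source}
    frontiers: List[List[int]] = [[] for _ in range(num_workers)]
    frontiers[0].append(source)

    while any(frontiers):
        for w in range(num_workers):
            if not frontiers[w]:
                # Idle worker: try to take half of the largest frontier.
                victim = max(range(num_workers), key=lambda i: len(frontiers[i]))
                if len(frontiers[victim]) >= 2:
                    frontiers[w] = split_frontier(frontiers[victim])
                continue
            # Busy worker: pop one vertex and push its unvisited neighbors.
            v = frontiers[w].pop()
            for u in graph.get(v, []):
                if u not in visited:
                    visited.add(u)  # mark on push so each vertex is pushed once
                    frontiers[w].append(u)
    return visited
```

Calling `unordered_dfs(graph, 0)` on an adjacency-list dictionary returns the set of vertices reachable from vertex 0. Because each vertex is marked when it is pushed, it enters a frontier at most once, so the total number of pops is bounded by the number of vertices regardless of how frontiers are split.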

Published In

SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

November 2015

985 pages

ISBN: 9781450337236

DOI: 10.1145/2807591

General Chair: Jackie Kern, University of Illinois at Urbana-Champaign, Urbana, Illinois

Program Chair: Jeffrey S. Vetter, Oak Ridge National Laboratory and Georgia Institute of Technology, Oak Ridge, Tennessee

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SC15

Sponsors: SIGHPC, SIGARCH, IEEE-CS

SC15: The International Conference for High Performance Computing, Networking, Storage and Analysis

November 15-20, 2015

Austin, Texas

Acceptance Rates

SC '15 Paper Acceptance Rate: 79 of 358 submissions, 22%

Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%
