DOI: 10.1145/2807591.2807651
Research article, free access

A work-efficient algorithm for parallel unordered depth-first search

Published: 15 November 2015

Abstract

Advances in processing power and memory technology have made multicore computers an important platform for high-performance graph-search (or graph-traversal) algorithms. Since the introduction of multicore processors, much progress has been made on parallel breadth-first search. However, less attention has been given to algorithms for unordered or loosely ordered traversals.
We present a parallel algorithm for unordered depth-first search on graphs. We prove that the algorithm is work efficient in a realistic algorithmic model that accounts for important scheduling costs. This work-efficiency result applies to all graphs, including those with high diameter and high out-degree vertices. The algorithmic techniques behind this result include a new data structure for representing the frontier of vertices in depth-first search, a new amortization technique for controlling excess parallelism, and an adaptation of the lazy-splitting technique to depth-first search.
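The following is a minimal sketch of the general idea, not the authors' implementation. It assumes a CSR graph representation and replaces real threads with a sequential, round-based simulation of `num_workers` workers so that its behaviour is deterministic. Each worker keeps a hypothetical `Frontier` of half-open out-edge ranges and pops edges in LIFO order, which yields an unordered, depth-first-like traversal; because a range can be cut in two, the out-edges of a single high out-degree vertex can be shared between workers. Splitting is lazy in the sense that a worker hands over about half of its frontier only at a poll point (every `poll_every` edges) and only when an idle worker has posted a request. All names (`to_csr`, `Frontier`, `unordered_dfs`, `poll_every`, `requests`) are illustrative assumptions, and the sketch does not model the paper's frontier data structure or its amortization technique.

```python
from collections import deque


def to_csr(n, edge_list):
    """Build a CSR graph: out-edges of v are edges[offsets[v]:offsets[v + 1]]."""
    offsets = [0] * (n + 1)
    for u, _ in edge_list:
        offsets[u + 1] += 1
    for v in range(n):
        offsets[v + 1] += offsets[v]
    edges = [0] * len(edge_list)
    fill = offsets[:n]  # next free slot for each source vertex (a copy)
    for u, w in edge_list:
        edges[fill[u]] = w
        fill[u] += 1
    return offsets, edges


class Frontier:
    """Hypothetical frontier: a stack of half-open edge ranges [lo, hi)."""

    def __init__(self):
        self.ranges = deque()
        self.n = 0  # total number of edges still to be explored

    def push(self, lo, hi):
        if lo < hi:  # never store empty ranges
            self.ranges.append([lo, hi])
            self.n += hi - lo

    def pop_edge(self):
        # LIFO: take from the newest range, giving a depth-first-like order.
        r = self.ranges[-1]
        e = r[0]
        r[0] += 1
        if r[0] == r[1]:
            self.ranges.pop()
        self.n -= 1
        return e

    def split(self):
        """Move about half of the edges (the oldest ones) into a new frontier."""
        out = Frontier()
        target = self.n // 2
        while out.n < target:
            lo, hi = self.ranges[0]
            need = target - out.n
            if hi - lo <= need:
                self.ranges.popleft()
                out.ranges.append([lo, hi])
                out.n += hi - lo
            else:
                # Cut one range: lets a high out-degree vertex be shared.
                self.ranges[0][0] = lo + need
                out.ranges.append([lo, lo + need])
                out.n += need
        self.n -= out.n
        return out


def unordered_dfs(offsets, edges, source, num_workers=4, poll_every=8):
    """Sequential, round-based simulation of a parallel unordered DFS."""
    n = len(offsets) - 1
    visited = [False] * n
    visited[source] = True  # a real multicore version needs an atomic test-and-set
    frontiers = [Frontier() for _ in range(num_workers)]
    frontiers[0].push(offsets[source], offsets[source + 1])
    requests = {}  # victim -> idle worker waiting for work
    while any(f.n > 0 for f in frontiers):
        for w in range(num_workers):
            f = frontiers[w]
            # Explore up to poll_every edges before checking for requests.
            for _ in range(poll_every):
                if f.n == 0:
                    break
                v = edges[f.pop_edge()]
                if not visited[v]:
                    visited[v] = True
                    f.push(offsets[v], offsets[v + 1])
            # Lazy splitting: split only at a poll point, and only on request.
            thief = requests.pop(w, None)
            if thief is not None:
                frontiers[thief] = f.split()  # thief was idle, so nothing is lost
        # Each idle worker asks the currently busiest worker for work.
        for w in range(num_workers):
            if frontiers[w].n == 0:
                victim = max(range(num_workers), key=lambda i: frontiers[i].n)
                if frontiers[victim].n > 1 and victim not in requests:
                    requests[victim] = w
    return visited
```

For example, `unordered_dfs(*to_csr(6, [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (5, 0)]), 0, num_workers=3, poll_every=1)` marks vertices 0 through 4 and leaves vertex 5 unvisited. The visited check is a plain read-then-write only because the simulation is sequential; with real threads, each vertex would have to be claimed with an atomic operation (for instance a compare-and-swap on its flag) so that exactly one worker expands it.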
We validate the theoretical results with an implementation and experiments. The experiments show that the algorithm performs well on a range of graphs and that it can lead to significant improvements over comparable algorithms.

Cited By

  • (2024) CAVE: Concurrency-Aware Graph Processing on SSDs. Proceedings of the ACM on Management of Data 2:3, 1-26. DOI: 10.1145/3654928. Online publication date: 30-May-2024.
  • (2024) Parallel Maximum Cardinality Matching for General Graphs on GPUs. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 880-889. DOI: 10.1109/IPDPSW63119.2024.00157. Online publication date: 27-May-2024.
  • (2024) Shared-Memory Parallel Edmonds Blossom Algorithm for Maximum Cardinality Matching in General Graphs. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 530-539. DOI: 10.1109/IPDPSW63119.2024.00107. Online publication date: 27-May-2024.

Published In

SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2015
985 pages
ISBN:9781450337236
DOI:10.1145/2807591
  • General Chair: Jackie Kern
  • Program Chair: Jeffrey S. Vetter
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

SC15

Acceptance Rates

SC '15 Paper Acceptance Rate 79 of 358 submissions, 22%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Article Metrics

  • Downloads (last 12 months): 204
  • Downloads (last 6 weeks): 21
Reflects downloads up to 28 Dec 2024

Citations

  • (2024) Enhancing Data Systems Performance by Exploiting SSD Concurrency & Asymmetry. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 5644-5648. DOI: 10.1109/ICDE60146.2024.00454. Online publication date: 13-May-2024.
  • (2023) Parallel Strong Connectivity Based on Faster Reachability. Proceedings of the ACM on Management of Data 1:2, 1-29. DOI: 10.1145/3589259. Online publication date: 20-Jun-2023.
  • (2022) iSpan: Parallel Identification of Strongly Connected Components with Spanning Trees. ACM Transactions on Parallel Computing 9:3, 1-27. DOI: 10.1145/3543542. Online publication date: 18-Aug-2022.
  • (2021) Efficient Complete Event Trend Detection over High-Velocity Streams. Proceedings of the 50th International Conference on Parallel Processing, 1-12. DOI: 10.1145/3472456.3472526. Online publication date: 9-Aug-2021.
  • (2021) A Parallel Implementation of Liveness on Knowledge Graphs under Label Constraints. 2021 International Symposium on Theoretical Aspects of Software Engineering (TASE), 103-110. DOI: 10.1109/TASE52547.2021.00016. Online publication date: Aug-2021.
  • (2020) Graph Reachability on Parallel Many-Core Architectures. Computation 8:4, 103. DOI: 10.3390/computation8040103. Online publication date: 2-Dec-2020.
  • (2020) Implementing an Attack Graph Generator in CUDA. 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 730-738. DOI: 10.1109/IPDPSW50202.2020.00128. Online publication date: May-2020.
