DOI: 10.1145/3208040.3208041 (HPDC conference proceedings)
Research article

A high-performance connected components implementation for GPUs

Published: 11 June 2018

Abstract

Connected-components computation is an important graph algorithm with applications in medicine, image processing, and biochemistry, among other fields. This paper presents a fast connected-components implementation for GPUs called ECL-CC. It builds upon the best features of prior algorithms and augments them with GPU-specific optimizations. For example, it incorporates a parallelism-friendly version of pointer jumping to speed up union-find operations and uses several compute kernels to exploit the multiple levels of hardware parallelism. The resulting CUDA code is asynchronous and lock free, employs load balancing, visits each edge exactly once, and only processes edges in one direction. On a Titan X GPU, ECL-CC is 1.8 times faster on average than the fastest prior GPU implementation and is faster on most of the eighteen real-world and synthetic graphs we tested.
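The union-find mechanisms named in the abstract can be illustrated on a CPU. The following C++ sketch is not the authors' CUDA code: it replaces GPU kernels with std::thread workers using a simple round-robin vertex split (no load balancing), and the CSR input layout, the hypothetical helper names `representative`, `hook`, and `connected_components`, and the choice to label each vertex with its component's smallest ID are assumptions made for illustration. It shows three of the listed properties: path shortening inside find (each visited vertex is redirected to its grandparent, one way to realize pointer jumping), lock-free hooking via compare-and-swap, and handling each undirected edge only once, in one direction.

```cpp
// CPU sketch of lock-free union-find connected components (not ECL-CC's code).
// Assumptions: undirected graph in CSR form with both directions of every edge
// stored; std::thread workers stand in for CUDA kernels; no load balancing.
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>
#include <vector>

using Parents = std::vector<std::atomic<int>>;

// Find the root of v's tree. Invariant: parent[x] <= x, so a root satisfies
// parent[x] == x. Each vertex visited on the way is redirected to its
// grandparent (path splitting), which only ever points it at an ancestor.
int representative(int v, Parents& parent) {
  int curr = parent[v].load();
  if (curr != v) {
    int prev = v;
    int next;
    while (curr > (next = parent[curr].load())) {
      parent[prev].store(next);  // shortcut: prev now skips over curr
      prev = curr;
      curr = next;
    }
  }
  return curr;
}

// Union the trees containing a and b without locks: the larger root is hooked
// under the smaller ID with compare-and-swap. If another thread changed that
// root first, the CAS reports its new parent (a smaller ID) and we retry.
void hook(int a, int b, Parents& parent) {
  while (a != b) {
    if (a < b) std::swap(a, b);  // ensure a > b, so parent IDs keep decreasing
    int expected = a;            // succeed only if a is still a root
    if (parent[a].compare_exchange_strong(expected, b)) return;
    a = expected;  // a was not a root; continue from its current parent
  }
}

// Label each vertex with the smallest vertex ID in its connected component.
// row_ptr has n + 1 entries; col holds the neighbor lists back to back.
std::vector<int> connected_components(const std::vector<int>& row_ptr,
                                      const std::vector<int>& col,
                                      unsigned num_threads = 4) {
  assert(!row_ptr.empty() && num_threads > 0);
  const int n = static_cast<int>(row_ptr.size()) - 1;
  Parents parent(n);
  for (int v = 0; v < n; ++v) parent[v].store(v);  // every vertex is its own root

  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      // Round-robin vertex split; threads proceed without barriers or locks.
      for (int v = static_cast<int>(t); v < n; v += static_cast<int>(num_threads)) {
        for (int i = row_ptr[v]; i < row_ptr[v + 1]; ++i) {
          const int u = col[i];
          if (u < v) {  // edge (u, v) is stored twice; process it only once
            hook(representative(v, parent), representative(u, parent), parent);
          }
        }
      }
    });
  }
  for (auto& w : workers) w.join();

  // Finalization: after all unions, each root is its component's minimum ID.
  std::vector<int> label(n);
  for (int v = 0; v < n; ++v) label[v] = representative(v, parent);
  return label;
}
```

For instance, a six-vertex graph with edges (0,1), (1,2), and (3,4) plus an isolated vertex 5, stored as row_ptr = {0, 1, 3, 4, 5, 6, 6} and col = {1, 0, 2, 1, 4, 3}, yields {0, 0, 0, 3, 3, 5} for any thread count. Because a root is only ever hooked under a smaller ID, parent IDs strictly decrease along every path, so no cycles can form and each failed CAS moves the search strictly to a smaller ID. On some toolchains, std::thread requires linking with -pthread.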

Information

Published In

HPDC '18: Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing
June 2018
291 pages
ISBN:9781450357852
DOI:10.1145/3208040
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GPU implementation
  2. connected components
  3. graph algorithm
  4. parallelism
  5. union-find

Conference

HPDC '18

Acceptance Rates

HPDC '18 Paper Acceptance Rate 22 of 111 submissions, 20%;
Overall Acceptance Rate 166 of 966 submissions, 17%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 43
  • Downloads (last 6 weeks): 13
Reflects downloads up to 01 Jan 2025

Cited By

  • (2025) Advances in ArborX to support exascale applications. International Journal of High Performance Computing Applications, 39(1):167-176. DOI: 10.1177/10943420241298296. Online publication date: 1-Jan-2025.
  • (2024) Increasing Parallelism in Forward-backward Distributed Algorithm for Finding Strongly Connected Components of Directed Graphs. Journal of Telecommunications and Information Technology. DOI: 10.26636/jtit.2024.3.1693. Online publication date: 30-Sep-2024.
  • (2024) Real-time soft body dissection simulation with parallelized graph-based shape matching on GPU. Computer Methods and Programs in Biomedicine, 250(C). DOI: 10.1016/j.cmpb.2024.108171. Online publication date: 1-Jun-2024.
  • (2024) Meerkat: A Framework for Dynamic Graph Algorithms on GPUs. International Journal of Parallel Programming, 52(5-6):400-453. DOI: 10.1007/s10766-024-00774-z. Online publication date: 1-Dec-2024.
  • (2023) Fast tree-based algorithms for DBSCAN for low-dimensional data on GPUs. Proceedings of the 52nd International Conference on Parallel Processing, pp. 503-512. DOI: 10.1145/3605573.3605594. Online publication date: 7-Aug-2023.
  • (2023) A High-Performance MST Implementation for GPUs. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-13. DOI: 10.1145/3581784.3607093. Online publication date: 12-Nov-2023.
  • (2023) Reduce, Reuse, and Adapt: Accelerating Graph Processing on GPUs. 2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC), pp. 335-346. DOI: 10.1109/HiPC58850.2023.00050. Online publication date: 18-Dec-2023.
  • (2022) Direction-optimizing Label Propagation Framework for Structure Detection in Graphs: Design, Implementation, and Experimental Analysis. ACM Journal of Experimental Algorithmics, 27:1-31. DOI: 10.1145/3564593. Online publication date: 13-Dec-2022.
  • (2022) Improving the Speed and Quality of Parallel Graph Coloring. ACM Transactions on Parallel Computing, 9(3):1-35. DOI: 10.1145/3543545. Online publication date: 18-Aug-2022.
  • (2022) iSpan: Parallel Identification of Strongly Connected Components with Spanning Trees. ACM Transactions on Parallel Computing, 9(3):1-27. DOI: 10.1145/3543542. Online publication date: 18-Aug-2022.
