A high-performance connected components implementation for GPUs

Authors:

Jayadharini Jaiganesh,

Martin Burtscher

HPDC '18: Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing

Pages 92 - 104

https://doi.org/10.1145/3208040.3208041

Published: 11 June 2018

Abstract

Computing connected components is an important graph algorithm that is used, for example, in medicine, image processing, and biochemistry. This paper presents a fast connected-components implementation for GPUs called ECL-CC. It builds upon the best features of prior algorithms and augments them with GPU-specific optimizations. For example, it incorporates a parallelism-friendly version of pointer jumping to speed up union-find operations and uses several compute kernels to exploit the multiple levels of hardware parallelism. The resulting CUDA code is asynchronous and lock free, employs load balancing, visits each edge exactly once, and only processes edges in one direction. It is 1.8 times faster on average than the fastest prior GPU implementation running on a Titan X and faster on most of the eighteen real-world and synthetic graphs we tested.
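
The union-find ideas named in the abstract can be illustrated with a short host-side sketch. This is not the ECL-CC CUDA code: it is a sequential C++ approximation written under stated assumptions. The names Edge, find_rep, hook, and connected_components are hypothetical, path halving stands in for the paper's pointer-jumping variant, and std::atomic compare-and-swap stands in for a GPU atomic. The edge list is assumed to hold every undirected edge in both directions, as an adjacency list would, so that skipping one direction loses nothing.

```cpp
#include <atomic>
#include <cassert>
#include <utility>
#include <vector>

// All names below are hypothetical; they do not come from the paper.
using Edge = std::pair<int, int>;

// Return the representative of v. While walking up, each visited vertex is
// redirected to its grandparent, so later searches follow a shorter path.
int find_rep(std::vector<std::atomic<int>>& parent, int v) {
    int prev = v;
    int curr = parent[v].load();
    int next;
    while ((next = parent[curr].load()) != curr) {
        parent[prev].store(next);  // prev now skips over curr
        prev = curr;
        curr = next;
    }
    return curr;
}

// Merge the sets containing u and v without locks: attach the root with the
// larger ID to the root with the smaller ID using compare-and-swap, and retry
// from the new parent if the larger root was changed in the meantime.
void hook(std::vector<std::atomic<int>>& parent, int u, int v) {
    int ru = find_rep(parent, u);
    int rv = find_rep(parent, v);
    while (ru != rv) {
        if (ru < rv) std::swap(ru, rv);  // ru is now the larger ID
        int expected = ru;
        if (parent[ru].compare_exchange_strong(expected, rv)) return;
        ru = expected;  // ru was no longer a root; continue from its parent
    }
}

// Label each vertex with the smallest vertex ID in its component.
// Assumption: edges lists every undirected edge in both directions.
std::vector<int> connected_components(int n, const std::vector<Edge>& edges) {
    std::vector<std::atomic<int>> parent(n);
    for (int v = 0; v < n; ++v) parent[v].store(v);  // each vertex starts alone
    for (const Edge& e : edges) {
        assert(e.first >= 0 && e.first < n && e.second >= 0 && e.second < n);
        // Process each edge in one direction only: from larger to smaller ID.
        if (e.second < e.first) hook(parent, e.first, e.second);
    }
    std::vector<int> label(n);
    for (int v = 0; v < n; ++v) label[v] = find_rep(parent, v);
    return label;
}
```

For example, calling connected_components(6, ...) on the edges 0-1, 1-2, and 3-4 (each given in both directions) yields the labels 0, 0, 0, 3, 3, 5. Because a root is only ever attached to a smaller ID, every parent pointer leads toward the smallest ID in its component, which is why that ID ends up as the label.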

Index Terms

A high-performance connected components implementation for GPUs
1. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent algorithms
2. Theory of computation
  1. Design and analysis of algorithms
    1. Parallel algorithms
      1. Massively parallel algorithms

Published In

HPDC '18: Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing

June 2018

291 pages

ISBN:9781450357852

DOI:10.1145/3208040

General Chair: Ming Zhao, Arizona State University

Program Chairs: Abhishek Chandra, University of Minnesota; Lavanya Ramakrishnan, Lawrence Berkeley National Lab

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

HPDC '18

HPDC '18: The 27th International Symposium on High-Performance Parallel and Distributed Computing

June 11 - 15, 2018

Tempe, Arizona

Acceptance Rates

HPDC '18 Paper Acceptance Rate 22 of 111 submissions, 20%;

Overall Acceptance Rate 166 of 966 submissions, 17%

Bibliometrics

Total Citations: 16
Total Downloads: 321
Downloads (last 12 months): 43
Downloads (last 6 weeks): 13

Reflects downloads up to 01 Jan 2025

Cited By

Heroux M, Prokopenko A, Arndt D, Lebrun-Grandié D, Turcksin B, Frontiere N, Emberson J, Buehlmann M (2025). Advances in ArborX to support exascale applications. International Journal of High Performance Computing Applications, 39:1 (167-176). Online publication date: 1-Jan-2025.
https://dl.acm.org/doi/10.1177/10943420241298296
Ryżko D (2024). Increasing Parallelism in Forward-backward Distributed Algorithm for Finding Strongly Connected Components of Directed Graphs. Journal of Telecommunications and Information Technology. Online publication date: 30-Sep-2024.
https://doi.org/10.26636/jtit.2024.3.1693
Yu P, Zhao Z, Wang R, Pan J (2024). Real-time soft body dissection simulation with parallelized graph-based shape matching on GPU. Computer Methods and Programs in Biomedicine, 250:C. Online publication date: 1-Jun-2024.
https://dl.acm.org/doi/10.1016/j.cmpb.2024.108171
Concessao K, Cheramangalath U, Dev R, Nasre R (2024). Meerkat: A Framework for Dynamic Graph Algorithms on GPUs. International Journal of Parallel Programming, 52:5-6 (400-453). Online publication date: 1-Dec-2024.
https://dl.acm.org/doi/10.1007/s10766-024-00774-z
Prokopenko A, Lebrun-Grandie D, Arndt D (2023). Fast tree-based algorithms for DBSCAN for low-dimensional data on GPUs. Proceedings of the 52nd International Conference on Parallel Processing (503-512). Online publication date: 7-Aug-2023.
https://dl.acm.org/doi/10.1145/3605573.3605594
Fallin A, Gonzalez A, Seo J, Burtscher M, Mohror K, Arnold D, Badia R (2023). A High-Performance MST Implementation for GPUs. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-13). Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3581784.3607093
A U, Nasre R, Govindarajan R (2023). Reduce, Reuse, and Adapt: Accelerating Graph Processing on GPUs. 2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC) (335-346). Online publication date: 18-Dec-2023.
https://doi.org/10.1109/HiPC58850.2023.00050
Liu X, Lumsdaine A, Halappanavar M, Barker K, Gebremedhin A (2022). Direction-optimizing Label Propagation Framework for Structure Detection in Graphs: Design, Implementation, and Experimental Analysis. ACM Journal of Experimental Algorithmics, 27 (1-31). Online publication date: 13-Dec-2022.
https://dl.acm.org/doi/10.1145/3564593
Alabandi G, Burtscher M (2022). Improving the Speed and Quality of Parallel Graph Coloring. ACM Transactions on Parallel Computing, 9:3 (1-35). Online publication date: 18-Aug-2022.
https://dl.acm.org/doi/10.1145/3543545
Ji Y, Liu H, Hu Y, Huang H (2022). iSpan: Parallel Identification of Strongly Connected Components with Spanning Trees. ACM Transactions on Parallel Computing, 9:3 (1-27). Online publication date: 18-Aug-2022.
https://dl.acm.org/doi/10.1145/3543542
