Efficient decomposition of strongly connected components on GPUs

Authors:

Fumin Yang

Journal of Systems Architecture: the EUROMICRO Journal, Volume 60, Issue 1

Pages 1 - 10

https://doi.org/10.1016/j.sysarc.2013.10.014

Published: 01 January 2014

Abstract

The GPU (Graphics Processing Unit) has recently become one of the most power-efficient processors in embedded and many other environments, and has been integrated into more and more SoCs (Systems on Chip). Modern GPUs therefore play an important role in power-aware computing. Strongly Connected Component (SCC) decomposition is a fundamental graph algorithm with wide applications in model checking, electronic design automation, social network analysis and other fields. GPUs have shown great potential for accelerating many types of computation, including graph algorithms. Recent work has demonstrated the feasibility of GPU SCC decomposition, but the implementation is inefficient because it pays insufficient attention to the distinctive GPU programming model, which leads to poor performance on irregular and sparse graphs. This paper presents a new GPU SCC decomposition algorithm that focuses on fully utilizing contemporary embedded and desktop GPU architectures. In particular, a subgraph numbering scheme is proposed to manage subgraph IDs safely and efficiently and to serve as the basis of efficient source selection. Furthermore, we adopt a multi-source partition procedure that greatly reduces the recursion depth, and a vertex labeling approach that substantially improves GPU memory access. The evaluation results show that the proposed approach achieves up to 41x speedup over Tarjan's algorithm, one of the most efficient sequential SCC decomposition algorithms, and up to 3.8x speedup over previous GPU algorithms.
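
For orientation, the sequential baseline named in the abstract, Tarjan's algorithm, can be sketched in a few lines. This is a minimal illustration of that textbook algorithm only, not the GPU algorithm the paper proposes. The function name `tarjan_scc` and the adjacency-dictionary input format are assumptions made for this example, and the recursive form is only suitable for small graphs because of Python's recursion limit.

```python
from typing import Dict, List


def tarjan_scc(graph: Dict[int, List[int]]) -> List[List[int]]:
    """Return the strongly connected components of a directed graph.

    `graph` maps each vertex to the list of its successors (an assumed
    input format for this sketch). Each component is returned sorted.
    """
    index_of: Dict[int, int] = {}   # discovery order of each visited vertex
    lowlink: Dict[int, int] = {}    # smallest index known to be reachable
    stack: List[int] = []           # vertices whose component is still open
    on_stack = set()
    components: List[List[int]] = []

    def visit(v: int) -> None:
        index_of[v] = lowlink[v] = len(index_of)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index_of:
                # Tree edge: recurse, then inherit the child's lowlink.
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                # Edge back into the open part of the DFS stack.
                lowlink[v] = min(lowlink[v], index_of[w])
        if lowlink[v] == index_of[v]:
            # v is the root of a component: pop that component off the stack.
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(sorted(component))

    for v in graph:
        if v not in index_of:
            visit(v)
    return components
```

For the graph `{0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [3]}` the function returns `[[3, 4], [0, 1, 2]]`: the cycle 3, 4 is closed first, then the cycle 0, 1, 2. The whole computation is driven by a single depth-first traversal, which is the part that does not map directly onto a data-parallel device.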

Index Terms

Mathematics of computing > Discrete mathematics > Graph theory

Information

Published In

Journal of Systems Architecture: the EUROMICRO Journal Volume 60, Issue 1

January, 2014

149 pages

ISSN:1383-7621

Copyright © 2013 Elsevier B.V.

Publisher

Elsevier North-Holland, Inc.

United States

Cited By

Xie H, Fang Y, Xia Y, Luo W, Ma C (2023). On Querying Connected Components in Large Temporal Graphs. Proceedings of the ACM on Management of Data 1:2, 1-27. Online publication date: 20-Jun-2023.
https://dl.acm.org/doi/10.1145/3589315

Wang L, Dong X, Gu Y, Sun Y (2023). Parallel Strong Connectivity Based on Faster Reachability. Proceedings of the ACM on Management of Data 1:2, 1-29. Online publication date: 20-Jun-2023.
https://dl.acm.org/doi/10.1145/3589259

Niewenhuis D, Varbanescu A, Sterpone L, Bartolini A, Butko A (2022). Efficient trimming for strongly connected components calculation. Proceedings of the 19th ACM International Conference on Computing Frontiers, 131-140. Online publication date: 17-May-2022.
https://dl.acm.org/doi/10.1145/3528416.3530247

Hou J, Wang S, Wu G, Ma B, Zhang L (2020). Parallel SCC Detection Based on Reusing Warps and Coloring Partitions on GPUs. Algorithms and Architectures for Parallel Processing, 31-46. Online publication date: 2-Oct-2020.
https://dl.acm.org/doi/10.1007/978-3-030-60245-1_3

Hou J, Wang S, Wu G, Fu G, Jia S, Wang Y, Li B, Zhang L (2019). Parallel Strongly Connected Components Detection with Multi-partition on GPUs. Computational Science – ICCS 2019, 16-30. Online publication date: 12-Jun-2019.
https://dl.acm.org/doi/10.1007/978-3-030-22747-0_2

Xu T, Wang G (2018). Finding strongly connected components of simple digraphs based on generalized rough sets theory. Knowledge-Based Systems 149:C, 88-98. Online publication date: 1-Jun-2018.
https://dl.acm.org/doi/10.1016/j.knosys.2018.02.038

Tran H, Cambria E (2018). A survey of graph processing on graphics processing units. The Journal of Supercomputing 74:5, 2086-2115. Online publication date: 1-May-2018.
https://dl.acm.org/doi/10.1007/s11227-017-2225-1

Devshatwar S, Amilkanthwar M, Nasre R, Kaeli D, Cavazos J (2016). GPU centric extensions for parallel strongly connected components computation. Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, 2-11. Online publication date: 12-Mar-2016.
https://dl.acm.org/doi/10.1145/2884045.2884048

Wijs A, Katoen J, Bošnački D (2016). Efficient GPU algorithms for parallel decomposition of graphs into strongly connected and maximal end components. Formal Methods in System Design 48:3, 274-300. Online publication date: 1-Jun-2016.
https://dl.acm.org/doi/10.1007/s10703-016-0246-7

Wijs A, Katoen J, Bošnački D (2014). GPU-Based Graph Decomposition into Strongly Connected and Maximal End Components. Proceedings of the 16th International Conference on Computer Aided Verification - Volume 8559, 310-326. Online publication date: 18-Jul-2014.
https://dl.acm.org/doi/10.1007/978-3-319-08867-9_20
