
Efficient decomposition of strongly connected components on GPUs

Published: 01 January 2014

Abstract

The GPU (Graphics Processing Unit) has recently become one of the most power-efficient processors in embedded and many other environments, and it is being integrated into more and more SoCs (Systems on Chip). Modern GPUs therefore play a very important role in power-aware computing. Strongly Connected Component (SCC) decomposition is a fundamental graph algorithm with wide applications in model checking, electronic design automation, social network analysis and other fields. GPUs have been shown to have great potential for accelerating many types of computation, including graph algorithms. Recent work has demonstrated the feasibility of GPU SCC decomposition, but that implementation is inefficient because it gives insufficient consideration to the distinctive GPU programming model, which leads to poor performance on irregular and sparse graphs. This paper presents a new GPU SCC decomposition algorithm that focuses on fully utilizing contemporary embedded and desktop GPU architectures. In particular, a subgraph numbering scheme is proposed to facilitate safe and efficient management of subgraph IDs and to serve as the basis for efficient source selection. Furthermore, we adopt a multi-source partition procedure that greatly reduces the recursion depth, and we use a vertex labeling approach that substantially optimizes GPU memory access. The evaluation results show that the proposed approach achieves up to 41x speedup over Tarjan's algorithm, one of the most efficient sequential SCC decomposition algorithms, and up to 3.8x speedup over previous GPU algorithms.
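As a rough illustration of what partition-based SCC decomposition involves, the following is a minimal sequential Python sketch of the classic forward-backward (FB) algorithm of Fleischer, Hendrickson and Pinar, with two additions that are common in parallel SCC codes: a trimming pass that peels off vertices with no in-edge or no out-edge inside their subgraph, and one pivot per subgraph per round, so that every subgraph is partitioned in the same reachability sweep. Each vertex carries an integer subgraph ID, and reachability expands a whole frontier per round, the level-synchronous pattern that GPU BFS implementations typically parallelize. This is not the paper's subgraph numbering scheme or vertex labeling approach, which the abstract does not specify; the function name `fb_scc`, the smallest-ID pivot rule and the tuple-based relabeling are hypothetical choices made only for this sketch, which runs sequentially and does not model GPU threads or memory layout.

```python
def fb_scc(n, edges):
    """Hypothetical sketch: multi-pivot forward-backward SCC decomposition
    with trimming, run sequentially. Returns rep, where rep[v] is a
    representative vertex of v's SCC (vertices are 0..n-1)."""
    succ = [[] for _ in range(n)]
    pred = [[] for _ in range(n)]
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    label = [0] * n  # subgraph ID per vertex; -1 once its SCC is final
    rep = [-1] * n   # SCC representative per vertex

    def trim():
        # A vertex with no in-edge or no out-edge inside its own subgraph
        # cannot lie on a cycle there, so it is a singleton SCC.
        changed = True
        while changed:
            changed = False
            for v in range(n):
                if label[v] == -1:
                    continue
                has_in = any(label[u] == label[v] for u in pred[v])
                has_out = any(label[w] == label[v] for w in succ[v])
                if not (has_in and has_out):
                    rep[v], label[v] = v, -1
                    changed = True

    def reach(sources, adj):
        # Level-synchronous reachability from all pivots at once. Edges are
        # followed only between vertices with the same subgraph ID, so the
        # searches started from different pivots never mix.
        seen = set(sources)
        frontier = list(seen)
        while frontier:
            nxt = []
            for u in frontier:
                for w in adj[u]:
                    if w not in seen and label[w] == label[u]:
                        seen.add(w)
                        nxt.append(w)
            frontier = nxt
        return seen

    while True:
        trim()
        active = [v for v in range(n) if label[v] != -1]
        if not active:
            return rep
        # One pivot per subgraph: here simply its smallest vertex ID.
        pivots = {}
        for v in active:
            pivots.setdefault(label[v], v)
        fw = reach(pivots.values(), succ)  # reachable from the pivot
        bw = reach(pivots.values(), pred)  # can reach the pivot
        # (FW & BW) is the pivot's SCC; FW - BW, BW - FW and the rest of
        # each subgraph become new subgraphs with fresh IDs.
        new_ids = {}
        for v in active:
            in_fw, in_bw = v in fw, v in bw
            if in_fw and in_bw:
                rep[v], label[v] = pivots[label[v]], -1
            else:
                key = (label[v], in_fw, in_bw)
                label[v] = new_ids.setdefault(key, len(new_ids))
```

For example, `fb_scc(8, [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3), (4, 5), (6, 6)])` returns `[0, 0, 0, 3, 3, 5, 6, 7]`: trimming removes vertices 5 and 7, the first round's pivot 0 finalizes {0, 1, 2}, and the second round handles the two remaining subgraphs, {3, 4} and {6}, in a single sweep rather than in two separate recursive calls. Saving calls of this kind is, loosely, what a multi-source partition aims for.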

    Published In

    Journal of Systems Architecture: the EUROMICRO Journal, Volume 60, Issue 1
    January 2014
    149 pages

    Publisher

    Elsevier North-Holland, Inc.

    United States

    Author Tags

    1. GPU
    2. Parallel algorithms
    3. Power efficiency
    4. SCC decomposition

    Qualifiers

    • Article

    Article Metrics

    • Downloads (last 12 months): 0
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 28 Dec 2024

    Cited By

    • (2023) On Querying Connected Components in Large Temporal Graphs. Proceedings of the ACM on Management of Data 1:2 (1-27). DOI: 10.1145/3589315. Online publication date: 20-Jun-2023
    • (2023) Parallel Strong Connectivity Based on Faster Reachability. Proceedings of the ACM on Management of Data 1:2 (1-29). DOI: 10.1145/3589259. Online publication date: 20-Jun-2023
    • (2022) Efficient trimming for strongly connected components calculation. Proceedings of the 19th ACM International Conference on Computing Frontiers (131-140). DOI: 10.1145/3528416.3530247. Online publication date: 17-May-2022
    • (2020) Parallel SCC Detection Based on Reusing Warps and Coloring Partitions on GPUs. Algorithms and Architectures for Parallel Processing (31-46). DOI: 10.1007/978-3-030-60245-1_3. Online publication date: 2-Oct-2020
    • (2019) Parallel Strongly Connected Components Detection with Multi-partition on GPUs. Computational Science – ICCS 2019 (16-30). DOI: 10.1007/978-3-030-22747-0_2. Online publication date: 12-Jun-2019
    • (2018) Finding strongly connected components of simple digraphs based on generalized rough sets theory. Knowledge-Based Systems 149:C (88-98). DOI: 10.1016/j.knosys.2018.02.038. Online publication date: 1-Jun-2018
    • (2018) A survey of graph processing on graphics processing units. The Journal of Supercomputing 74:5 (2086-2115). DOI: 10.1007/s11227-017-2225-1. Online publication date: 1-May-2018
    • (2016) GPU centric extensions for parallel strongly connected components computation. Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit (2-11). DOI: 10.1145/2884045.2884048. Online publication date: 12-Mar-2016
    • (2016) Efficient GPU algorithms for parallel decomposition of graphs into strongly connected and maximal end components. Formal Methods in System Design 48:3 (274-300). DOI: 10.1007/s10703-016-0246-7. Online publication date: 1-Jun-2016
    • (2014) GPU-Based Graph Decomposition into Strongly Connected and Maximal End Components. Proceedings of the 16th International Conference on Computer Aided Verification - Volume 8559 (310-326). DOI: 10.1007/978-3-319-08867-9_20. Online publication date: 18-Jul-2014
