Gunrock: a high-performance graph processing library on the GPU

Authors:

Yangzihao Wang,

Andrew Davidson,

John D. Owens

ACM SIGPLAN Notices, Volume 51, Issue 8

Article No.: 11, Pages 1 - 12

https://doi.org/10.1145/3016078.2851145

Published: 27 February 2016

Abstract

For large-scale graph analytics on the GPU, the irregularity of data access/control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock," our high-level bulk-synchronous graph-processing system targeting the GPU, takes a new approach to abstracting GPU graph analytics: rather than designing an abstraction around computation, Gunrock instead implements a novel data-centric abstraction centered on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We evaluate Gunrock on five graph primitives (BFS, BC, SSSP, CC, and PageRank) and show that Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other GPU high-level graph library.
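To make the frontier-centered idea concrete, here is a minimal sequential sketch of breadth-first search written as repeated operations on a vertex frontier. It is not Gunrock's API or implementation: the function names `advance`, `filter_unvisited`, and `bfs`, and the CSR arrays `row_offsets` and `col_indices`, are assumptions chosen for illustration, and a real GPU version would run each step as a parallel, load-balanced kernel rather than a Python loop.

```python
from typing import List


def advance(row_offsets: List[int], col_indices: List[int],
            frontier: List[int]) -> List[int]:
    """Expand a vertex frontier by gathering the neighbors of every frontier vertex."""
    out: List[int] = []
    for v in frontier:
        # In CSR form, the neighbors of v sit in col_indices[row_offsets[v]:row_offsets[v + 1]].
        out.extend(col_indices[row_offsets[v]:row_offsets[v + 1]])
    return out


def filter_unvisited(candidates: List[int], depth: List[int],
                     level: int) -> List[int]:
    """Keep only vertices that have no depth label yet, labeling each one as it is kept."""
    out: List[int] = []
    for v in candidates:
        if depth[v] == -1:
            depth[v] = level  # labeling here also drops later duplicates of v
            out.append(v)
    return out


def bfs(row_offsets: List[int], col_indices: List[int], source: int) -> List[int]:
    """Return the BFS depth of every vertex from source (-1 if unreachable)."""
    n = len(row_offsets) - 1
    depth = [-1] * n
    depth[source] = 0
    frontier = [source]
    level = 0
    # Bulk-synchronous loop: each iteration consumes one frontier and produces the next.
    while frontier:
        level += 1
        candidates = advance(row_offsets, col_indices, frontier)
        frontier = filter_unvisited(candidates, depth, level)
    return depth


if __name__ == "__main__":
    # Hypothetical 5-vertex graph with edges 0->1, 0->2, 1->3, 2->3, 3->4.
    row_offsets = [0, 2, 3, 4, 5, 5]
    col_indices = [1, 2, 3, 3, 4]
    print(bfs(row_offsets, col_indices, 0))
```

In this sketch the algorithm never addresses the graph as a whole. All of its work is expressed as transformations of the current frontier, which is the sense in which such an abstraction is data-centric rather than organized around per-vertex computation.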

Published In

ACM SIGPLAN Notices Volume 51, Issue 8

PPoPP '16

August 2016

405 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/3016078

Editor:
Matthew Fluet

PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2016
420 pages
ISBN:9781450340922
DOI:10.1145/2851141
General Chair: Rafael Asenjo (University of Málaga, Spain)
Program Chair: Tim Harris (Oracle Labs, Cambridge, UK)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

Distinguished Paper

Qualifiers

Research-article

Funding Sources

Defense Advanced Research Projects Agency
U.S. Army
UC Lab Fees Research Program Award
National Science Foundation
