research-article
Open access

AsynGraph: Maximizing Data Parallelism for Efficient Iterative Graph Processing on GPUs

Published: 30 September 2020

Abstract

GPU-accelerated systems have recently been proposed for iterative graph algorithms. However, existing GPU-based solutions still underutilize the GPU's parallelism in iterative graph processing. Because natural graphs follow a power law, the paths between a small set of important vertices (e.g., high-degree vertices) play a particularly important role in the convergence speed of iterative graph processing. Based on this observation, this article develops a novel system, called AsynGraph, that maximizes data parallelism for faster iterative graph processing on GPUs. It first proposes an efficient structure-aware asynchronous processing method. By handling the paths between the important vertices efficiently, this method lets the state propagations of most vertices be conducted concurrently on the GPU, yielding a higher GPU utilization ratio. Specifically, a graph sketch (consisting of the paths between the important vertices) is extracted from the original graph to serve as a fast bridge for most state propagations. Processing this sketch multiple times within each round of graph processing exposes more GPU parallelism and accelerates most state propagations. In addition, a forward-backward intra-path processing method handles the vertices on each path asynchronously, which further speeds up propagation along paths and reduces data access cost. Compared with the existing GPU-based systems Gunrock, Groute, Tigr, and DiGraph, AsynGraph speeds up iterative graph processing by 3.06–11.52, 2.47–5.40, 2.23–9.65, and 1.41–4.05 times, respectively.
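
To make two of the ideas in the abstract concrete, here is a minimal CPU-only Python sketch. It is not AsynGraph's implementation and says nothing about how the system maps work onto a GPU. It assumes that "important" vertices are simply the highest-degree ones, that the propagated state is a shortest-path distance, and that a path is undirected. All function names are hypothetical. The sketch contrasts one in-place forward-backward sweep along a path with synchronous rounds that only read the previous round's values.

```python
from collections import defaultdict

INF = float("inf")


def important_vertices(edges, k):
    """Return the k highest-degree vertices (assumed proxy for 'important')."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Sort by degree (descending); break ties by vertex id for determinism.
    return sorted(degree, key=lambda x: (-degree[x], x))[:k]


def sync_round(weights, dist):
    """One synchronous round: each update reads only last round's values."""
    new = list(dist)
    for i in range(len(dist) - 1):
        new[i + 1] = min(new[i + 1], dist[i] + weights[i])
        new[i] = min(new[i], dist[i + 1] + weights[i])
    return new


def rounds_until_converged(weights, dist):
    """Count the synchronous rounds that still change some value."""
    rounds = 0
    while True:
        new = sync_round(weights, dist)
        if new == dist:
            return rounds
        dist = new
        rounds += 1


def forward_backward_sweep(weights, dist):
    """One asynchronous sweep: in-place updates, forward then backward.

    dist[i] is the state of the i-th vertex on the path and weights[i]
    is the weight of the edge between path positions i and i + 1.
    """
    n = len(dist)
    for i in range(n - 1):            # forward: push state toward the tail
        dist[i + 1] = min(dist[i + 1], dist[i] + weights[i])
    for i in range(n - 1, 0, -1):     # backward: push state toward the head
        dist[i - 1] = min(dist[i - 1], dist[i] + weights[i - 1])
    return dist


if __name__ == "__main__":
    weights = [1, 2, 3, 4, 5]
    start = [INF, INF, 0, INF, INF, INF]  # source sits at path position 2
    print(forward_backward_sweep(weights, list(start)))
    print(rounds_until_converged(weights, list(start)))
```

In this toy setting a single sweep carries the source's state to both ends of the path because every update immediately sees the values written before it, whereas the synchronous variant advances the state by only one hop per round.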


      Published In

      ACM Transactions on Architecture and Code Optimization, Volume 17, Issue 4
      December 2020
      430 pages
      ISSN: 1544-3566
      EISSN: 1544-3973
      DOI: 10.1145/3427420

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 30 September 2020
      Accepted: 01 August 2020
      Revised: 01 July 2020
      Received: 01 April 2020
      Published in TACO Volume 17, Issue 4

      Author Tags

      1. GPU
      2. convergence speed
      3. data parallelism
      4. graph processing

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • National Natural Science Foundation of China
      • National Key Research and Development Program of China
