Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3472456.3472462acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicppConference Proceedingsconference-collections
research-article

Exploiting in-Hub Temporal Locality in SpMV-based Graph Processing

Published: 05 October 2021 Publication History

Abstract

The skewed degree distribution of real-world graphs is the main source of poor locality in traversing all edges of the graph, known as Sparse Matrix-Vector (SpMV) Multiplication. Conventional graph traversal methods, such as push and pull, traverse all vertices in the same manner, and we show applying a uniform traversal direction for all edges leads to sub-optimal memory locality, hence poor efficiency. This paper argues that different vertices in power-law graphs have different locality characteristics and the traversal method should be adapted to these characteristics.
To solve this problem, we propose to inspect the number of destination and source vertices in selecting a cache-compatible traversal direction for each type of vertex. We introduce in-Hub Temporal Locality (iHTL), a structure-aware SpMV that combines push and pull in one graph traversal, but for different vertex types. iHTL exploits temporal locality by traversing incoming edges to in-hubs in push direction, while processing other edges in pull direction.
The evaluation shows iHTL is 1.5 × - 2.4 × faster than pull and 4.8 × - 9.5 × faster than push in state-of-the-art graph processing frameworks such as GraphGrind, GraphIt and Galois. More importantly, iHTL is 1.3 × - 1.5 × faster than pull traversal of state-of-the-art locality optimizing reordering algorithms such as SlashBurn, GOrder, and Rabbit-Order.

References

[1]
Noga Alon, Raphael Yuster, and Uri Zwick. 1997. Finding and counting given length cycles. Algorithmica 17(1997), 354–364.
[2]
Junya Arai, Hiroaki Shiokawa, Takeshi Yamamuro, Makoto Onizuka, and Sotetsu Iwamura. 2016. Rabbit Order: Just-in-Time Parallel Reordering for Fast Graph Analysis. In 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 22–31.
[3]
Scott Beamer, Krste Asanović, and David Patterson. 2012. Direction-optimizing Breadth-first Search. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (Salt Lake City, Utah) (SC ’12). IEEE Computer Society Press, Los Alamitos, CA, USA, Article 12, 10 pages.
[4]
M. Besta, F. Marending, E. Solomonik, and T. Hoefler. 2017. SlimSell: A Vectorizable Graph Representation for Breadth-First Search. In 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 32–41.
[5]
Maciej Besta, Michał Podstawski, Linus Groner, Edgar Solomonik, and Torsten Hoefler. 2017. To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations. In Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing (Washington, DC, USA) (HPDC ’17). Association for Computing Machinery, New York, NY, USA, 93–104.
[6]
Robert D. Blumofe and Charles E. Leiserson. 1999. Scheduling Multithreaded Computations by Work Stealing. J. ACM 46, 5 (Sept. 1999), 720–748.
[7]
Paolo Boldi, Bruno Codenotti, Massimo Santini, and Sebastiano Vigna. 2004. UbiCrawler: A Scalable Fully Distributed Web Crawler. Software: Practice & Experience 34, 8 (2004), 711–726.
[8]
Paolo Boldi, Andrea Marino, Massimo Santini, and Sebastiano Vigna. 2014. BUbiNG: Massive Crawling for the Masses. In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 227–228.
[9]
Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna. 2011. Layered Label Propagation: A Multiresolution Coordinate-free Ordering for Compressing Social Networks. In Proceedings of the 20th International Conference on World Wide Web (Hyderabad, India) (WWW ’11). ACM, New York, NY, USA, 587–596.
[10]
Paolo Boldi and Sebastiano Vigna. 2004. The WebGraph Framework I: Compression Techniques. In Proceedings of the 13th International Conference on World Wide Web (New York, NY, USA) (WWW ’04). ACM, New York, NY, USA, 595–602.
[11]
Sergey Brin and Lawrence Page. 1998. The Anatomy of a Large-scale Hypertextual Web Search Engine. Computer Networks 30, 1-7 (April 1998), 107–117.
[12]
Anna D. Broido and Aaron Clauset. 2019. Scale-free networks are rare. Nature Communications 10, 1 (Mar 2019).
[13]
Meeyoung Cha, Hamed Haddadi, Fabricio Benevenuto, and Krishna P. Gummadi. 2010. Measuring User Influence in Twitter: The Million Follower Fallacy. In ICWSM. Washington DC, USA.
[14]
Rong Chen, Jiaxin Shi, Yanzhe Chen, and Haibo Chen. 2015. PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs. In Proceedings of the Tenth European Conference on Computer Systems (Bordeaux, France) (EuroSys ’15). ACM, New York, NY, USA, Article 1, 15 pages.
[15]
Charles L Clarke, Nick Craswell, and Ian Soboroff. 2009. Overview of the trec 2009 web track. Technical Report. DTIC Document.
[16]
Gurbinder Gill, Roshan Dathathri, Loc Hoang, Ramesh Peri, and Keshav Pingali. 2020. Single Machine Graph Analytics on Massive Datasets Using Intel Optane DC Persistent Memory. Proc. VLDB Endow. 13, 8 (April 2020), 1304–1318.
[17]
Samuel Grossman, Heiner Litz, and Christos Kozyrakis. 2018. Making Pull-based Graph Processing Performant. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Vienna, Austria) (PPoPP ’18). ACM, New York, NY, USA, 246–260.
[18]
F. Irigoin and R. Triolet. 1988. Supernode Partitioning. In Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (San Diego, California, USA) (POPL ’88). Association for Computing Machinery, New York, NY, USA, 319–329.
[19]
U. Kang, D. H. Chau, and C. Faloutsos. 2011. Mining large graphs: Algorithms, inference, and discoveries. In 2011 IEEE 27th International Conference on Data Engineering. 243–254.
[20]
Jon M. Kleinberg. 1999. Authoritative Sources in a Hyperlinked Environment. J. ACM 46, 5 (Sept. 1999), 604–632.
[21]
Mohsen Koohi Esfahani, Peter Kilpatrick, and Hans Vandierendonck. 2021. How Do Graph Relabeling Algorithms Improve Memory Locality?. In 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). IEEE Computer Society, 84–86.
[22]
Jérôme Kunegis. 2013. KONECT – The Koblenz Network Collection. In Proc. Int. Conf. on World Wide Web Companion. 1343–1350.
[23]
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a Social Network or a News Media?. In Proceedings of the 19th International Conference on World Wide Web (Raleigh, North Carolina, USA) (WWW ’10). Association for Computing Machinery, New York, NY, USA, 591–600.
[24]
Yongsub Lim, U Kang, and Christos Faloutsos. 2014. SlashBurn: Graph Compression and Mining beyond Caveman Communities. IEEE Transactions on Knowledge and Data Engineering 26, 12 (Dec 2014), 3077–3089.
[25]
Ke Meng, Jiajia Li, Guangming Tan, and Ninghui Sun. 2019. A Pattern Based Algorithmic Autotuner for Graph Processing on GPUs(PPoPP ’19). Association for Computing Machinery, New York, NY, USA, 201–213.
[26]
Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2007. Measurement and Analysis of Online Social Networks. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement (San Diego, California, USA) (IMC ’07). Association for Computing Machinery, New York, NY, USA, 29–42.
[27]
Sharan Narang, Gregory F. Diamos, Shubho Sengupta, and Erich Elsen. 2017. Exploring Sparsity in Recurrent Neural Networks. CoRR abs/1704.05119(2017). arxiv:1704.05119
[28]
Ryan A. Rossi and Nesreen K. Ahmed. 2015. The Network Data Repository with Interactive Graph Analytics and Visualization. In AAAI.
[29]
Amitabha Roy, Ivo Mihailovic, and Willy Zwaenepoel. 2013. X-Stream: Edge-centric Graph Processing Using Streaming Partitions. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (Farminton, Pennsylvania) (SOSP ’13). ACM, New York, NY, USA, 472–488.
[30]
Youcef Saad. 1994. SPARSKIT: a basic tool kit for sparse matrix computations - Version 2.
[31]
Hiroaki Shiokawa, Tomokatsu Takahashi, and Hiroyuki Kitagawa. 2018. ScaleSCAN: Scalable Density-Based Graph Clustering. In Database and Expert Systems Applications, Sven Hartmann, Hui Ma, Günther Pernul, and Roland R. Wagner (Eds.). Springer International Publishing, Cham, 18–34.
[32]
Friendster social network. [n.d.]. Friendster: The online gaming social network. archive.org/details/friendster-dataset-201107.
[33]
Bor-Yiing Su, Tasneem G. Brutch, and Kurt Keutzer. 2010. Parallel BFS graph traversal on images using structured grid. In 2010 IEEE International Conference on Image Processing. 4489–4492.
[34]
Jiawen Sun, Hans Vandierendonck, and Dimitrios S. Nikolopoulos. 2017. Accelerating Graph Analytics by Utilising the Memory Locality of Graph Partitioning. In 2017 46th International Conference on Parallel Processing (ICPP). 181–190.
[35]
Jiawen Sun, Hans Vandierendonck, and Dimitrios S. Nikolopoulos. 2017. GraphGrind: Addressing Load Imbalance of Graph Partitioning. In Proceedings of the International Conference on Supercomputing (Chicago, Illinois) (ICS ’17). ACM, New York, NY, USA, Article 16, 10 pages.
[36]
Jiawen Sun, Hans Vandierendonck, and Dimitrios S. Nikolopoulos. 2018. VEBO: A Vertex- and Edge-Balanced Ordering Heuristic to Load Balance Parallel Graph Processing. CoRR abs/1806.06576(2018). arxiv:1806.06576
[37]
Dan Terpstra, Heike Jagode, Haihang You, and Jack Dongarra. 2010. Collecting Performance Data with PAPI-C. In Tools for High Performance Computing 2009, Matthias S. Müller, Michael M. Resch, Alexander Schulz, and Wolfgang E. Nagel (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 157–173.
[38]
Hans Vandierendonck. 2020. Graptor: Efficient Pull and Push Style Vectorized Graph Processing. In Proceedings of the 34th ACM International Conference on Supercomputing (Barcelona, Spain) (ICS ’20). Association for Computing Machinery, New York, NY, USA, Article 13, 13 pages.
[39]
Hao Wang, Liang Geng, Rubao Lee, Kaixi Hou, Yanfeng Zhang, and Xiaodong Zhang. 2019. SEP-Graph: Finding Shortest Execution Paths for Graph Processing under a Hybrid Framework on GPU. In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (Washington, District of Columbia) (PPoPP ’19). Association for Computing Machinery, New York, NY, USA, 38–52.
[40]
Minjie Wang, Lingfan Yu, Da Zheng, Quan Gan, Yu Gai, Zihao Ye, Mufei Li, Jinjing Zhou, Qi Huang, Chao Ma, Ziyue Huang, Qipeng Guo, Hao Zhang, Haibin Lin, Junbo Zhao, Jinyang Li, Alexander J. Smola, and Zheng Zhang. 2019. Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs. CoRR abs/1909.01315(2019). arxiv:1909.01315
[41]
Hao Wei, Jeffrey Xu Yu, Can Lu, and Xuemin Lin. 2016. Speedup Graph Processing by Graph Ordering. In Proceedings of the 2016 International Conference on Management of Data (San Francisco, California, USA) (SIGMOD ’16). ACM, NewYork, NY, USA, 1813–1828.
[42]
Xiaowei Xu, Nurcan Yuruk, Zhidan Feng, and Thomas A. J. Schweiger. 2007. SCAN: A Structural Clustering Algorithm for Networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Jose, California, USA) (KDD ’07). ACM, New York, NY, USA, 824–833.
[43]
Serif Yesil, Azin Heidarshenas, Adam Morrison, and Josep Torrellas. 2020. Speeding up SpMV for Power-Law Graph Analytics by Enhancing Locality and Vectorization. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (Atlanta, Georgia) (SC ’20). IEEE Press, Article 86, 15 pages.
[44]
Kaiyuan Zhang, Rong Chen, and Haibo Chen. 2015. NUMA-aware Graph-structured Analytics. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (San Francisco, CA, USA) (PPoPP 2015). ACM, New York, NY, USA, 183–193.
[45]
Yunming Zhang, Vladimir Kiriansky, Charith Mendis, Matei Zaharia, and Saman P. Amarasinghe. 2017. Making caches work for graph analytics. In 2017 IEEE International Conference on Big Data (Big Data). 293–302.
[46]
Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian Shun, and Saman Amarasinghe. 2018. GraphIt: A High-Performance Graph DSL. Proc. ACM Program. Lang. 2, OOPSLA, Article 121 (Oct. 2018), 30 pages.
[47]
Xiaojin Zhu and Zoubin Ghahramani. 2002. Learning from Labeled and Unlabeled Data with Label Propagation. Technical Report.

Cited By

View all
  • (2023)A Survey of Accelerating Parallel Sparse Linear AlgebraACM Computing Surveys10.1145/360460656:1(1-38)Online publication date: 28-Aug-2023
  • (2023)Efficient Algorithm Design of Optimizing SpMV on GPUProceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing10.1145/3588195.3593002(115-128)Online publication date: 7-Aug-2023
  • (2023)On Overcoming HPC Challenges of Trillion-Scale Real-World Graph Datasets2023 IEEE International Conference on Big Data (BigData)10.1109/BigData59044.2023.10386309(215-220)Online publication date: 15-Dec-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Other conferences
ICPP '21: Proceedings of the 50th International Conference on Parallel Processing
August 2021
927 pages
ISBN:9781450390682
DOI:10.1145/3472456
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 October 2021

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Graph algorithms
  2. Graph traversal
  3. High performance computing
  4. Memory locality
  5. Sparse Matrix-Vector Multiplication

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2021

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)39
  • Downloads (Last 6 weeks)0
Reflects downloads up to 24 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2023)A Survey of Accelerating Parallel Sparse Linear AlgebraACM Computing Surveys10.1145/360460656:1(1-38)Online publication date: 28-Aug-2023
  • (2023)Efficient Algorithm Design of Optimizing SpMV on GPUProceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing10.1145/3588195.3593002(115-128)Online publication date: 7-Aug-2023
  • (2023)On Overcoming HPC Challenges of Trillion-Scale Real-World Graph Datasets2023 IEEE International Conference on Big Data (BigData)10.1109/BigData59044.2023.10386309(215-220)Online publication date: 15-Dec-2023
  • (2023)Performance Modelling-Driven Optimization of RISC-V Hardware for Efficient SpMVHigh Performance Computing10.1007/978-3-031-40843-4_36(486-499)Online publication date: 21-May-2023
  • (2022)MASTIFFProceedings of the 36th ACM International Conference on Supercomputing10.1145/3524059.3532365(1-13)Online publication date: 28-Jun-2022
  • (2022)GCIM: Toward Efficient Processing of Graph Convolutional Networks in 3D-Stacked MemoryIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2022.319832041:11(3579-3590)Online publication date: 1-Nov-2022
  • (2022)SAPCo Sort: Optimizing Degree-Ordering for Power-Law Graphs2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)10.1109/ISPASS55109.2022.00015(138-140)Online publication date: May-2022
  • (2022)A Dynamic Variational Framework for Open-World Node Classification in Structured Sequences2022 IEEE International Conference on Data Mining (ICDM)10.1109/ICDM54844.2022.00081(703-712)Online publication date: Nov-2022
  • (2021)Locality Analysis of Graph Reordering Algorithms2021 IEEE International Symposium on Workload Characterization (IISWC)10.1109/IISWC53511.2021.00020(101-112)Online publication date: Nov-2021
  • (2021)Thrifty Label Propagation: Fast Connected Components for Skewed-Degree Graphs2021 IEEE International Conference on Cluster Computing (CLUSTER)10.1109/Cluster48925.2021.00042(226-237)Online publication date: Sep-2021

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media