DOI: 10.1145/3650200.3661896
Research article · Open access

A Coordinated Strategy for GNN Combining Computational Graph and Operator Optimizations

Published: 03 June 2024
Abstract

    Graph Neural Networks (GNNs) have garnered significant interest across various domains due to their efficacy in learning from graph-structured data. In pursuit of higher performance, numerous GNN frameworks have emerged recently. However, recent work tends to study performance optimization at the computational graph level and the operator level separately, and existing optimization techniques rely on pattern matching and manual intervention driven by human expertise. Consequently, their performance remains sub-optimal and sensitive to input graphs and GNN models. In this work, we develop AlphaGNN, an efficient strategy that coordinates computational graph optimization with operator optimization. To make this coordination effective, we propose a rule-based computational graph optimization and a performance-driven operator optimization. Experimental results confirm that AlphaGNN achieves up to 12.39× (2.94× on average) performance improvement over state-of-the-art methods on diverse GNN models.
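
    As a minimal, hypothetical illustration (not taken from the paper), the sketch below shows why computational-graph-level and operator-level decisions interact in a GCN-style layer: the propagation A·X·W can be evaluated as (A·X)·W or A·(X·W), a graph-level reordering that changes how much work the underlying sparse-dense matrix multiplication (SpMM) operator performs, while the SpMM kernel itself is the operator-level target. All names, shapes, and densities here are illustrative assumptions.

    # Sketch under assumed shapes; not AlphaGNN's actual implementation.
    import numpy as np
    import scipy.sparse as sp

    def gcn_propagate(A, X, W, transform_first=True):
        """One GCN-style layer: sparse aggregation (SpMM) plus a dense transform.

        transform_first=True applies the dense weight W before aggregation, so
        the SpMM runs on a narrower feature matrix; False aggregates first.
        Which order is cheaper depends on the feature widths (a computational-
        graph-level decision); the SpMM kernel is the operator-level target.
        """
        if transform_first:
            return A @ (X @ W)      # SpMM over f_out columns
        return (A @ X) @ W          # SpMM over f_in columns

    # Illustrative sizes: 10k nodes, 256 input features, 32 output features.
    n, f_in, f_out = 10_000, 256, 32
    A = sp.random(n, n, density=1e-3, format="csr", dtype=np.float32)  # stand-in adjacency
    X = np.random.rand(n, f_in).astype(np.float32)
    W = np.random.rand(f_in, f_out).astype(np.float32)

    # SpMM work scales with nnz(A) times the feature width, so transforming
    # first (256 -> 32 columns) cuts the sparse work by 8x for this layer shape.
    out = gcn_propagate(A, X, W, transform_first=True)
    print(out.shape)   # (10000, 32)

    This toy reordering only motivates why the two optimization levels should be decided jointly; the paper's rule-based graph rewrites and performance-driven operator search are described in the full text.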


    Published In

    ICS '24: Proceedings of the 38th ACM International Conference on Supercomputing
    May 2024, 582 pages
    ISBN: 9798400706103
    DOI: 10.1145/3650200
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Natural Science Foundation of China
    • Beijing Science and Technology Program

    Conference

    ICS '24

    Acceptance Rates

    ICS '24 paper acceptance rate: 45 of 125 submissions (36%)
    Overall acceptance rate: 629 of 2,180 submissions (29%)

