
Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation

Published: 28 December 2024

Abstract

Distributed graph neural network (GNN) training facilitates learning on massive graphs that surpass the storage and computational capabilities of a single machine. Traditional distributed frameworks strive for performance parity with centralized training by maximally recovering cross-instance node dependencies, relying either on inter-instance communication or on periodic fallback to centralized training. However, these operations create overhead and constrain the scalability of the framework. In this work, we propose a streamlined framework for distributed GNN training that eliminates these costly operations, yielding improved scalability, convergence speed, and performance over state-of-the-art approaches. Our framework (1) comprises independent trainers that asynchronously learn local models from the locally available parts of the training graph and (2) synchronizes these local models only through periodic (time-based) model aggregation. Contrary to prevailing belief, our theoretical analysis shows that maximizing the recovery of cross-instance node dependencies is not essential for achieving performance parity with centralized training. Instead, our framework partitions the training graph by randomly assigning nodes or super-nodes (i.e., collections of original nodes) to instances, which improves data uniformity and minimizes discrepancies in gradients and loss functions across instances. Experiments on social and e-commerce networks with up to 1.3 billion edges show that our proposed framework achieves state-of-the-art performance and a 2.31× speedup over the fastest baseline, despite using less training data.
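To make the training scheme described in the abstract concrete, the sketch below illustrates its three ingredients: uniformly random node-to-trainer assignment, fully independent local training, and synchronization only through periodic parameter averaging. This is not the authors' implementation; a small MLP on synthetic features stands in for a GNN, and all names and hyperparameters are illustrative assumptions.

```python
# Illustrative sketch (not the paper's code): random partitioning + local
# training + periodic model aggregation, using a toy MLP in place of a GNN.
import copy
import torch
import torch.nn as nn

def random_partition(num_nodes: int, num_trainers: int, seed: int = 0):
    """Assign each node to a trainer uniformly at random (no edge-cut objective)."""
    g = torch.Generator().manual_seed(seed)
    assignment = torch.randint(0, num_trainers, (num_nodes,), generator=g)
    return [torch.nonzero(assignment == t, as_tuple=False).flatten()
            for t in range(num_trainers)]

def average_models(models):
    """Aggregate local models by unweighted parameter averaging."""
    avg_state = copy.deepcopy(models[0].state_dict())
    for key in avg_state:
        avg_state[key] = torch.stack(
            [m.state_dict()[key].float() for m in models]).mean(dim=0)
    for m in models:
        m.load_state_dict(avg_state)

# Toy node features and labels stand in for a real partitioned graph dataset.
num_nodes, num_feats, num_classes, num_trainers = 1000, 16, 4, 4
X = torch.randn(num_nodes, num_feats)
y = torch.randint(0, num_classes, (num_nodes,))
parts = random_partition(num_nodes, num_trainers)

models = [nn.Sequential(nn.Linear(num_feats, 32), nn.ReLU(),
                        nn.Linear(32, num_classes)) for _ in range(num_trainers)]
for m in models[1:]:                       # start all trainers from identical weights
    m.load_state_dict(models[0].state_dict())
optims = [torch.optim.SGD(m.parameters(), lr=0.1) for m in models]
loss_fn = nn.CrossEntropyLoss()

local_steps, rounds = 10, 5                # aggregate after every `local_steps` updates
for _ in range(rounds):
    for model, optim, idx in zip(models, optims, parts):
        for _ in range(local_steps):       # independent training on the local partition only
            optim.zero_grad()
            loss = loss_fn(model(X[idx]), y[idx])
            loss.backward()
            optim.step()
    average_models(models)                 # periodic (time-based) model aggregation
```

Because the partitions are drawn uniformly at random, each trainer sees a statistically similar slice of the data, which is what keeps the locally computed gradients and losses close to one another and lets simple averaging recover a strong global model.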



Published In

ACM Transactions on Knowledge Discovery from Data, Volume 19, Issue 1
January 2025, 603 pages
EISSN: 1556-472X
DOI: 10.1145/3703003

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 28 December 2024
Online AM: 08 November 2024
Accepted: 18 September 2024
Revised: 15 July 2024
Received: 20 October 2023
Published in TKDD Volume 19, Issue 1

Author Tags

  1. graph neural networks
  2. scalability
  3. distributed learning
  4. model aggregation training

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation
