
Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation

Published: 28 December 2024

Abstract

Distributed graph neural network (GNN) training facilitates learning on massive graphs that surpass the storage and computational capabilities of a single machine. Traditional distributed frameworks strive for performance parity with centralized training by maximally recovering cross-instance node dependencies, relying either on inter-instance communication or on periodic fallback to centralized training. However, these operations create overhead and constrain the scalability of the framework. In this work, we propose a streamlined framework for distributed GNN training that eliminates these costly operations, yielding improved scalability, convergence speed, and performance over state-of-the-art approaches. Our framework (1) comprises independent trainers that asynchronously learn local models from the locally available parts of the training graph and (2) synchronizes these local models only through periodic (time-based) model aggregation. Contrary to prevailing belief, our theoretical analysis shows that maximizing the recovery of cross-instance node dependencies is not essential for achieving performance parity with centralized training. Instead, our framework partitions the training graph by randomly assigning nodes or super-nodes (i.e., collections of original nodes) to instances, which improves data uniformity and minimizes discrepancies in gradients and loss functions across instances. Experiments on social and e-commerce networks with up to 1.3 billion edges show that our proposed framework achieves state-of-the-art performance and a 2.31× speedup over the fastest baseline, despite using less training data.
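To make the training scheme described in the abstract concrete, the sketch below illustrates its three ingredients: uniformly random node-to-trainer assignment, fully independent local training, and synchronization only through periodic parameter averaging. This is not the authors' implementation; a small MLP on synthetic features stands in for a GNN, and all names and hyperparameters are illustrative assumptions.

```python
# Illustrative sketch (not the paper's code): random partitioning + local
# training + periodic model aggregation, using a toy MLP in place of a GNN.
import copy
import torch
import torch.nn as nn

def random_partition(num_nodes: int, num_trainers: int, seed: int = 0):
    """Assign each node to a trainer uniformly at random (no edge-cut objective)."""
    g = torch.Generator().manual_seed(seed)
    assignment = torch.randint(0, num_trainers, (num_nodes,), generator=g)
    return [torch.nonzero(assignment == t, as_tuple=False).flatten()
            for t in range(num_trainers)]

def average_models(models):
    """Aggregate local models by unweighted parameter averaging."""
    avg_state = copy.deepcopy(models[0].state_dict())
    for key in avg_state:
        avg_state[key] = torch.stack(
            [m.state_dict()[key].float() for m in models]).mean(dim=0)
    for m in models:
        m.load_state_dict(avg_state)

# Toy node features and labels stand in for a real partitioned graph dataset.
num_nodes, num_feats, num_classes, num_trainers = 1000, 16, 4, 4
X = torch.randn(num_nodes, num_feats)
y = torch.randint(0, num_classes, (num_nodes,))
parts = random_partition(num_nodes, num_trainers)

models = [nn.Sequential(nn.Linear(num_feats, 32), nn.ReLU(),
                        nn.Linear(32, num_classes)) for _ in range(num_trainers)]
for m in models[1:]:                       # start all trainers from identical weights
    m.load_state_dict(models[0].state_dict())
optims = [torch.optim.SGD(m.parameters(), lr=0.1) for m in models]
loss_fn = nn.CrossEntropyLoss()

local_steps, rounds = 10, 5                # aggregate after every `local_steps` updates
for _ in range(rounds):
    for model, optim, idx in zip(models, optims, parts):
        for _ in range(local_steps):       # independent training on the local partition only
            optim.zero_grad()
            loss = loss_fn(model(X[idx]), y[idx])
            loss.backward()
            optim.step()
    average_models(models)                 # periodic (time-based) model aggregation
```

Because the partitions are drawn uniformly at random, each trainer sees a statistically similar slice of the data, which is what keeps the locally computed gradients and losses close to one another and lets simple averaging recover a strong global model.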



Published In

ACM Transactions on Knowledge Discovery from Data, Volume 19, Issue 1
January 2025, 603 pages
EISSN: 1556-472X
DOI: 10.1145/3703003

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 28 December 2024
Online AM: 08 November 2024
Accepted: 18 September 2024
Revised: 15 July 2024
Received: 20 October 2023
Published in TKDD Volume 19, Issue 1

Author Tags

  1. graph neural networks
  2. scalability
  3. distributed learning
  4. model aggregation training

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation
