Open access

Historical Embedding-Guided Efficient Large-Scale Federated Graph Learning

Published: 30 May 2024

Abstract

Graph convolutional networks (GCNs) are promising for graph learning tasks. For privacy-preserving graph learning over distributed graph datasets, federated learning (FL)-based GCN (FedGCN) training is required. An important open challenge for FedGCN is scaling to large graphs, which typically incurs 1) high computation overhead for handling the explosively increasing number of neighbors, and 2) high communication overhead of training GCNs across multiple FL clients. Thus, neighbor sampling is being studied to enhance the scalability of FedGCNs. Existing FedGCN training techniques with neighbor sampling often incur substantial communication and computation overhead and produce inaccurate node embeddings, leading to poor model performance. To bridge this gap, we propose the Federated Adaptive Attention-based Sampling (FedAAS) approach. It achieves substantial cost savings by efficiently leveraging historical embedding estimators and focusing the limited communication resources on transmitting the most influential neighbor node embeddings across FL clients. We further design an adaptive embedding synchronization scheme to optimize the efficiency and accuracy of FedAAS on large-scale datasets. Theoretical analysis shows that the approximation error induced by the staleness of historical embeddings is upper bounded, and that the model is guaranteed to converge efficiently. Extensive experimental evaluation against four state-of-the-art baselines on six real-world graph datasets shows that FedAAS achieves up to 5.12% higher test accuracy, while saving communication and computation costs by 95.11% and 94.76%, respectively.
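To make the abstract's two core ideas concrete (reusing cached historical embeddings of cross-client neighbors, and spending the communication budget only on the most influential ones), below is a minimal, self-contained Python sketch. It is not the authors' implementation: the names (HistoricalCache, select_influential) and the scaled dot-product attention scoring are illustrative assumptions based solely on the abstract.

import torch
import torch.nn.functional as F

class HistoricalCache:
    """Stale embedding estimates for cross-client neighbor nodes, each
    tagged with the round in which it was last synchronized.
    (Hypothetical name; illustrative of the historical-embedding idea.)"""
    def __init__(self, num_nodes: int, dim: int):
        self.emb = torch.zeros(num_nodes, dim)    # historical embedding estimates
        self.age = torch.full((num_nodes,), -1)   # last sync round (-1 = never)

    def update(self, node_ids: torch.Tensor, fresh: torch.Tensor, rnd: int):
        self.emb[node_ids] = fresh
        self.age[node_ids] = rnd

def select_influential(query: torch.Tensor,
                       neighbor_emb: torch.Tensor,
                       k: int) -> torch.Tensor:
    """Score neighbors by scaled dot-product attention against the target
    node's embedding; return the indices of the top-k most influential
    neighbors, whose fresh embeddings justify the communication cost."""
    scores = neighbor_emb @ query / query.shape[-1] ** 0.5
    weights = F.softmax(scores, dim=0)
    return torch.topk(weights, k=min(k, weights.numel())).indices

# Toy usage: one client, ten remote neighbors, refresh only the top 3.
dim, num_remote = 16, 10
cache = HistoricalCache(num_remote, dim)
cache.update(torch.arange(num_remote), torch.randn(num_remote, dim), rnd=0)

target = torch.randn(dim)                  # local node's current embedding
top = select_influential(target, cache.emb, k=3)
fresh = torch.randn(top.numel(), dim)      # stand-in for a cross-client fetch
cache.update(top, fresh, rnd=1)            # all other neighbors stay cached

The adaptive embedding synchronization scheme described in the abstract would additionally decide when cached entries have become too stale to keep reusing; the age field above is where such a refresh policy would plug in.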

Supplemental Material

ZIP File
The supplementary materials include slides, a short-version presentation video, and a poster.


Published In

Proceedings of the ACM on Management of Data, Volume 2, Issue 3
SIGMOD
June 2024
1953 pages
EISSN: 2836-6573
DOI: 10.1145/3670010

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2024
Published in PACMMOD Volume 2, Issue 3


Author Tags

  1. graph federated learning
  2. graph sampling
  3. historical embedding

Qualifiers

  • Research-article

Funding Sources

  • AI Singapore Programme
