
Eliminating Data Processing Bottlenecks in GNN Training over Large Graphs via Two-level Feature Compression

Published: 30 August 2024, in Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 11. Publisher: VLDB Endowment.

Abstract

Training GNNs over large graphs faces a severe data processing bottleneck that spans both sampling and feature loading. To tackle this issue, we introduce F2CGT, a fast GNN training system that incorporates feature compression. To avoid potential accuracy degradation, we propose a two-level, hybrid feature compression approach that applies different compression methods to different graph nodes. This differentiated choice strikes a balance among rounding error, compression ratio, model accuracy loss, and preprocessing cost. Our theoretical analysis proves that this approach converges and delivers model accuracy comparable to conventional training without feature compression. Additionally, we co-design the on-GPU cache sub-system with compression-enabled training in F2CGT. Driven by a cost model, the new cache sub-system runs cache policies that select graph nodes with high access frequencies and partitions the spare GPU memory across the various types of graph data, improving cache hit rates. Finally, an extensive evaluation of F2CGT on two popular GNN models and four datasets, including three large public datasets, demonstrates that F2CGT achieves compression ratios of up to 128× and delivers GNN training speedups of 1.23-2.56× for single-machine training and 3.58-71.46× for distributed training with up to 32 GPUs, with marginal accuracy loss.
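
The abstract does not spell out the two compression levels, so the following is only a minimal illustrative sketch of the general idea behind two-level hybrid compression: scalar-quantize the features of a small set of "important" nodes (small rounding error, modest compression ratio) and vector-quantize the long tail (high ratio, coarser reconstruction). The degree-based split, the 8-bit settings, and all names here (compress_features, degree_threshold, etc.) are assumptions for illustration, not F2CGT's actual design.

```python
# Hypothetical two-level hybrid feature compression (NOT F2CGT's real code):
# low-error scalar codec for hub nodes, high-ratio vector codec for the rest.
import numpy as np

def scalar_quantize(x, bits=8):
    """Per-column min-max scalar quantization to unsigned 8-bit codes."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    scale = (hi - lo) / (2 ** bits - 1)
    scale = np.where(scale == 0, 1.0, scale)  # guard constant columns
    codes = np.clip(np.rint((x - lo) / scale), 0, 2 ** bits - 1)
    return codes.astype(np.uint8), lo, scale

def scalar_dequantize(codes, lo, scale):
    return codes.astype(np.float32) * scale + lo

def vector_quantize(x, n_centroids=256, iters=10, seed=0):
    """Plain Lloyd's k-means: each feature row becomes a one-byte codeword
    into a small shared codebook, giving a much higher compression ratio."""
    rng = np.random.default_rng(seed)
    k = min(n_centroids, len(x))
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign every row to its nearest centroid (squared L2 distance)
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for c in range(k):
            members = assign == c
            if members.any():
                centroids[c] = x[members].mean(axis=0)
    return assign.astype(np.uint8), centroids

def compress_features(feats, degrees, degree_threshold=50):
    """Illustrative split policy: hub nodes get the lower-error scalar codec,
    the long tail gets the higher-ratio vector codec."""
    hot = degrees >= degree_threshold
    return {"hot_mask": hot,
            "scalar": scalar_quantize(feats[hot]),
            "vector": vector_quantize(feats[~hot])}
```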
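Likewise, here is a hedged sketch of what a cost-model-driven cache plan could look like: cache entries of different kinds (graph topology, compressed features) compete for one spare-memory budget, ranked by expected transfer cost saved per cached byte, so the partition between data types falls out of the cost model rather than being fixed a priori. The plan_cache function, its tuple layout, and the numbers in the demo are illustrative assumptions, not F2CGT's actual policy.

```python
# Hypothetical cost-model-driven cache planning (illustrative only). Each
# candidate cache entry carries its access frequency, its size in bytes, and
# the cost of one cache miss; we greedily cache the entries with the highest
# expected savings per cached byte until the GPU-memory budget runs out.
def plan_cache(budget_bytes, candidates):
    """candidates: list of (access_freq, entry_bytes, miss_cost) tuples.
    Returns indices of the entries to cache within the byte budget."""
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: candidates[i][0] * candidates[i][2] / candidates[i][1],
        reverse=True,  # highest benefit density first
    )
    chosen, used = [], 0
    for i in ranked:
        size = candidates[i][1]
        if used + size <= budget_bytes:
            chosen.append(i)
            used += size
    return chosen

# Tiny demo: two topology entries (small, hot) and two compressed-feature
# entries (larger, each miss costs a bigger transfer); numbers are made up.
entries = [(900, 64, 1.0), (700, 64, 1.0), (300, 512, 4.0), (50, 512, 4.0)]
print(plan_cache(700, entries))  # [0, 1, 2]: both topology entries + the hot feature entry
```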

