DOI: 10.1145/3649153.3649191
Open access

A Unified CPU-GPU Protocol for GNN Training

Published: 02 July 2024


Training a Graph Neural Network (GNN) model on large-scale graphs involves a high volume of data communication and computation. Although state-of-the-art CPUs and GPUs offer high computing power, the Standard GNN training protocol adopted in existing GNN frameworks cannot utilize the platform resources efficiently. To address this, we propose a novel Unified CPU-GPU protocol that improves the resource utilization of GNN training on a CPU-GPU platform. The Unified CPU-GPU protocol instantiates multiple GNN training processes in parallel on both the CPU and the GPU. By allocating training processes on the CPU to perform GNN training collaboratively with the GPU, the proposed protocol improves platform resource utilization and reduces the CPU-GPU data transfer overhead. Since CPU and GPU performance differ, we develop a novel load balancer that dynamically balances the workload between CPUs and GPUs at runtime. We evaluate our protocol using two representative GNN sampling algorithms and two widely used GNN models on three datasets. Compared with the Standard training protocol adopted in state-of-the-art GNN frameworks, our protocol improves resource utilization and reduces overall training time. On a platform where the GPU moderately outperforms the CPU, our protocol speeds up GNN training by up to 1.41×. On a platform where the GPU significantly outperforms the CPU, the speedup is up to 1.26×. Our protocol is open-sourced and can be seamlessly integrated into state-of-the-art GNN frameworks to accelerate GNN training. It particularly benefits users with limited GPU access, since GPUs are in high demand.
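To make the idea of a runtime CPU-GPU load balancer concrete, here is a minimal Python sketch of one possible scheme. It is not the paper's algorithm: the proportional-to-throughput split, the exponential smoothing, the choice of mini-batches as the unit of work, and every name (`DeviceStats`, `ProportionalLoadBalancer`, `smoothing`) are hypothetical. After each iteration, every device reports how many mini-batches it processed and how long that took; the balancer then divides the next iteration's mini-batches in proportion to each device's smoothed throughput, so the CPU and GPU training processes finish their shares at roughly the same time.

```python
from dataclasses import dataclass


@dataclass
class DeviceStats:
    """Hypothetical per-iteration measurement reported by one device."""
    name: str
    items_done: int   # mini-batches processed (assumed unit of work)
    seconds: float    # wall-clock time spent on them

    @property
    def throughput(self) -> float:
        return self.items_done / self.seconds


class ProportionalLoadBalancer:
    """Hypothetical dynamic balancer: split work in proportion to each
    device's exponentially smoothed measured throughput."""

    def __init__(self, device_names, smoothing=0.5):
        self.smoothing = smoothing  # weight given to the newest measurement
        self.rate = {name: None for name in device_names}

    def update(self, stats):
        # Blend each new throughput measurement into the running estimate.
        for s in stats:
            old, new = self.rate[s.name], s.throughput
            self.rate[s.name] = new if old is None else (
                self.smoothing * new + (1 - self.smoothing) * old)

    def split(self, total_items):
        names = list(self.rate)
        # Until every device has been measured, fall back to an even split.
        if any(r is None for r in self.rate.values()):
            weights = {n: 1.0 for n in names}
        else:
            weights = dict(self.rate)
        total_w = sum(weights.values())
        shares = {n: int(total_items * w / total_w) for n, w in weights.items()}
        # Flooring may leave a few items unassigned; give them to the
        # device with the highest weight.
        top = max(names, key=lambda n: weights[n])
        shares[top] += total_items - sum(shares.values())
        return shares


balancer = ProportionalLoadBalancer(["cpu", "gpu"])
print(balancer.split(100))  # {'cpu': 50, 'gpu': 50}
# CPU: 5 mini-batches/s, GPU: 20 mini-batches/s
balancer.update([DeviceStats("cpu", 50, 10.0), DeviceStats("gpu", 50, 2.5)])
print(balancer.split(100))  # {'cpu': 20, 'gpu': 80}
```

Splitting in proportion to throughput equalizes each device's predicted time per iteration, which matters if training is synchronous (an assumption in this sketch), because each iteration then waits for the slowest process. The smoothing factor trades responsiveness to runtime changes against stability under noisy timings.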




    Published In

    CF '24: Proceedings of the 21st ACM International Conference on Computing Frontiers
    May 2024
    345 pages
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher: Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. GNN
    2. GNN training
    3. Unified CPU-GPU protocol


    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • ARL
    • NSF


    CF '24

    Acceptance Rates

    CF '24 Paper Acceptance Rate 33 of 105 submissions, 31%;
    Overall Acceptance Rate 273 of 785 submissions, 35%


    Bibliometrics & Citations


    Article Metrics

    • Total Citations: 0
    • Total Downloads: 229
    • Downloads (last 12 months): 229
    • Downloads (last 6 weeks): 51
    Reflects downloads up to 01 Jan 2025
