DOI: 10.5555/3618408.3619318
Research article

RSC: accelerate graph neural networks training via randomized sparse computations

Published: 23 July 2023

Abstract

Training graph neural networks (GNNs) is extremely time-consuming because sparse graph-based operations are hard to accelerate on commodity hardware. Prior work successfully reduces the computation cost of dense-matrix-based operations (e.g., convolution and linear layers) via sampling-based approximation. Unlike dense matrices, however, sparse matrices are stored in an irregular data format in which each row/column may have a different number of non-zero entries. Thus, compared to their dense counterparts, approximating sparse operations poses two unique challenges: (1) we cannot directly control the efficiency of an approximated sparse operation, since the computation is only executed on non-zero entries; (2) sampling sparse matrices is much less efficient due to the irregular data format. To address these issues, our key idea is to control the accuracy-efficiency trade-off by optimizing computation resource allocation layer-wise and epoch-wise. For the first challenge, we customize the computation resources assigned to different sparse operations while keeping the total resource usage below a given budget. For the second challenge, we cache previously sampled sparse matrices to reduce the epoch-wise sampling overhead. To this end, we propose Randomized Sparse Computation (RSC). In practice, RSC achieves up to 11.6× speedup for a single sparse operation and a 1.6× end-to-end wall-clock speedup with almost no accuracy drop. Code is available at https://github.com/warai-0toko/RSC-ICML.
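
To make the idea concrete, the sketch below shows one way a sampling-based approximate sparse-dense matrix multiplication (SpMM) of the kind described above could look. It is a minimal illustration only, assuming NumPy/SciPy; the helper name approx_spmm and the budget parameter are hypothetical and not the authors' API (the released repository works on PyTorch tensors and additionally handles the per-layer budget allocation and epoch-wise caching described in the abstract). The sampling scheme follows standard randomized approximate matrix multiplication.

# Minimal sketch (not the authors' implementation): sampling-based
# approximate SpMM in the spirit of RSC, using NumPy/SciPy.
import numpy as np
import scipy.sparse as sp

def approx_spmm(A, X, budget, rng=None):
    """Approximate A @ X by sampling `budget` column/row pairs.

    A      : sparse (n x m) CSR matrix, e.g. a normalized adjacency.
    X      : dense  (m x d) node-feature matrix.
    budget : number of sampled columns of A; trades accuracy for speed.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = A.shape[1]

    # Importance sampling proportional to ||A[:, k]|| * ||X[k, :]||,
    # the standard choice for randomized matrix-multiplication estimators.
    col_norms = np.sqrt(np.asarray(A.power(2).sum(axis=0)).ravel())
    row_norms = np.linalg.norm(X, axis=1)
    probs = col_norms * row_norms
    probs = probs / probs.sum()

    idx = rng.choice(m, size=budget, replace=True, p=probs)
    scale = 1.0 / (budget * probs[idx])   # rescale to keep the estimate unbiased

    # Cheaper sparse @ dense product on the sampled slice. Note that slicing
    # columns of a CSR matrix is itself costly -- the irregular-format
    # overhead the abstract refers to -- which is why RSC caches previously
    # sampled sparse matrices across epochs instead of resampling every time.
    return A[:, idx] @ (X[idx, :] * scale[:, None])

In a GNN training loop, such an estimator would stand in for the exact SpMM in selected message-passing layers, with each layer's budget chosen so that the total cost stays within the overall budget.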

Published In

ICML'23: Proceedings of the 40th International Conference on Machine Learning
July 2023
43479 pages

Publisher

JMLR.org

Publication History

Published: 23 July 2023

Qualifiers

  • Research-article
  • Research
  • Refereed limited
