DOI: 10.1145/3637528.3671603
Research article | Open access

GraphStorm: All-in-one Graph Machine Learning Framework for Industry Applications

Published: 24 August 2024

Abstract

Graph machine learning (GML) is effective in many business applications. However, making GML easy to use and applicable to industry applications with massive datasets remains challenging. We developed GraphStorm, which provides an end-to-end solution for scalable graph construction, graph model training, and inference. GraphStorm has the following desirable properties: (a) Easy to use: it can perform graph construction, model training, and inference with a single command; (b) Expert-friendly: GraphStorm contains many advanced GML modeling techniques to handle complex graph data and improve model performance; (c) Scalable: every component in GraphStorm can operate on graphs with billions of nodes and can scale model training and inference to different hardware without changing any code. GraphStorm has been used and deployed for over a dozen billion-scale industry applications since its release in May 2023. It is open-sourced on GitHub: https://github.com/awslabs/graphstorm.
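As an illustration of the single-command workflow the abstract describes, the sketch below shows what a GraphStorm training configuration might look like. The YAML layout (`gsf`, `basic`, `gnn`, `hyperparam`) and the key names follow the conventions used in the GraphStorm repository's examples, but they are assumptions here and should be checked against the project's documentation before use.

```yaml
# Illustrative GraphStorm training config (key names assumed; verify
# against the GraphStorm documentation before use).
---
version: 1.0
gsf:
  basic:
    backend: gloo            # distributed communication backend
    model_encoder_type: rgcn # relational GNN encoder for heterogeneous graphs
  gnn:
    num_layers: 2
    hidden_size: 128
    fanout: "10,25"          # neighbor-sampling fanout per GNN layer
  hyperparam:
    lr: 0.001
    num_epochs: 10
    batch_size: 1024
```

With a config like this, training would be launched with a single command along the lines of `python -m graphstorm.run.gs_node_classification --cfg-file train.yaml --part-config data/part.json`; the module and flag names are taken from the repository's examples and should likewise be treated as assumptions.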



Published In

KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2024, 6901 pages
ISBN: 9798400704901
DOI: 10.1145/3637528
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. graph machine learning
  2. industry scale

Qualifiers

  • Research-article

Conference

KDD '24

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%


Article Metrics

  • Total Citations: 0
  • Total Downloads: 374
  • Downloads (last 12 months): 374
  • Downloads (last 6 weeks): 121

Reflects downloads up to 25 Dec 2024.
