DOI: 10.1145/3637528.3671603
Research article | Open access

GraphStorm: All-in-one Graph Machine Learning Framework for Industry Applications

Published: 24 August 2024

Abstract

Graph machine learning (GML) is effective in many business applications. However, making GML easy to use and applicable to industry applications with massive datasets remains challenging. We developed GraphStorm, which provides an end-to-end solution for scalable graph construction, graph model training, and inference. GraphStorm has the following desirable properties: (a) Easy to use: it can perform graph construction, model training, and inference with a single command; (b) Expert-friendly: GraphStorm contains many advanced GML modeling techniques to handle complex graph data and improve model performance; (c) Scalable: every component in GraphStorm can operate on graphs with billions of nodes and can scale model training and inference to different hardware without changing any code. GraphStorm has been used and deployed for over a dozen billion-scale industry applications since its release in May 2023. It is open-sourced on GitHub: https://github.com/awslabs/graphstorm.
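As an illustration of the single-command workflow the abstract describes, the sketch below shows what a GraphStorm training configuration might look like. The YAML layout (`gsf`, `basic`, `gnn`, `hyperparam`) and the key names follow the conventions used in the GraphStorm repository's examples, but they are assumptions here and should be checked against the project's documentation before use.

```yaml
# Illustrative GraphStorm training config (key names assumed; verify
# against the GraphStorm documentation before use).
---
version: 1.0
gsf:
  basic:
    backend: gloo            # distributed communication backend
    model_encoder_type: rgcn # relational GNN encoder for heterogeneous graphs
  gnn:
    num_layers: 2
    hidden_size: 128
    fanout: "10,25"          # neighbor-sampling fanout per GNN layer
  hyperparam:
    lr: 0.001
    num_epochs: 10
    batch_size: 1024
```

With a config like this, training would be launched with a single command along the lines of `python -m graphstorm.run.gs_node_classification --cfg-file train.yaml --part-config data/part.json`; the module and flag names are taken from the repository's examples and should likewise be treated as assumptions.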



Published In

KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2024, 6901 pages
ISBN: 9798400704901
DOI: 10.1145/3637528
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. graph machine learning
  2. industry scale

Qualifiers

  • Research-article

Conference

KDD '24

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%


Article Metrics

  • Total Citations: 0
  • Total Downloads: 374
  • Downloads (last 12 months): 374
  • Downloads (last 6 weeks): 121

Reflects downloads up to 25 Dec 2024.
