research-article

Orca: Scalable Temporal Graph Neural Network Training with Theoretical Guarantees

Published: 30 May 2023

Abstract

Representation learning over dynamic graphs is critical for many real-world applications such as social network services and recommender systems. Temporal graph neural networks (T-GNNs) are powerful representation learning methods and have achieved remarkable effectiveness on continuous-time dynamic graphs. However, T-GNNs still suffer from high time complexity, which increases linearly with the number of timestamps and grows exponentially with the model depth, so they do not scale to large dynamic graphs. To address these limitations, we propose Orca, a novel framework that accelerates T-GNN training by non-trivially caching and reusing intermediate embeddings. We design an optimal cache replacement algorithm, named MRU, under a practical cache limit. MRU not only improves the efficiency of training T-GNNs by maximizing the number of cache hits but also reduces the approximation error by avoiding retaining and reusing extremely stale embeddings. In addition, we develop in-depth theoretical analyses of the approximation error introduced by our reuse schemes and offer rigorous convergence guarantees. Extensive experiments show that Orca obtains a speedup of two orders of magnitude over the state-of-the-art baselines while attaining higher precision on large dynamic graphs.
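The central idea above, reusing previously computed intermediate embeddings instead of recomputing them, under a cache size limit and with a guard against reusing very stale values, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class name `StalenessBoundedCache`, the `(node, timestamp, layer)` key layout, the `max_staleness` threshold (measured in training iterations), and the "evict the entry computed longest ago" rule are hypothetical. In particular, the eviction rule is a simple stand-in and is not Orca's MRU algorithm, whose replacement policy and optimality argument are given in the paper.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List


class StalenessBoundedCache:
    """Capacity-limited cache of intermediate embeddings (hypothetical sketch).

    An entry is reused only if it was computed at most `max_staleness`
    training iterations ago. When the cache is full, the entry that was
    computed longest ago is evicted. This eviction rule is a simple
    stand-in chosen for illustration; it is NOT Orca's MRU policy.
    """

    def __init__(self, capacity: int, max_staleness: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.max_staleness = max_staleness
        # key -> (embedding, iteration it was computed at). Entries are only
        # inserted right after being computed, so insertion order equals
        # computation order and the first entry is always the oldest one.
        self._store: "OrderedDict[Hashable, tuple]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(
        self,
        key: Hashable,
        iteration: int,
        compute: Callable[[Hashable], List[float]],
    ) -> List[float]:
        entry = self._store.get(key)
        if entry is not None and iteration - entry[1] <= self.max_staleness:
            # Fresh enough: reuse the cached embedding and skip recomputation.
            self.hits += 1
            return entry[0]
        # Miss, or the cached value is too stale: recompute it.
        self.misses += 1
        embedding = compute(key)
        self._store.pop(key, None)  # drop the stale copy, if any
        if len(self._store) >= self.capacity:
            self._store.popitem(last=False)  # evict the oldest computation
        self._store[key] = (embedding, iteration)
        return embedding


if __name__ == "__main__":
    computed = []

    def fake_compute(key):
        # Stand-in for running a T-GNN layer; records each real computation.
        computed.append(key)
        return [float(len(computed))]

    cache = StalenessBoundedCache(capacity=2, max_staleness=1)
    key = ("node_7", 100.0, 1)  # hypothetical (node, timestamp, layer) key
    cache.get_or_compute(key, iteration=0, compute=fake_compute)  # miss
    cache.get_or_compute(key, iteration=1, compute=fake_compute)  # hit
    cache.get_or_compute(key, iteration=3, compute=fake_compute)  # stale: miss
    print(cache.hits, cache.misses)  # prints "1 2"
```

Raising `max_staleness` in a scheme like this yields more cache hits at the cost of reusing older, less accurate embeddings; the paper's bounds on that approximation error are not reproduced here.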

Supplemental Material

Presentation video (MP4) for "Orca: Scalable Temporal Graph Neural Network Training with Theoretical Guarantees"


Cited By

  • (2024) Efficient Training of Graph Neural Networks on Large Graphs. Proceedings of the VLDB Endowment 17, 12 (2024), 4237-4240. DOI: 10.14778/3685800.3685844. Online publication date: 1-Aug-2024
  • (2024) Fight Fire with Fire: Towards Robust Graph Neural Networks on Dynamic Graphs via Actively Defense. Proceedings of the VLDB Endowment 17, 8 (2024), 2050-2063. DOI: 10.14778/3659437.3659457. Online publication date: 31-May-2024
  • (2024) ETC: Efficient Training of Temporal Graph Neural Networks over Large-Scale Dynamic Graphs. Proceedings of the VLDB Endowment 17, 5 (2024), 1060-1072. DOI: 10.14778/3641204.3641215. Online publication date: 2-May-2024

Information

Published In

Proceedings of the ACM on Management of Data, Volume 1, Issue 1
PACMMOD
May 2023
2807 pages
EISSN: 2836-6573
DOI: 10.1145/3603164
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2023
Published in PACMMOD Volume 1, Issue 1

Author Tags

  1. cache replacement
  2. temporal graph neural networks

Qualifiers

  • Research-article

Funding Sources

  • Hong Kong ITC ITF
  • Hong Kong RGC AOE Project
  • Hong Kong RGC GRF Project
  • National Key Research and Development Program of China
  • Shanghai Municipal Science and Technology Major Project
  • National Science Foundation of China
  • Hong Kong RGC CRF Project
  • Guangdong Basic and Applied Basic Research Foundation
  • SJTU Global Strategic Partnership Fund
  • Hong Kong RGC Theme-based project
  • China NSFC
  • Microsoft Research Asia Collaborative Research Grant
  • HKUST-Webank joint research lab grant
  • HKUST Global Strategic Partnership Fund

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 222
  • Downloads (last 6 weeks): 20
Reflects downloads up to 09 Nov 2024

Citations

  • (2024) SIMPLE: Efficient Temporal Graph Neural Network Training at Scale with Dynamic Data Placement. Proceedings of the ACM on Management of Data 2, 3 (2024), 1-25. DOI: 10.1145/3654977. Online publication date: 30-May-2024
  • (2024) ROME: Robust Query Optimization via Parallel Multi-Plan Execution. Proceedings of the ACM on Management of Data 2, 3 (2024), 1-25. DOI: 10.1145/3654973. Online publication date: 30-May-2024
  • (2024) Toward Structure Fairness in Dynamic Graph Embedding: A Trend-aware Dual Debiasing Approach. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1701-1712. DOI: 10.1145/3637528.3671848. Online publication date: 25-Aug-2024
  • (2024) An Empirical Study on Noisy Label Learning for Program Understanding. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1-12. DOI: 10.1145/3597503.3639217. Online publication date: 20-May-2024
  • (2024) TimeSGN: Scalable and Effective Temporal Graph Neural Network. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 3297-3310. DOI: 10.1109/ICDE60146.2024.00255. Online publication date: 13-May-2024
  • (2024) Incorporating Dynamic Temperature Estimation into Contrastive Learning on Graphs. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 2889-2903. DOI: 10.1109/ICDE60146.2024.00224. Online publication date: 13-May-2024
  • (2024) WavingSketch: an unbiased and generic sketch for finding top-k items in data streams. The VLDB Journal 33, 5 (2024), 1697-1722. DOI: 10.1007/s00778-024-00869-6. Online publication date: 29-Jul-2024
