Accelerating Large-Scale Heterogeneous Interaction Graph Embedding Learning via Importance Sampling

Published: 07 December 2020

Abstract

In real-world problems, heterogeneous entities are often related to each other through multiple interactions, forming a Heterogeneous Interaction Graph (HIG). For fundamental tasks on HIGs, graph neural networks are an attractive choice because they can exploit the heterogeneity and rich semantic information by aggregating and propagating information from different types of neighborhoods. However, learning on such complex graphs, which often contain millions or billions of nodes and edges with various attributes, can incur expensive time cost and high memory consumption. In this article, we accelerate representation learning on large-scale HIGs by applying importance sampling to heterogeneous neighborhoods in a batch-wise manner, which fits naturally with most batch-based optimization methods. Unlike traditional homogeneous strategies, which neglect the semantic types of nodes and edges, we handle the rich heterogeneous semantics within HIGs with two kinds of samplers: a type-dependent sampler, which samples the neighborhoods of each type separately, and a type-fusion sampler, which samples jointly from candidates of all types. Furthermore, to overcome the imbalance between the down-sampled and the original information, we propose heterogeneous estimators, namely the self-normalized and the adaptive estimators, to improve the robustness of our sampling strategies.
Finally, we evaluate our models on node classification and link prediction using five real-world datasets. The empirical results demonstrate that our approach performs significantly better than other state-of-the-art alternatives and reduces the number of edges in computation by up to 93%, the memory cost by up to 92%, and the time cost by up to 86%.
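
As a rough illustration of the difference between sampling per type and sampling jointly across types, the sketch below draws neighbors both ways and applies a self-normalized importance-sampling estimate of a mean neighbor feature. It is a minimal sketch under stated assumptions, not the article's method: the function names, the per-node `scores` used as the proposal distribution, the uniform target, and the scalar `features` are all hypothetical choices made for this example.

```python
import random


def normalize(scores):
    """Turn positive scores into a probability distribution over nodes."""
    total = sum(scores.values())
    return {node: s / total for node, s in scores.items()}


def type_dependent_sample(neighbors_by_type, scores, k_per_type, rng):
    """Draw k_per_type neighbors separately within each type (with replacement)."""
    out = {}
    for node_type, nodes in neighbors_by_type.items():
        q = normalize({n: scores[n] for n in nodes})
        drawn = rng.choices(nodes, weights=[q[n] for n in nodes], k=k_per_type)
        out[node_type] = [(n, q[n]) for n in drawn]
    return out


def type_fusion_sample(neighbors_by_type, scores, k, rng):
    """Draw k neighbors jointly from the pooled candidates of all types."""
    pool = [n for nodes in neighbors_by_type.values() for n in nodes]
    q = normalize({n: scores[n] for n in pool})
    drawn = rng.choices(pool, weights=[q[n] for n in pool], k=k)
    return [(n, q[n]) for n in drawn]


def self_normalized_mean(samples, features, population_size):
    """Self-normalized importance-sampling estimate of the uniform mean feature.

    Each sample is (node, q) where q is the proposal probability it was drawn
    with. The target is assumed uniform over the population (an assumption of
    this sketch), so the raw weight is (1 / population_size) / q.
    """
    weights = [(1.0 / population_size) / q for _, q in samples]
    total = sum(weights)
    return sum(w * features[n] for w, (n, _) in zip(weights, samples)) / total


# Hypothetical toy neighborhood with two node types.
rng = random.Random(0)
neighbors_by_type = {"user": ["u1", "u2", "u3"], "item": ["i1", "i2", "i3", "i4"]}
scores = {"u1": 3.0, "u2": 1.0, "u3": 1.0, "i1": 2.0, "i2": 2.0, "i3": 1.0, "i4": 5.0}
features = {"u1": 0.2, "u2": 0.4, "u3": 0.6, "i1": 1.0, "i2": 1.5, "i3": 2.0, "i4": 2.5}

per_type = type_dependent_sample(neighbors_by_type, scores, 2, rng)
fused = type_fusion_sample(neighbors_by_type, scores, 4, rng)

# One estimate per type for the type-dependent sampler, one overall for fusion.
per_type_means = {
    t: self_normalized_mean(s, features, len(neighbors_by_type[t]))
    for t, s in per_type.items()
}
fused_mean = self_normalized_mean(fused, features, len(scores))
```

Dividing by the sum of the weights keeps the estimate on the same scale as the original features even when only a few neighbors are drawn; in particular, if every neighbor had the same feature value, the estimate would return that value regardless of which neighbors were sampled.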

        Published In

        ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 1
        February 2021
        361 pages
        ISSN: 1556-4681
        EISSN: 1556-472X
        DOI: 10.1145/3441647
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 07 December 2020
        Accepted: 01 August 2020
        Revised: 01 May 2020
        Received: 01 December 2019
        Published in TKDD Volume 15, Issue 1

        Author Tags

        1. Heterogeneous interaction graphs
        2. importance sampling
        3. large-scale graphs
        4. type-dependent sampler
        5. type-fusion sampler

        Qualifiers

        • Research-article
        • Research
        • Refereed

        Funding Sources

        • Alibaba Innovative Research (AIR) programme
        • Campus for Research Excellence and Technological Enterprise (CREATE) programme, National Research Foundation, Prime Minister's Office, Singapore
        • National Natural Science Foundation of China
        • National Key Research and Development Program of China

        Cited By

        • (2023) TDAN: Transferable Domain Adversarial Network for Link Prediction in Heterogeneous Social Networks. ACM Transactions on Knowledge Discovery from Data 18:1 (1-22). DOI: 10.1145/3610229. Online publication date: 6-Sep-2023
        • (2023) Dynamic Meta-path Guided Temporal Heterogeneous Graph Neural Networks. World Scientific Annual Review of Artificial Intelligence. DOI: 10.1142/S2811032323500029. Online publication date: 10-Nov-2023
        • (2023) GIPA: A General Information Propagation Algorithm for Graph Learning. Database Systems for Advanced Applications (465-476). DOI: 10.1007/978-3-031-30678-5_34. Online publication date: 17-Apr-2023
        • (2022) Characterizing and Forecasting Urban Vibrancy Evolution: A Multi-View Graph Mining Perspective. ACM Transactions on Knowledge Discovery from Data 17:5 (1-24). DOI: 10.1145/3568683. Online publication date: 30-Nov-2022
        • (2022) Contact Tracing and Epidemic Intervention via Deep Reinforcement Learning. ACM Transactions on Knowledge Discovery from Data 17:3 (1-24). DOI: 10.1145/3546870. Online publication date: 22-Aug-2022
        • (2022) A survey on heterogeneous information network based recommender systems: Concepts, methods, applications and resources. AI Open 3 (40-57). DOI: 10.1016/j.aiopen.2022.03.002. Online publication date: 2022
        • (2021) Emerging Topics of Heterogeneous Graph Representation. Heterogeneous Graph Representation Learning and Applications (145-172). DOI: 10.1007/978-981-16-6166-2_6. Online publication date: 5-Nov-2021
        • (2021) Dynamic Heterogeneous Graph Embedding via Heterogeneous Hawkes Process. Machine Learning and Knowledge Discovery in Databases. Research Track (388-403). DOI: 10.1007/978-3-030-86486-6_24. Online publication date: 13-Sep-2021
