Prioritized Relationship Analysis in Heterogeneous Information Networks

Published: 23 January 2018

Abstract

An increasing number of applications are modeled and analyzed in network form, where nodes represent entities of interest and edges represent interactions or relationships between entities. Commonly, such relationship analysis tools assume homogeneity in both node type and edge type. Recent research has sought to redress the assumption of homogeneity and focused on mining heterogeneous information networks (HINs) where both nodes and edges can be of different types. Building on such efforts, in this work, we articulate a novel approach for mining relationships across entities in such networks while accounting for user preference over relationship type and interestingness metric. We formalize the problem as a top-k lightest paths problem, contextualized in a real-world communication network, and seek to find the k most interesting path instances matching the preferred relationship type. Our solution, PROphetic HEuristic Algorithm for Path Searching (PRO-HEAPS), leverages a combination of novel graph preprocessing techniques, well-designed heuristics and the venerable A* search algorithm. We run our algorithm on real-world large-scale graphs and show that our algorithm significantly outperforms a wide variety of baseline approaches with speedups as large as 100X.
To widen the range of applications, we also extend PRO-HEAPS to (i) support relationship analysis between two groups of entities and (ii) allow the pattern path in the query to contain logical statements with the operators AND, OR, and NOT and the wild-card “.”. We run experiments using this generalized version of PRO-HEAPS and demonstrate that its advantage becomes even more pronounced in these general cases. Furthermore, we conduct a comprehensive analysis of how the performance of PRO-HEAPS varies with respect to various attributes of the input HIN. Finally, we conduct a case study to demonstrate valuable applications of our algorithm.
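
As a rough illustration of the problem statement only, the sketch below finds the k lightest path instances between two nodes whose node types match a given pattern, using A*-style best-first search. It is not the PRO-HEAPS algorithm: the abstract does not describe its preprocessing or heuristics, so everything here is an assumption made for illustration, including the function name `top_k_lightest_paths`, the dictionary-based graph layout, the plain node-type pattern (no AND, OR, NOT, or wild-card), and the heuristic, which is an exact lower bound computed by a backward pass over the pattern positions. The sketch also does not require paths to be simple.

```python
import heapq
from itertools import count


def top_k_lightest_paths(node_type, adj, pattern, source, target, k):
    """Return up to k (weight, path) pairs, lightest first.

    node_type: dict node -> type label (hypothetical layout)
    adj:       dict node -> list of (neighbor, edge_weight), weights >= 0
    pattern:   list of type labels the path's nodes must match in order
    """
    n = len(pattern)
    if node_type.get(source) != pattern[0] or node_type.get(target) != pattern[-1]:
        return []

    # Backward pass: h[i][v] is the lightest weight needed to reach `target`
    # from v when v sits at pattern position i. Nodes absent from h[i]
    # cannot complete the pattern and are pruned during the search.
    h = [dict() for _ in range(n)]
    h[n - 1][target] = 0.0
    for i in range(n - 2, -1, -1):
        for u, edges in adj.items():
            if node_type.get(u) != pattern[i]:
                continue
            costs = [w + h[i + 1][v] for v, w in edges if v in h[i + 1]]
            if costs:
                h[i][u] = min(costs)
    if source not in h[0]:
        return []

    # Best-first search ordered by f = g + h. Because h never overestimates,
    # complete paths are popped in nondecreasing order of total weight.
    tie = count()  # tie-breaker so the heap never compares paths
    heap = [(h[0][source], next(tie), 0.0, (source,))]
    results = []
    while heap and len(results) < k:
        _, _, g, path = heapq.heappop(heap)
        i = len(path) - 1
        if i == n - 1:
            results.append((g, list(path)))
            continue
        for v, w in adj.get(path[-1], ()):
            if v in h[i + 1]:
                new_g = g + w
                heapq.heappush(heap, (new_g + h[i + 1][v], next(tie), new_g, path + (v,)))
    return results


# Toy network (made-up data): two people linked through papers and a venue.
node_type = {"alice": "person", "bob": "person",
             "p1": "paper", "p2": "paper", "v1": "venue"}
adj = {
    "alice": [("p1", 1.0), ("p2", 2.0), ("v1", 0.1)],
    "bob": [("p1", 1.0), ("p2", 0.5), ("v1", 0.1)],
    "p1": [("alice", 1.0), ("bob", 1.0)],
    "p2": [("alice", 2.0), ("bob", 0.5)],
    "v1": [("alice", 0.1), ("bob", 0.1)],
}
paths = top_k_lightest_paths(node_type, adj, ["person", "paper", "person"],
                             "alice", "bob", k=2)
```

In this toy network the lightest connection between the two people runs through the venue, but it is excluded because it does not match the person, paper, person pattern; only the two paper-mediated paths are returned, lightest first.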

Cited By

  • (2024) Innovative Application of Heterogeneous Information Network Embedding Technology in Recommender Systems. Applied Mathematics and Nonlinear Sciences 9:1. DOI: 10.2478/amns-2024-1724. Online publication date: 5-Jul-2024
  • (2022) An improved A* search algorithm for the shortest path under interval-valued Pythagorean fuzzy environment. Granular Computing 8:2 (241-251). DOI: 10.1007/s41066-022-00326-1. Online publication date: 18-May-2022
  • (2019) Improvement of PRO-HEAPS Algorithm to Analyze Interaction Changes in Heterogeneous Graph. 2019 International Conference on Data and Software Engineering (ICoDSE) (1-6). DOI: 10.1109/ICoDSE48700.2019.9092735. Online publication date: Dec-2019

Published In

ACM Transactions on Knowledge Discovery from Data  Volume 12, Issue 3
June 2018
360 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3178546
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 January 2018
Accepted: 01 October 2017
Revised: 01 September 2017
Received: 01 January 2017
Published in TKDD Volume 12, Issue 3

Author Tags

  1. Heterogeneous information networks
  2. graph algorithms
  3. semantic relationship queries

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 82
  • Downloads (Last 6 weeks): 13
Reflects downloads up to 28 Dec 2024
