research-article

Influential Community Search over Large Heterogeneous Information Networks

Published: 01 April 2023

Abstract

Recently, influential community search has attracted much attention. Given a graph, it aims to find communities of vertices with high importance values. Existing works mainly focus on conventional homogeneous networks, where all vertices are of the same type. Thus, they cannot be applied to heterogeneous information networks (HINs), such as bibliographic networks and knowledge graphs, where vertices are of multiple types and their importance values are heterogeneous (i.e., the meaning of importance differs across vertex types). In this paper, we study the problem of influential community search over large HINs. Using the meta-path-based core model, we introduce a novel community model, called the heterogeneous influential community (HIC), which is a set of closely connected vertices of the same type with high importance values. An HIC not only captures the importance of the vertices in a community but also considers the influence on the meta-paths connecting them. To search for HICs, we mainly consider meta-paths with two and three vertex types. We first develop basic algorithms that iteratively peel vertices with low importance values, and then propose advanced algorithms that identify key vertices and apply pruning strategies to quickly eliminate vertices with low importance values. Extensive experiments on four large real-world HINs show that our solutions are effective for searching HICs and that the advanced algorithms significantly outperform the baselines.
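The "basic" peeling strategy mentioned above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a simplified HIC, namely a connected component of the meta-path-based (k, P)-core whose influence is the minimum importance value among its vertices. It also assumes symmetric meta-paths (first type equals last type) and undirected edges. The helper names `build_p_graph` and `top_r_influential` are hypothetical. The sketch first projects the HIN onto one vertex type via the meta-path P (its P-neighbors). It then repeatedly removes the least important vertex, cascading the removal of any vertex left with fewer than k P-neighbors, and records the community that contained the removed vertex.

```python
from collections import defaultdict, deque


def build_p_graph(edges, vtype, meta_path):
    """Hypothetical helper: map each vertex of type meta_path[0] to its
    P-neighbors, i.e. vertices reachable by following edges whose endpoint
    types match meta_path step by step. Assumes a symmetric meta-path,
    e.g. ("A", "P", "A") or ("A", "P", "V", "P", "A"), and undirected edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    graph = {}
    for s, t in vtype.items():
        if t != meta_path[0]:
            continue
        frontier = {s}
        for step_type in meta_path[1:]:
            frontier = {w for u in frontier for w in adj[u] if vtype[w] == step_type}
        graph[s] = frontier - {s}  # a vertex is not its own P-neighbor
    return graph


def top_r_influential(graph, weight, k, r):
    """Hypothetical peeling sketch: return up to r (influence, community)
    pairs, highest influence first, where influence = min importance value."""
    g = {v: set(ns) - {v} for v, ns in graph.items()}  # mutable copy, no self-loops
    alive = set(g)

    def remove(v):
        # Delete v, then cascade-delete vertices whose degree drops below k.
        queue = deque([v])
        while queue:
            x = queue.popleft()
            if x not in alive:
                continue
            alive.discard(x)
            for y in g[x]:
                g[y].discard(x)
                if y in alive and len(g[y]) < k:
                    queue.append(y)
            g[x] = set()

    def component(u):
        # BFS over the remaining (alive) graph starting from u.
        seen, queue = {u}, deque([u])
        while queue:
            x = queue.popleft()
            for y in g[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        return seen

    # Reduce to the k-core of the P-neighbor graph, i.e. the (k, P)-core.
    for v in list(g):
        if v in alive and len(g[v]) < k:
            remove(v)

    communities = []
    while alive:
        # Linear scan for the least important vertex; a real implementation
        # would process vertices in pre-sorted order instead.
        u = min(alive, key=lambda v: weight[v])
        communities.append((weight[u], component(u)))
        remove(u)
    # Communities were recorded in increasing order of influence.
    return communities[-r:][::-1]
```

For example, on a toy bibliographic HIN where paper `p1` has authors `a1`, `a2`, `a3` and paper `p2` has authors `a3`, `a4`, `a5`, the meta-path `("A", "P", "A")` yields two co-author triangles sharing `a3`. With importance values `{a1: 5, a2: 4, a3: 9, a4: 8, a5: 7}` and `k = 2`, the top-1 community is `{a3, a4, a5}` with influence 7. Peeling in ascending importance order is what makes the basic approach simple. It is also why it touches every vertex, which is the cost the abstract's pruning strategies aim to avoid.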


Published In

Proceedings of the VLDB Endowment  Volume 16, Issue 8
April 2023
257 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
