Effective and efficient community search over large heterogeneous information networks

Published: 01 February 2020

Abstract

Community search (CS) has recently attracted considerable attention. Given a query vertex, CS looks for a dense subgraph that contains it. Existing studies mainly focus on homogeneous graphs, in which all vertices are of the same type, and cannot be directly applied to heterogeneous information networks (HINs), which consist of multi-typed, interconnected objects, such as bibliographic networks and knowledge graphs. In this paper, we study the problem of community search over large HINs: given a query vertex q, find a community in an HIN that contains q, in which all vertices have the same type as q and are closely related.
To model the relationship between two vertices of the same type, we adopt the well-known concept of a meta-path, which is a sequence of relations defined between different types of vertices. We then measure the cohesiveness of a community by extending the classic minimum degree metric with a meta-path. We further propose efficient query algorithms for finding communities under these cohesiveness metrics. Extensive experiments on five large real-world HINs show that the proposed solutions are effective for searching communities and are much faster than the baseline solutions.
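
To make the idea of a meta-path-based minimum degree concrete, the following is a minimal sketch and not the paper's algorithm. It assumes an undirected HIN, a symmetric meta-path such as Author-Paper-Author, and the simplest reading of the metric: two vertices are neighbors if at least one instance of the meta-path connects them, and a community is the connected set around q in which every vertex has at least k such neighbors. The names meta_path_neighbors and community_search, the parameter k, and the toy data are all hypothetical.

```python
from collections import defaultdict, deque


def meta_path_neighbors(edges, vertex_type, meta_path):
    """For each vertex of the meta-path's start type, collect the other
    vertices reachable by following the type sequence in meta_path."""
    adj = defaultdict(set)
    for u, v in edges:  # assumption: the HIN is treated as undirected
        adj[u].add(v)
        adj[v].add(u)
    neighbors = {}
    for start, t0 in vertex_type.items():
        if t0 != meta_path[0]:
            continue
        frontier = {start}
        for t in meta_path[1:]:
            frontier = {w for u in frontier for w in adj[u]
                        if vertex_type[w] == t}
        neighbors[start] = frontier - {start}
    return neighbors


def community_search(edges, vertex_type, meta_path, q, k):
    """Return the connected set containing q in which every vertex has at
    least k meta-path neighbors inside the set, or an empty set."""
    nbrs = meta_path_neighbors(edges, vertex_type, meta_path)
    if q not in nbrs:
        return set()
    alive = set(nbrs)
    degree = {v: len(nbrs[v]) for v in alive}
    # Peel vertices whose meta-path degree is below k (k-core style).
    queue = deque(v for v in alive if degree[v] < k)
    while queue:
        v = queue.popleft()
        if v not in alive:
            continue
        alive.remove(v)
        for w in nbrs[v]:
            if w in alive:
                degree[w] -= 1
                if degree[w] < k:
                    queue.append(w)
    if q not in alive:
        return set()
    # Keep only the surviving vertices connected to q.
    community, stack = {q}, [q]
    while stack:
        v = stack.pop()
        for w in nbrs[v]:
            if w in alive and w not in community:
                community.add(w)
                stack.append(w)
    return community


# Toy bibliographic HIN (hypothetical): authors a1..a4, papers p1..p3.
vertex_type = {"a1": "A", "a2": "A", "a3": "A", "a4": "A",
               "p1": "P", "p2": "P", "p3": "P"}
edges = [("a1", "p1"), ("a2", "p1"), ("a3", "p1"),
         ("a1", "p2"), ("a2", "p2"),
         ("a3", "p3"), ("a4", "p3")]
print(community_search(edges, vertex_type, ("A", "P", "A"), "a1", 2))
```

In this toy network, a4 shares a paper only with a3, so it is peeled away when k = 2 and the community of a1 consists of a1, a2, and a3. Materializing all meta-path neighbors up front is the naive approach; it is shown only because it is easy to follow.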

Published In

Proceedings of the VLDB Endowment, Volume 13, Issue 6
February 2020
170 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Qualifiers

  • Research-article
