DOI: 10.1145/1557019.1557107
research-article

Ranking-based clustering of heterogeneous information networks with star network schema

Published: 28 June 2009

Abstract

A heterogeneous information network is an information network composed of multiple types of objects. Clustering on such a network may lead to better understanding of both the hidden structures of the network and the individual role played by every object in each cluster. However, although clustering on homogeneous networks has been studied for decades, clustering on heterogeneous networks has not been addressed until recently.
A recent study proposed a new algorithm, RankClus, for clustering on bi-typed heterogeneous networks. However, a real-world network may consist of more than two types, and the interactions among multi-typed objects play a key role in disclosing the rich semantics that a network carries. In this paper, we study clustering of multi-typed heterogeneous networks with a star network schema and propose a novel algorithm, NetClus, that utilizes links across multi-typed objects to generate high-quality net-clusters. An iterative enhancement method is developed that leads to effective ranking-based clustering in such heterogeneous networks. Our experiments on DBLP data show that NetClus generates more accurate clustering results than the baseline topic model algorithm PLSA and the recently proposed algorithm, RankClus. Further, NetClus generates informative clusters, presenting good ranking and cluster membership information for each attribute object in each net-cluster.
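
To make the terms "star network schema" and "ranking within a net-cluster" concrete, here is a minimal sketch. It is not the NetClus algorithm: the toy data, the `papers` dictionary, and the `rank_attributes` helper are all hypothetical, and plain normalized link counts stand in for the paper's ranking functions. It only illustrates the shape of the data, with one center type (papers) linked to several attribute types (authors, venues, terms), and how a ranking distribution over attribute objects can be read off a given group of center objects.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each target object (a paper) links to attribute
# objects of three types. Attribute objects never link to each other
# directly, which is what makes the schema a "star".
papers = {
    "p1": {"author": ["alice", "bob"], "venue": ["KDD"], "term": ["clustering", "network"]},
    "p2": {"author": ["alice"], "venue": ["KDD"], "term": ["ranking", "network"]},
    "p3": {"author": ["carol"], "venue": ["SIGIR"], "term": ["retrieval", "ranking"]},
}


def rank_attributes(papers, cluster):
    """Rank attribute objects of each type by normalized link count.

    Assumption: simple frequency ranking, used only as a stand-in for the
    ranking functions described in the paper.
    """
    counts = defaultdict(Counter)
    for pid in cluster:
        for attr_type, objs in papers[pid].items():
            counts[attr_type].update(objs)
    ranking = {}
    for attr_type, counter in counts.items():
        total = sum(counter.values())
        # Scores of one attribute type sum to 1 within the cluster.
        ranking[attr_type] = {obj: n / total for obj, n in counter.items()}
    return ranking


# Rank attribute objects inside a cluster made of papers p1 and p2.
ranks = rank_attributes(papers, cluster=["p1", "p2"])
```

In this sketch the cluster is given up front. According to the abstract, NetClus instead refines clusters and rankings together through an iterative enhancement method, which this example does not attempt to reproduce.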

Supplementary Material

JPG File (p797-sun.jpg)
MP4 File (p797-sun.mp4)

References

[1] A. Banerjee, S. Basu, and S. Merugu. Multi-way clustering on relation graphs. In Proceedings of the 7th SIAM International Conference on Data Mining (SIAM'07), 2007.
[2] R. Bekkerman, R. El-Yaniv, and A. McCallum. Multi-way distributional clustering via pairwise interactions. In ICML '05: Proceedings of the 22nd international conference on Machine learning, pages 41--48, 2005.
[3] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst., 30(1-7):107--117, 1998.
[4] C. H. Q. Ding, X. He, H. Zha, M. Gu, and H. D. Simon. A min-max cut algorithm for graph partitioning and data clustering. In Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM'01), pages 107--114. IEEE Computer Society, 2001.
[5] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In SIGCOMM '99: Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication, pages 251--262, 1999.
[6] T. Hofmann. Probabilistic latent semantic analysis. In Proc. of Uncertainty in Artificial Intelligence (UAI'99), pages 289--296, 1999.
[7] G. Jeh and J. Widom. SimRank: a measure of structural-context similarity. In Proceedings of the eighth ACM SIGKDD conference (KDD'02), pages 538--543. ACM, 2002.
[8] J. M. Kleinberg. Authoritative sources in a hyperlinked environment. J. ACM, 46(5):604--632, 1999.
[9] B. Long, Z. M. Zhang, X. Wu, and P. S. Yu. Spectral clustering for multi-type relational data. In ICML '06: Proceedings of the 23rd international conference on Machine learning, pages 585--592, 2006.
[10] Q. Mei, D. Zhang, and C. Zhai. A general optimization framework for smoothing language models on graph structures. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR'08), pages 611--618, 2008.
[11] M. E. J. Newman. The structure of scientific collaboration networks. Working Papers 00-07-037, Santa Fe Institute, July 2000.
[12] M. E. J. Newman. Assortative mixing in networks. Physical Review Letters, 89(20):208701, October 2002.
[13] Z. Nie, Y. Zhang, J.-R. Wen, and W.-Y. Ma. Object-level ranking: Bringing order to web objects. In Proceedings of the fourteenth International World Wide Web Conference (WWW'05), pages 567--574. ACM, May 2005.
[14] J. Shi and J. Malik. Normalized cuts and image segmentation. In Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition (CVPR'97), page 731. IEEE Computer Society, 1997.
[15] M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griffiths. Probabilistic author-topic models for information discovery. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD'04), pages 306--315, 2004.
[16] Y. Sun, J. Han, P. Zhao, Z. Yin, H. Cheng, and T. Wu. RankClus: Integrating clustering with ranking for heterogeneous information network analysis. In Proceedings of the 12th International Conference on Extending Database Technology (EDBT'09), 2009.
[17] Y. Tian, R. A. Hankins, and J. M. Patel. Efficient aggregation for graph summarization. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data (SIGMOD'08), pages 567--580, 2008.
[18] U. von Luxburg. A tutorial on spectral clustering. Technical report, Max Planck Institute for Biological Cybernetics, 2006.
[19] S. White and P. Smyth. A spectral clustering approach to finding communities in graph. In Proceedings of the Fifth SIAM International Conference on Data Mining (SDM'05), 2005.
[20] X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger. Scan: a structural clustering algorithm for networks. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD'07), pages 824--833, 2007.
[21] C. Zhai and J. D. Lafferty. A study of smoothing methods for language models applied to information retrieval. ACM Trans. Inf. Syst., 22(2):179--214, 2004.
[22] C. Zhai, A. Velivelli, and B. Yu. A cross-collection mixture model for comparative text mining. In KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 743--748, 2004.
[23] N. Wang, S. Parthasarathy, K.-L. Tan, and A. K. H. Tung. Csv: visualizing and mining cohesive subgraphs. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data (SIGMOD'08), pages 445--458, 2008.

    Published In

    KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
    June 2009
    1426 pages
    ISBN:9781605584959
    DOI:10.1145/1557019
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. clustering
    2. heterogeneous information network

    Qualifiers

    • Research-article

    Conference

    KDD09

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 67
    • Downloads (Last 6 weeks): 4
    Reflects downloads up to 03 Oct 2024

    Citations

    Cited By

    • (2024) Clustering on heterogeneous IoT information network based on meta path. Science Progress 107:2. DOI: 10.1177/00368504241257389. Online publication date: 17-Jun-2024
    • (2024) Enhancing Sequential Recommendation System For MOOCs Based On Heterogeneous Information Networks. 2024 International Conference on Multimedia Analysis and Pattern Recognition (MAPR), pages 1-6. DOI: 10.1109/MAPR63514.2024.10660698. Online publication date: 15-Aug-2024
    • (2024) Graph neural network recommendation algorithm based on improved dual tower model. Scientific Reports 14:1. DOI: 10.1038/s41598-024-54376-3. Online publication date: 15-Feb-2024
    • (2024) Heterogeneous network influence maximization algorithm based on multi-scale propagation strength and repulsive force of propagation field. Knowledge-Based Systems 291 (111580). DOI: 10.1016/j.knosys.2024.111580. Online publication date: May-2024
    • (2024) Interest-driven community detection on attributed heterogeneous information networks. Information Fusion 111 (102525). DOI: 10.1016/j.inffus.2024.102525. Online publication date: Nov-2024
    • (2024) Attribute-sensitive community search over attributed heterogeneous information networks. Expert Systems with Applications 235 (121153). DOI: 10.1016/j.eswa.2023.121153. Online publication date: Jan-2024
    • (2024) MC-Det: Multi-channel representation fusion for malicious domain name detection. Computer Networks (110847). DOI: 10.1016/j.comnet.2024.110847. Online publication date: Oct-2024
    • (2024) A node clustering algorithm for heterogeneous information networks based on node embeddings. Multimedia Tools and Applications 83:2, pages 3745-3766. DOI: 10.1007/s11042-023-15245-9. Online publication date: 1-Jan-2024
    • (2024) Similarity enhancement of heterogeneous networks by weighted incorporation of information. Knowledge and Information Systems 66:5, pages 3133-3156. DOI: 10.1007/s10115-023-02050-x. Online publication date: 27-Jan-2024
    • (2024) Heterogeneous Graph Contrastive Learning with Dual Aggregation Scheme and Adaptive Augmentation. Web and Big Data, pages 124-138. DOI: 10.1007/978-981-97-2421-5_9. Online publication date: 12-May-2024
