CGC: A Flexible and Robust Approach to Integrating Co-Regularized Multi-Domain Graph for Clustering

Published: 24 May 2016

Abstract

    Multi-view graph clustering aims to enhance clustering performance by integrating heterogeneous information collected in different domains. Each domain provides a different view of the data instances. Leveraging cross-domain information has been demonstrated to be an effective way to achieve better clustering results. Despite this success, existing multi-view graph clustering methods usually assume that different views are available for the same set of instances, so that instances in different domains can be treated as having a strict one-to-one relationship. In many real-life applications, however, data instances in one domain may correspond to multiple instances in another domain. Moreover, relationships between instances in different domains may be associated with weights based on prior (partial) knowledge. In this article, we propose a flexible and robust framework, Co-regularized Graph Clustering (CGC), based on non-negative matrix factorization (NMF), to tackle these challenges. CGC has several advantages over the existing methods. First, it supports many-to-many cross-domain instance relationships. Second, it incorporates weights on cross-domain relationships. Third, it allows partial cross-domain mapping, so that graphs in different domains may have different sizes. Finally, it reports the extent to which each cross-domain instance relationship violates the in-domain clustering structure, enabling users to re-evaluate the consistency of the relationship. We develop an efficient optimization method that is guaranteed to find the global optimal solution with a given confidence requirement. The proposed method can automatically identify noisy domains and assign smaller weights to them, which helps to obtain an optimal graph partition for the focused domain. Extensive experimental results on UCI benchmark datasets, newsgroup datasets, and biological interaction networks demonstrate the effectiveness of our approach.
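
    The abstract does not state the objective function, so the following is only a minimal sketch of the general idea, under assumptions that are not taken from the article: each domain graph is factorized by symmetric NMF, and a weighted, possibly partial, many-to-many mapping matrix `S` couples the two factorizations through a squared-residual penalty. The names `co_regularized_nmf`, `joint_objective`, `S`, and `lam`, the multiplicative update with a damping exponent of 1/4, and the toy graphs are all illustrative choices.

```python
import numpy as np


def joint_objective(A1, A2, S, H1, H2, lam):
    """Two symmetric-NMF fitting terms plus a cross-domain penalty (assumed form)."""
    fit1 = np.linalg.norm(A1 - H1 @ H1.T, "fro") ** 2
    fit2 = np.linalg.norm(A2 - H2 @ H2.T, "fro") ** 2
    cross = np.linalg.norm(S @ H1 - H2, "fro") ** 2
    return fit1 + fit2 + lam * cross


def co_regularized_nmf(A1, A2, S, k, lam=1.0, n_iter=200, seed=0, eps=1e-12):
    """Alternating multiplicative updates for the assumed joint objective.

    A1: (n1, n1) and A2: (n2, n2) non-negative symmetric affinity matrices.
    S:  (n2, n1) non-negative mapping weights; S[j, i] > 0 links instance i
        of domain 1 to instance j of domain 2 (rows/columns may be all zero).
    """
    rng = np.random.default_rng(seed)
    H1 = rng.random((A1.shape[0], k)) + 0.1
    H2 = rng.random((A2.shape[0], k)) + 0.1
    history = [joint_objective(A1, A2, S, H1, H2, lam)]
    for _ in range(n_iter):
        # Ratio of the negative to the positive part of the gradient w.r.t. H1.
        num1 = A1 @ H1 + 0.5 * lam * (S.T @ H2)
        den1 = H1 @ (H1.T @ H1) + 0.5 * lam * (S.T @ (S @ H1)) + eps
        H1 = H1 * (num1 / den1) ** 0.25
        # Same construction for H2, using the freshly updated H1.
        num2 = A2 @ H2 + 0.5 * lam * (S @ H1)
        den2 = H2 @ (H2.T @ H2) + 0.5 * lam * H2 + eps
        H2 = H2 * (num2 / den2) ** 0.25
        history.append(joint_objective(A1, A2, S, H1, H2, lam))
    return H1, H2, history


# Toy data: domain 1 has 6 nodes, domain 2 has 4 nodes (different sizes).
A1 = np.kron(np.eye(2), np.ones((3, 3)))
A2 = np.kron(np.eye(2), np.ones((2, 2)))
S = np.zeros((4, 6))
S[0, 0] = 1.0
S[0, 1] = 0.5  # instance 1 of domain 1 maps to two instances of domain 2
S[1, 1] = 0.5
S[2, 3] = 1.0
S[3, 4] = 0.7  # weighted links; instance 2 of domain 1 is left unmapped
S[3, 5] = 0.3

H1, H2, history = co_regularized_nmf(A1, A2, S, k=2)
print("domain 1 labels:", H1.argmax(axis=1))
print("domain 2 labels:", H2.argmax(axis=1))
print("objective: start", history[0], "end", history[-1])
```

    Cluster labels are read off as the largest entry in each row of `H1` and `H2`. Increasing `lam` in this sketch pulls the two factorizations toward agreement on mapped instances, while `lam = 0` reduces it to two independent symmetric-NMF problems.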

      Published In

      ACM Transactions on Knowledge Discovery from Data  Volume 10, Issue 4
      Special Issue on SIGKDD 2014, Special Issue on BIGCHAT and Regular Papers
      July 2016
      417 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/2936311
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 24 May 2016
      Accepted: 01 March 2016
      Revised: 01 August 2015
      Received: 01 September 2014
      Published in TKDD Volume 10, Issue 4

      Author Tags

      1. Graph clustering
      2. Co-regularization
      3. Nonnegative matrix factorization

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • National Science Foundation
      • National Institutes of Health

      Cited By

      • (2024) Bayesian Multi-View Clustering given complex inter-view structure. F1000Research 11 (1460). DOI: 10.12688/f1000research.126215.2. Online publication date: 29-Feb-2024
      • (2022) Bayesian Multi-View Clustering given complex inter-view structure. F1000Research 11 (1460). DOI: 10.12688/f1000research.126215.1. Online publication date: 9-Dec-2022
      • (2022) Transferable discriminative non-negative matrix factorization for cross-database facial expression recognition. Digital Signal Processing 123:C. DOI: 10.1016/j.dsp.2022.103424. Online publication date: 30-Apr-2022
      • (2022) A new clustering algorithm for genes with multiple cancer diseases by self-consistent field iteration method. Network Modeling Analysis in Health Informatics and Bioinformatics 11:1. DOI: 10.1007/s13721-022-00362-6. Online publication date: 7-Apr-2022
      • (2020) Multi-view clustering by exploring complex mapping relationship between views. Pattern Recognition Letters. DOI: 10.1016/j.patrec.2020.07.031. Online publication date: Jul-2020
      • (2019) Semi-Supervised Non-Negative Matrix Factorization With Dissimilarity and Similarity Regularization. IEEE Transactions on Neural Networks and Learning Systems (1-12). DOI: 10.1109/TNNLS.2019.2933223. Online publication date: 2019
      • (2019) k-Context Technique: A Method for Identifying Dense Subgraphs in a Heterogeneous Information Network. IEEE Transactions on Computational Social Systems 6:6 (1190-1205). DOI: 10.1109/TCSS.2019.2942323. Online publication date: Dec-2019
      • (2019) Multi-Mode Social Network Clustering via Non-Negative Tri-Matrix Factorization With Cluster Indicator Similarity Regularization. IEEE Access 7 (151713-151723). DOI: 10.1109/ACCESS.2019.2946744. Online publication date: 2019
      • (2018) Multi-domain Networks Association for Biological Data Using Block Signed Graph Clustering. IEEE/ACM Transactions on Computational Biology and Bioinformatics (1-1). DOI: 10.1109/TCBB.2018.2848904. Online publication date: 2018
