DOI: 10.1145/1031171.1031234

On combining multiple clusterings

Published: 13 November 2004

    Abstract

    Many problems can be reduced to the problem of combining multiple clusterings. In this paper, we first summarize different application scenarios of combining multiple clusterings and provide a new perspective that views the problem as a categorical clustering problem. We then show the connections between various consensus and clustering criteria and discuss the complexity results of the problem. Finally, we propose a new method to determine the final clustering. Experiments on kinship terms and on clustering popular music from heterogeneous feature sets show the effectiveness of combining multiple clusterings.
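
    The categorical view mentioned in the abstract can be illustrated with a small sketch. Each input clustering contributes one categorical attribute per object, so m clusterings of n objects form an n-by-m table of labels. The code below builds that table and then derives a consensus with a generic majority-agreement heuristic (objects are merged when more than half of the clusterings put them together). This heuristic is an assumption chosen for illustration, not the method proposed in the paper, and the names label_table, agreement, consensus and threshold are hypothetical.

```python
from itertools import combinations


def label_table(clusterings):
    """Stack m clusterings (each a list of n labels) into n rows of m categorical attributes."""
    n = len(clusterings[0])
    if any(len(c) != n for c in clusterings):
        raise ValueError("all clusterings must label the same n objects")
    return [tuple(c[i] for c in clusterings) for i in range(n)]


def agreement(row_a, row_b):
    """Fraction of clusterings that place the two objects in the same cluster."""
    same = sum(1 for a, b in zip(row_a, row_b) if a == b)
    return same / len(row_a)


def consensus(clusterings, threshold=0.5):
    """Illustrative heuristic (an assumption, not the paper's method):
    merge every pair of objects whose agreement exceeds `threshold`,
    then return the connected groups as consensus cluster labels."""
    rows = label_table(clusterings)
    parent = list(range(len(rows)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(rows)), 2):
        if agreement(rows[i], rows[j]) > threshold:
            parent[find(i)] = find(j)

    # Relabel group roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(rows))]


if __name__ == "__main__":
    # Three clusterings of six objects; label names are arbitrary per clustering.
    inputs = [
        [0, 0, 1, 1, 2, 2],
        [1, 1, 0, 0, 0, 2],
        [0, 0, 1, 1, 1, 1],
    ]
    print(consensus(inputs))  # prints [0, 0, 1, 1, 1, 1]
```

    Labels are only compared within the same clustering, so the sketch does not need the input clusterings to use matching label names or the same number of clusters.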

    Published In

    CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management
    November 2004
    678 pages
    ISBN: 1581138741
    DOI: 10.1145/1031171
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. categorical
    2. combining
    3. multiple clusterings

    Qualifiers

    • Article

    Conference

    CIKM04: Conference on Information and Knowledge Management
    November 8 - 13, 2004
    Washington, D.C., USA

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Cited By

    • (2023) An Ensemble and Multi-View Clustering Method Based on Kolmogorov Complexity. Entropy, 25:2 (371). DOI: 10.3390/e25020371. Online publication date: 17-Feb-2023
    • (2023) LSEC: Large-scale spectral ensemble clustering. Intelligent Data Analysis, 27:1 (59-77). DOI: 10.3233/IDA-216240. Online publication date: 30-Jan-2023
    • (2020) A Guide to Conquer the Biological Network Era Using Graph Theory. Frontiers in Bioengineering and Biotechnology, 8. DOI: 10.3389/fbioe.2020.00034. Online publication date: 31-Jan-2020
    • (2020) A New Information Theory Based Clustering Fusion Method for Multi-view Representations of Text Documents. Social Computing and Social Media. Design, Ethics, User Behavior, and Social Network Analysis (156-167). DOI: 10.1007/978-3-030-49570-1_11. Online publication date: 10-Jul-2020
    • (2018) The Influence of the Zonation Effect on a System of Hierarchical Functional Regions. Business Systems Research Journal, 9:2 (45-54). DOI: 10.2478/bsrj-2018-0018. Online publication date: 28-Jul-2018
    • (2017) Cross-Domain Recommendation via Clustering on Multi-Layer Graphs. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (195-204). DOI: 10.1145/3077136.3080774. Online publication date: 7-Aug-2017
    • (2015) Spectral Ensemble Clustering. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (715-724). DOI: 10.1145/2783258.2783287. Online publication date: 10-Aug-2015
    • (2014) A Framework for Hierarchical Ensemble Clustering. ACM Transactions on Knowledge Discovery from Data, 9:2 (1-23). DOI: 10.1145/2611380. Online publication date: 23-Sep-2014
    • (2011) Weighted Co-clustering Based Clustering Ensemble. Proceedings of the 2011 Third National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (46-49). DOI: 10.1109/NCVPRIPG.2011.17. Online publication date: 15-Dec-2011
    • (2010) The effect of cooling functions on ensemble clustering using simulated annealing. Intelligent Data Analysis, 14:6 (701-730). DOI: 10.5555/1890496.1890503. Online publication date: 15-Nov-2010
