Combining Multiple Clusterings Using Evidence Accumulation

Published: 01 June 2005

Abstract

We explore the idea of evidence accumulation (EAC) for combining the results of multiple clusterings. First, a clustering ensemble, a set of object partitions, is produced. Given a data set (n objects or patterns in d dimensions), data partitions can be produced in different ways: 1) by applying different clustering algorithms and 2) by applying the same clustering algorithm with different parameter values or initializations. Further, combinations of different data representations (feature spaces) and clustering algorithms can also provide a multitude of significantly different data partitionings. We propose a simple framework for extracting a consistent clustering, given the various partitions in a clustering ensemble. According to the EAC concept, each partition is viewed as independent evidence of data organization; the individual data partitions are combined through a voting mechanism to generate a new n × n similarity matrix between the n patterns. The final data partition of the n patterns is obtained by applying a hierarchical agglomerative clustering algorithm to this matrix. We have developed a theoretical framework for the analysis and evaluation of the proposed clustering combination strategy, based on the concept of mutual information between data partitions. Stability of the results is evaluated using bootstrapping techniques. A detailed discussion of an evidence accumulation-based clustering algorithm, using a split-and-merge strategy based on the K-means clustering algorithm, is presented. Experimental results of the proposed method on several synthetic and real data sets are compared with other combination strategies and with individual clustering results produced by well-known clustering algorithms.
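The combination scheme summarized in the abstract can be sketched compactly: run a base clusterer many times, accumulate pairwise co-occurrence votes into an n × n co-association matrix, and cut a single-link dendrogram built on that matrix (the author keywords list the single-link method). The sketch below is a minimal illustration assuming scikit-learn and SciPy; the ensemble size, the randomized range of k per run, and the final number of clusters are illustrative parameters, not the exact configuration used in the paper.

```python
# Minimal sketch of evidence accumulation clustering (EAC).
# Assumes scikit-learn and SciPy; parameter choices are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def eac_cluster(X, n_runs=30, k_range=(10, 30), final_k=3, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    co_assoc = np.zeros((n, n))

    # Build the clustering ensemble: run K-means repeatedly with a random
    # number of clusters and a random initialization in each run.
    for _ in range(n_runs):
        k = int(rng.integers(k_range[0], k_range[1] + 1))
        labels = KMeans(n_clusters=k, n_init=1,
                        random_state=int(rng.integers(1_000_000))).fit_predict(X)
        # Voting: each time two patterns fall in the same cluster, they
        # receive one more vote in the co-association (similarity) matrix.
        co_assoc += (labels[:, None] == labels[None, :]).astype(float)

    co_assoc /= n_runs                     # normalized similarity in [0, 1]
    dist = 1.0 - co_assoc                  # convert similarity to dissimilarity
    np.fill_diagonal(dist, 0.0)

    # Final partition: hierarchical agglomerative (single-link) clustering
    # applied to the accumulated n x n similarity matrix.
    Z = linkage(squareform(dist, checks=False), method="single")
    return fcluster(Z, t=final_k, criterion="maxclust")
```

The intuition, consistent with the split-and-merge strategy described in the abstract, is that each K-means run over-partitions the data into many small clusters; points that truly belong together keep co-occurring across runs, so the co-association matrix develops a block structure that the agglomerative step can recover even when no individual run does.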



Published In

IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 27, Issue 6
June 2005
176 pages

Publisher

IEEE Computer Society

United States

Publication History

Published: 01 June 2005

Author Tags

  1. Cluster analysis
  2. K-means algorithm
  3. cluster fusion
  4. cluster validity
  5. combining clustering partitions
  6. evidence accumulation
  7. mutual information
  8. robust clustering
  9. single-link method

Qualifiers

  • Research-article
