Combining Multiple Clusterings Using Evidence Accumulation

Published: 01 June 2005

Abstract

We explore the idea of evidence accumulation (EAC) for combining the results of multiple clusterings. First, a clustering ensemble, a set of object partitions, is produced. Given a data set (n objects or patterns in d dimensions), data partitions can be produced in different ways: 1) by applying different clustering algorithms and 2) by applying the same clustering algorithm with different parameter values or initializations. Further, combinations of different data representations (feature spaces) and clustering algorithms can also provide a multitude of significantly different data partitionings. We propose a simple framework for extracting a consistent clustering, given the various partitions in a clustering ensemble. According to the EAC concept, each partition is viewed as independent evidence of data organization; the individual data partitions are combined through a voting mechanism to generate a new n × n similarity matrix between the n patterns. The final data partition of the n patterns is obtained by applying a hierarchical agglomerative clustering algorithm to this matrix. We have developed a theoretical framework for the analysis and evaluation of the proposed clustering combination strategy, based on the concept of mutual information between data partitions. Stability of the results is evaluated using bootstrapping techniques. A detailed discussion of an evidence accumulation-based clustering algorithm, using a split-and-merge strategy based on the K-means clustering algorithm, is presented. Experimental results of the proposed method on several synthetic and real data sets are compared with other combination strategies and with individual clustering results produced by well-known clustering algorithms.
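The combination scheme summarized in the abstract can be sketched compactly: run a base clusterer many times, accumulate pairwise co-occurrence votes into an n × n co-association matrix, and cut a single-link dendrogram built on that matrix (the author keywords list the single-link method). The sketch below is a minimal illustration assuming scikit-learn and SciPy; the ensemble size, the randomized range of k per run, and the final number of clusters are illustrative parameters, not the exact configuration used in the paper.

```python
# Minimal sketch of evidence accumulation clustering (EAC).
# Assumes scikit-learn and SciPy; parameter choices are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def eac_cluster(X, n_runs=30, k_range=(10, 30), final_k=3, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    co_assoc = np.zeros((n, n))

    # Build the clustering ensemble: run K-means repeatedly with a random
    # number of clusters and a random initialization in each run.
    for _ in range(n_runs):
        k = int(rng.integers(k_range[0], k_range[1] + 1))
        labels = KMeans(n_clusters=k, n_init=1,
                        random_state=int(rng.integers(1_000_000))).fit_predict(X)
        # Voting: each time two patterns fall in the same cluster, they
        # receive one more vote in the co-association (similarity) matrix.
        co_assoc += (labels[:, None] == labels[None, :]).astype(float)

    co_assoc /= n_runs                     # normalized similarity in [0, 1]
    dist = 1.0 - co_assoc                  # convert similarity to dissimilarity
    np.fill_diagonal(dist, 0.0)

    # Final partition: hierarchical agglomerative (single-link) clustering
    # applied to the accumulated n x n similarity matrix.
    Z = linkage(squareform(dist, checks=False), method="single")
    return fcluster(Z, t=final_k, criterion="maxclust")
```

The intuition, consistent with the split-and-merge strategy described in the abstract, is that each K-means run over-partitions the data into many small clusters; points that truly belong together keep co-occurring across runs, so the co-association matrix develops a block structure that the agglomerative step can recover even when no individual run does.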



Published In

IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 27, Issue 6
June 2005
176 pages

Publisher

IEEE Computer Society

United States

Publication History

Published: 01 June 2005

Author Tags

  1. Cluster analysis
  2. K-means algorithm
  3. cluster fusion
  4. cluster validity
  5. combining clustering partitions
  6. evidence accumulation
  7. mutual information
  8. robust clustering
  9. single-link method

Qualifiers

  • Research-article
