Abstract
Recently, a large amount of work has been devoted to the study of spectral clustering, a simple yet powerful method for finding structure in a data set using spectral properties of an associated pairwise similarity matrix. Most existing spectral clustering algorithms either estimate only a single cluster number or estimate non-unique cluster numbers based on the eigengap criterion. However, a data set does not always have exactly one reasonable cluster number, and the eigengap criterion lacks theoretical justification. In this paper, we propose non-unique cluster numbers determination methods based on stability in spectral clustering (NCNDBS). We first use the multiway normalized cut spectral clustering algorithm to cluster the data set for a candidate cluster number \(k\). Then the ratio between the multiway normalized cut criterion of the obtained clusters and the sum of the leading eigenvalues (sorted in descending order) of the stochastic transition matrix is used as a standard to decide whether \(k\) is a reasonable cluster number. Finally, by varying the scaling parameter in the Gaussian function, we judge whether the reasonable cluster number \(k\) is also a stable one. Through these three stages, we can determine non-unique cluster numbers of a data set. The Lumpability theorem of Meilă and Xu provides a theoretical basis for our methods. Illustrative experiments show that NCNDBS can successfully estimate non-unique cluster numbers of a data set.
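The three stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact form of the ratio, its acceptance threshold, or the stability test, so the ratio is assumed here to be the multiway normalized cut divided by the sum of the \(k\) leading eigenvalues of \(P = D^{-1}S\). The function names, the zero diagonal of the similarity matrix, the plain k-means step, and the toy data are likewise assumptions made for illustration. The sketch also reports the known lower bound \(\mathrm{MNCut} \ge k - \sum_{i=1}^{k}\lambda_i\), which is attained when the leading eigenvectors of \(P\) are piecewise constant on the clusters.

```python
import numpy as np

def gaussian_affinity(X, sigma):
    """S_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)); zero diagonal is an assumption."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    return S

def leading_eigenpairs(S, k):
    """k leading eigenvalues (descending) and eigenvectors of P = D^{-1} S."""
    inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    # P is similar to the symmetric matrix D^{-1/2} S D^{-1/2}, so eigh applies.
    M = inv_sqrt[:, None] * S * inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(vals)[::-1][:k]
    return vals[order], inv_sqrt[:, None] * vecs[:, order]

def kmeans(V, k, n_iter=50):
    """Plain Lloyd k-means with deterministic farthest-point initialisation."""
    centers = [V[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((V - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(V[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(d2, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = V[labels == j].mean(axis=0)
    return labels

def mncut(S, labels, k):
    """Multiway normalized cut: sum over clusters of Cut(C, rest) / Vol(C)."""
    d = S.sum(axis=1)
    total = 0.0
    for j in range(k):
        inside = labels == j
        if inside.any():
            total += S[inside][:, ~inside].sum() / d[inside].sum()
    return total

def ratio_for_k(X, k, sigma):
    """Assumed criterion: MNCut of the k clusters over the sum of k leading eigenvalues."""
    S = gaussian_affinity(X, sigma)
    vals, vecs = leading_eigenpairs(S, k)
    labels = kmeans(vecs, k)
    cut = mncut(S, labels, k)
    return cut / vals.sum(), cut, vals.sum()

# Hypothetical toy data: two well-separated Gaussian blobs in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(5.0, 0.3, (30, 2))])

# Stage 3 of the abstract: repeat the check while varying the scaling parameter.
for sigma in (0.5, 1.0, 2.0):
    for k in (2, 3, 4):
        ratio, cut, eigsum = ratio_for_k(X, k, sigma)
        print(f"sigma={sigma} k={k} ratio={ratio:.4g} MNCut={cut:.4g} bound={k - eigsum:.4g}")
```

On data like this, with two groups far apart relative to the scaling parameter, the ratio for \(k = 2\) should be close to zero. Whether a given \(k\) counts as reasonable or stable depends on thresholds that the abstract does not state, so the sketch only prints the quantities and makes no accept or reject decision.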
References
Azran A, Ghahramani Z (2006) Spectral methods for automatic multiscale data clustering. In: Proceedings of the 2006 IEEE computer society conference on computer vision and pattern recognition. IEEE Computer Society, Washington
Chen Y, Rege M, Dong M, Hua J (2008) Non-negative matrix factorization for semi-supervised data clustering. Knowl Inf Syst 17:355–379
Climescu-Haulica A (2007) How to choose the number of clusters: the Cramer multiplicity solution. Studies in classification, data analysis, and knowledge organization. Springer, Berlin, pp 15–22
Li T (2008) Clustering based on matrix approximation: a unifying view. Knowl Inf Syst 17:1–15
Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
Meilă M, Shi J (2000) Learning segmentation by random walks. In: Todd L, Thomas D, Volker T (eds) Neural information processing systems, Denver, USA, December 2000. Advances in neural information processing systems. MIT Press, Cambridge, pp 873–879
Meilă M, Xu L (2003) Multiway cuts and spectral clustering. Technical Report 442, University of Washington
Milligan G, Cooper M (1985) An examination of procedures for determining number of clusters in a data set. Psychometrika 50:159–179
Nagai A (2007) Inappropriateness of the criterion of K-way normalized cuts for deciding the number of clusters. Pattern Recognit Lett 28:1981–1986
Nascimento M, Carvalho A (2011) Spectral methods for graph clustering: a survey. Eur J Oper Res 211(2):221–231
Ng A, Jordan M, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Dietterich TG, Becker S, Ghahramani Z (eds) Neural information processing systems, Denver, USA, December 2001. Advances in neural information processing systems. MIT Press, Cambridge, pp 849–856
Sanguinetti G, Laidler J, Lawrence N (2005) A probabilistic approach to spectral clustering: using KL divergence to find good clusters. Statistics and Optimization of Clustering Workshop, London
Sanguinetti G, Laidler J, Lawrence N (2005) Automatic determination of the number of clusters using spectral algorithms. Mystic, Proceedings of machine learning for signal processing
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
Stewart G, Sun J (2001) Matrix perturbation theory, 2nd edn. Academic, New York
Sumuya Guo C, Zang Y (2011) Cluster number estimation based on normalized cut criterion in spectral clustering. ICIC Express Lett 5(1):155–161
Takacs B, Demiris Y (2010) Spectral clustering in multi-agent systems. Knowl Inf Syst 25:607–622
Tepper M, Musé P, Almansa A, Mejail M (2011) Automatically finding clusters in normalized cuts. Pattern Recognit 44:1372–1386
Tian Z, Li X, Ju Y (2007) Spectral clustering based on matrix perturbation theory. Sci China Ser F Inf Sci 50(1):63–81
Wang C, Li W, Ding L, Tian J, Chen S (2005) Image segmentation using spectral clustering. Proceedings of the 17th IEEE international conference on tools with artificial intelligence. IEEE Computer Society, Washington, pp 677–678
Xiang T, Gong S (2008) Spectral clustering with eigenvector selection. Pattern Recognit 41(3):1012–1029
Xu R, Donald W (2009) Clustering. Wiley, New Jersey
Zelnik-Manor L, Perona P (2004) Self-tuning spectral clustering. In: Lawrence S, Yair W, Léon B (eds) Neural information processing systems, Vancouver, Canada, December 2004. Advances in neural information processing systems. MIT Press, Cambridge, pp 1601–1608
Zhang X, You Q (2011) An improved spectral clustering algorithm based on random walk. Front Comput Sci China 5(3):268–278
Zheng X, Lin X (2004) Automatic determination of intrinsic cluster number family in spectral clustering using random walk on graph. International conference on image processing, Singapore, October 2004. IEEE Computer Society, Singapore, pp 3471–3474
Acknowledgments
The authors are grateful to the reviewers for their valuable comments and suggestions, which led to an improved version of the paper. This work was partly supported by the Natural Science Foundation of China (No. 71171030), the Program for New Century Excellent Talents in University (No. NCET-11-0050) and the Program of Higher-level Talents of Inner Mongolia University (No. SPH-IMU-125116).
Cite this article
Borjigin, S., Guo, C. Non-unique cluster numbers determination methods based on stability in spectral clustering. Knowl Inf Syst 36, 439–458 (2013). https://doi.org/10.1007/s10115-012-0547-0
DOI: https://doi.org/10.1007/s10115-012-0547-0