Abstract
The way a similarity matrix is constructed has an important impact on the performance of spectral clustering algorithms. In this paper, we propose a random walk based approach to processing the Gaussian kernel similarity matrix. In this method, the pairwise similarity between two data points depends not only on the two points themselves but also on their neighbors. As a result, the new similarity matrix is closer to the ideal matrix, which yields the best clustering result. We give a theoretical analysis of the new similarity matrix and apply it to spectral clustering. We also propose a method for handling noisy items, which can degrade clustering performance. Experimental results on real-world data sets show that the proposed spectral clustering algorithm significantly outperforms existing algorithms.
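As a rough illustration of the idea described above, the following Python sketch builds a Gaussian kernel similarity matrix and then reprocesses it with a random walk, so that the similarity between two points also reflects their neighbors. The abstract does not state the exact construction, so the choice of a t-step transition matrix followed by symmetrization, as well as the names `gaussian_similarity`, `random_walk_similarity`, `sigma`, and `steps`, are assumptions made for illustration and should not be read as the authors' method.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """Gaussian kernel similarity: S[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negatives from round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def random_walk_similarity(S, steps=2):
    """Assumed construction: row-normalize S into a transition matrix P,
    take the `steps`-step walk P^steps, and symmetrize the result."""
    P = S / S.sum(axis=1, keepdims=True)      # each row sums to 1
    P_t = np.linalg.matrix_power(P, steps)    # multi-step transition probabilities
    return 0.5 * (P_t + P_t.T)                # symmetric matrix for spectral clustering

# Toy data: two well-separated groups of three points each
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
S = gaussian_similarity(X, sigma=0.5)
W = random_walk_similarity(S, steps=2)
```

In this sketch, `W` could be passed to any spectral clustering routine that accepts a precomputed affinity matrix. The number of steps controls how far neighborhood information propagates; the value 2 is arbitrary.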
Author information
Xianchao Zhang is a full professor at Dalian University of Technology, China. He received his B.S. degree in Applied Mathematics and M.S. degree in Computational Mathematics from National University of Defense Technology in 1994 and 1998, respectively. He received his Ph.D. in Computer Theory and Software from University of Science and Technology of China in 2000. He joined Dalian University of Technology in 2003 after two years of industry experience at international companies. He was a visiting scholar at The Australian National University and The City University of Hong Kong in 2005 and 2009, respectively. His research interests include algorithms, machine learning, data mining and information retrieval.
Quanzeng You received his B.S. degree from the School of Software, Dalian University of Technology, China in 2009. He is currently a master's candidate at Dalian University of Technology, China. He joined the Lab of Intelligent Information Processing at DUT in 2009, under the supervision of Prof. Xianchao Zhang. His research interests include spectral clustering, clustering, semi-supervised learning and other data mining techniques. He is especially interested in spectral methods. Currently, his research focuses on improving spectral clustering and on applying it to large-scale problems.
About this article
Cite this article
Zhang, X., You, Q. An improved spectral clustering algorithm based on random walk. Front. Comput. Sci. China 5, 268–278 (2011). https://doi.org/10.1007/s11704-011-0023-0