A new formulation of sparse multiple kernel k-means clustering and its applications

Published: 01 September 2023

Abstract

Multiple kernel k-means (MKKM) clustering has been an important research topic in statistical machine learning and data mining over the last few decades. MKKM combines a group of prespecified base kernels to improve clustering performance. Although many efforts have been made to further improve MKKM, existing methods do not sufficiently exploit the potential structure of the partition matrix. In this paper, we propose a novel sparse multiple kernel k-means (SMKKM) clustering method that introduces an ℓ1-norm penalty to induce sparsity in the partition matrix. We then design an efficient alternating algorithm with a curvilinear search technique. More importantly, we establish the convergence and complexity analysis of the designed algorithm based on the optimality conditions of the SMKKM. Finally, extensive numerical experiments on synthetic and benchmark datasets demonstrate that the proposed method outperforms state-of-the-art methods in terms of clustering performance and robustness.
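To make the idea in the abstract concrete, the sketch below illustrates the general recipe it describes: combine prespecified base kernels with learned weights, compute a relaxed partition matrix from the combined kernel, and apply an ℓ1-style soft threshold to encourage sparsity in that matrix. This is only an illustrative sketch under standard MKKM assumptions, not the authors' SMKKM algorithm (which uses an alternating scheme with a curvilinear search); the function names, the soft-thresholding step, and the alignment-based kernel re-weighting are hypothetical simplifications.

import numpy as np


def combined_kernel(base_kernels, weights):
    """Weighted combination K = sum_p w_p * K_p of prespecified base kernels."""
    return sum(w * K for w, K in zip(weights, base_kernels))


def relaxed_partition(K, n_clusters):
    """Spectral relaxation of kernel k-means: top-c eigenvectors of K."""
    eigvals, eigvecs = np.linalg.eigh(K)
    return eigvecs[:, -n_clusters:]          # n x c relaxed partition matrix H


def soft_threshold(H, lam):
    """l1 proximal step: shrink small entries of H toward zero (sparsity)."""
    return np.sign(H) * np.maximum(np.abs(H) - lam, 0.0)


def sparse_mkkm_sketch(base_kernels, n_clusters, lam=0.05, n_iter=20):
    """Illustrative alternating loop over kernel weights and a sparse partition."""
    m = len(base_kernels)
    weights = np.full(m, 1.0 / m)            # start from uniform kernel weights
    for _ in range(n_iter):
        K = combined_kernel(base_kernels, weights)
        H = relaxed_partition(K, n_clusters)
        H = soft_threshold(H, lam)           # induce sparsity in the partition matrix
        # Re-weight each kernel by its alignment with the current partition,
        # a common heuristic in multiple kernel clustering (not the paper's rule).
        align = np.array([np.trace(H.T @ Kp @ H) for Kp in base_kernels])
        weights = np.maximum(align, 1e-12)
        weights /= weights.sum()
    return H, weights

Cluster assignments would then typically be recovered by running ordinary k-means on the rows of the sparse partition matrix H.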



Published In

Statistical Analysis and Data Mining, Volume 16, Issue 5
October 2023, 100 pages
ISSN: 1932-1864
EISSN: 1932-1872
DOI: 10.1002/sam.v16.5

Publisher

John Wiley & Sons, Inc., United States

Author Tags

  1. alternating direction method of multipliers
  2. curvilinear search technology
  3. multiple kernel k-means clustering
  4. sparse optimization

Qualifiers

  • Research-article
