Abstract
This paper studies clustering for possibly high-dimensional data (e.g. images, time series, gene expression data, and many other settings) and rephrases it as low-rank matrix estimation in the PAC-Bayesian framework. Our approach leverages the well-known Burer-Monteiro factorisation strategy from large-scale optimisation in the context of low-rank estimation. Moreover, our Burer-Monteiro factors are shown to lie on a Stiefel manifold. We propose a new generalized Bayesian estimator for this problem and prove novel prediction bounds for clustering. We also devise a componentwise Langevin sampler on the Stiefel manifold to compute this estimator.
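To make the link between clustering and orthonormal low-rank factors concrete, here is a minimal sketch in plain Python. It is a standard textbook illustration, not necessarily the paper's exact parameterisation: it assumes a hard partition of n points into K non-empty clusters and builds the normalised membership matrix U, with U[i][k] = 1/sqrt(n_k) when point i belongs to cluster k. The function names and the toy labels are hypothetical.

```python
import math

def normalized_membership(labels, n_clusters):
    """Return the n x K matrix U with U[i][k] = 1/sqrt(n_k) if labels[i] == k, else 0."""
    sizes = [0] * n_clusters
    for k in labels:
        sizes[k] += 1
    if min(sizes) == 0:
        raise ValueError("every cluster must be non-empty")
    return [[1.0 / math.sqrt(sizes[k]) if labels[i] == k else 0.0
             for k in range(n_clusters)]
            for i in range(len(labels))]

def gram(U):
    """Return U^T U, a K x K matrix."""
    cols = list(zip(*U))
    return [[sum(a * b for a, b in zip(ca, cb)) for cb in cols] for ca in cols]

def outer(U):
    """Return U U^T, an n x n matrix of rank at most K."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in U] for ri in U]

# Hypothetical toy partition of 6 points into 3 clusters.
labels = [0, 0, 1, 2, 2, 2]
U = normalized_membership(labels, 3)
G = gram(U)    # identity up to rounding: the columns of U are orthonormal
X = outer(U)   # block-diagonal: X[i][j] = 1/n_k if i and j share cluster k, else 0
```

Because U^T U is the identity, U has orthonormal columns, which is exactly the defining property of a point on a Stiefel manifold. The product U U^T is a low-rank matrix encoding the partition, so estimating it through its factor U is a Burer-Monteiro style formulation.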
Notes
1. As the reader will be able to check, the values of \(\nu _{\max }\) and \(\nu _{\min }\) play an essential role in the expression of the success probability of the method.
2. Notice that \(t_{\min }\) needs to be sufficiently smaller than 2/e for the term K to become small when n is sufficiently large.
3. Notation-wise, we identify the Stiefel manifold with the set of matrices whose first R columns form an orthonormal family and whose remaining \(n-R\) columns are set to zero.
4. This formula can be obtained by differentiating along the geodesic defined by the exponential map in the direction \(\varDelta \), for all \(\varDelta \in T_O(\mathbb {O}_{d,R})\).
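The last two notes can be illustrated with a small sketch in plain Python. It assumes the usual embedded description of the tangent space, \(T_O = \{\varDelta : O^\top \varDelta + \varDelta ^\top O = 0\}\), and the standard projection \(Z \mapsto Z - O\,\mathrm {sym}(O^\top Z)\); neither is quoted from the chapter, and the helper names and example matrices are hypothetical.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def pad_with_zero_columns(O, n_cols):
    """Append zero columns so that a d x R matrix becomes d x n_cols (cf. note 3)."""
    return [row + [0.0] * (n_cols - len(row)) for row in O]

def project_to_tangent(O, Z):
    """Project Z onto {D : O^T D + D^T O = 0} via D = Z - O sym(O^T Z).

    Assumes O has orthonormal columns (O^T O = I).
    """
    OtZ = matmul(transpose(O), Z)
    R = len(OtZ)
    sym = [[0.5 * (OtZ[i][j] + OtZ[j][i]) for j in range(R)] for i in range(R)]
    OS = matmul(O, sym)
    return [[z - s for z, s in zip(z_row, s_row)] for z_row, s_row in zip(Z, OS)]

# Hypothetical example with d = 3 rows and R = 2 orthonormal columns.
O = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
Z = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # arbitrary ambient direction
D = project_to_tangent(O, Z)
S = matmul(transpose(O), D)                # skew-symmetric when D is tangent at O
O_padded = pad_with_zero_columns(O, 3)     # first R columns orthonormal, rest zero
```

A tangent direction produced this way is the kind of \(\varDelta \) along which one would differentiate in note 4. Moving along the manifold in that direction additionally requires the exponential map or a retraction, which this sketch does not implement.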
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Chrétien, S., Guedj, B. (2020). Revisiting Clustering as Matrix Factorisation on the Stiefel Manifold. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2020. Lecture Notes in Computer Science, vol 12565. Springer, Cham. https://doi.org/10.1007/978-3-030-64583-0_1
DOI: https://doi.org/10.1007/978-3-030-64583-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64582-3
Online ISBN: 978-3-030-64583-0
eBook Packages: Computer Science, Computer Science (R0)