Revisiting Clustering as Matrix Factorisation on the Stiefel Manifold

Chrétien, Stéphane; Guedj, Benjamin

doi:10.1007/978-3-030-64583-0_1

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12565)

Included in the following conference series:

International Conference on Machine Learning, Optimization, and Data Science


Abstract

This paper studies clustering for possibly high-dimensional data (e.g. images, time series, gene expression data, and many other settings) and rephrases it as low-rank matrix estimation in the PAC-Bayesian framework. Our approach leverages the well-known Burer-Monteiro factorisation strategy from large-scale optimisation in the context of low-rank estimation. Moreover, our Burer-Monteiro factors are shown to lie on a Stiefel manifold. We propose a new generalized Bayesian estimator for this problem and prove novel prediction bounds for clustering. We also devise a componentwise Langevin sampler on the Stiefel manifold to compute this estimator.
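
As a rough illustration of why a clustering factor can lie on a Stiefel manifold, the sketch below uses a standard construction that is assumed here, not taken from the paper: the cluster-membership matrix with each column scaled by the inverse square root of its cluster size. Its columns are orthonormal, and the product $UU^\top$ is a low-rank, block-constant matrix in the spirit of a Burer-Monteiro factorisation. The function name `normalized_membership` and the toy labels are hypothetical.

```python
import numpy as np

def normalized_membership(labels, n_clusters):
    """Return the n x K matrix U with U[i, k] = 1/sqrt(n_k) if point i is in cluster k, else 0."""
    labels = np.asarray(labels)
    n = labels.shape[0]
    U = np.zeros((n, n_clusters))
    U[np.arange(n), labels] = 1.0      # one-hot membership
    sizes = U.sum(axis=0)              # cluster sizes n_k (assumed all non-zero)
    return U / np.sqrt(sizes)          # scale each column to unit norm

# Toy assignment of 6 points to 3 clusters (hypothetical data).
labels = [0, 0, 1, 2, 2, 2]
U = normalized_membership(labels, 3)

gram = U.T @ U   # identity matrix when the columns of U are orthonormal
M = U @ U.T      # low-rank factorised matrix, constant on cluster blocks

print(np.allclose(gram, np.eye(3)))
print(np.linalg.matrix_rank(M))
```

Because distinct clusters have disjoint supports, the columns of `U` are mutually orthogonal, which is what places `U` on the Stiefel manifold in this toy setting.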

Notes

1. As the reader will be able to check, the values of $\nu_{\max}$ and $\nu_{\min}$ will play an essential role in the expression of the success probability of the method.
2. Notice that $t_{\min}$ needs to be sufficiently smaller than $2/e$ in order for the term $K$ to become small for $n$ sufficiently large.
3. Notation-wise, we will identify the Stiefel manifold with the set of matrices whose first $R$ columns form an orthonormal family and the remaining $n-R$ columns are set to zero.
4. This formula can be obtained using differentiation along the geodesic defined by the exponential map in the direction $\varDelta$, for all $\varDelta \in T_O(\mathbb{O}_{d,R})$.
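
The tangent space mentioned in Note 4 can be made concrete with a small numerical sketch. It assumes the usual description of the tangent space of the Stiefel manifold at $X$ as the set of matrices $\varDelta$ with $X^\top \varDelta + \varDelta^\top X = 0$. For simplicity it replaces the exponential map with a QR retraction and takes a plain noisy gradient step. It is therefore not the componentwise Langevin sampler of the paper, and all function names are hypothetical.

```python
import numpy as np

def project_to_tangent(X, G):
    """Project an ambient d x R matrix G onto the tangent space of the Stiefel manifold at X."""
    XtG = X.T @ G
    return G - X @ (XtG + XtG.T) / 2.0   # remove the symmetric part of X^T G

def qr_retract(Y):
    """Map a full-column-rank d x R matrix back onto the manifold via a thin QR factorisation."""
    Q, R = np.linalg.qr(Y)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs                     # fix column signs so the map is well defined

def langevin_step(X, grad, step, rng):
    """One illustrative noisy gradient step: tangent projection followed by retraction."""
    noise = rng.standard_normal(X.shape)
    direction = -step * grad + np.sqrt(2.0 * step) * noise
    return qr_retract(X + project_to_tangent(X, direction))

rng = np.random.default_rng(0)
X = qr_retract(rng.standard_normal((5, 2)))   # a random point with orthonormal columns
G = rng.standard_normal((5, 2))               # stand-in for a gradient (hypothetical)

D = project_to_tangent(X, G)
X_new = langevin_step(X, G, 0.01, rng)

print(np.allclose(X.T @ D + D.T @ X, 0.0))       # D satisfies the tangent condition
print(np.allclose(X_new.T @ X_new, np.eye(2)))   # the update stays on the manifold
```

The retraction is chosen only because it is short to write. A sampler that follows geodesics, as the note suggests, would use the exponential map instead.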

Author information

Authors and Affiliations

Université Lumière-Lyon-II, 69676, Bron Cedex, France
Stéphane Chrétien
The Alan Turing Institute, London, UK
Stéphane Chrétien
The National Physical Laboratory, Teddington, TW11 0LW, UK
Stéphane Chrétien
Inria, Lille - Nord Europe Research Centre, Lille, France
Benjamin Guedj
Department of Computer Science and Centre for Artificial Intelligence, University College London, London, UK
Benjamin Guedj

Authors

Stéphane Chrétien
Benjamin Guedj

Corresponding author

Correspondence to Stéphane Chrétien.

Editor information

Editors and Affiliations

University of Catania, Catania, Italy
Giuseppe Nicosia
University of Reading, Reading, UK
Varun Ojha
University of Oxford, Oxford, UK
Emanuele La Malfa
University of Cambridge, Cambridge, UK
Giorgio Jansen
Almawave, Rome, Italy
Vincenzo Sciacca
University of Florida, Gainesville, FL, USA
Panos Pardalos
University of Catania, Catania, Italy
Giovanni Giuffrida
Harvard University, Cambridge, MA, USA
Renato Umeton

Cite this paper

Chrétien, S., Guedj, B. (2020). Revisiting Clustering as Matrix Factorisation on the Stiefel Manifold. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2020. Lecture Notes in Computer Science, vol 12565. Springer, Cham. https://doi.org/10.1007/978-3-030-64583-0_1

DOI: https://doi.org/10.1007/978-3-030-64583-0_1
Published: 08 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64582-3
Online ISBN: 978-3-030-64583-0
eBook Packages: Computer Science; Computer Science (R0)
