Revisiting Clustering as Matrix Factorisation on the Stiefel Manifold

  • Conference paper
  • Machine Learning, Optimization, and Data Science (LOD 2020)

Abstract

This paper studies clustering for possibly high-dimensional data (e.g. images, time series, gene expression data, and many other settings), and rephrases it as low-rank matrix estimation in the PAC-Bayesian framework. Our approach leverages the well-known Burer-Monteiro factorisation strategy from large-scale optimisation, in the context of low-rank estimation. Moreover, our Burer-Monteiro factors are shown to lie on a Stiefel manifold. We propose a new generalised Bayesian estimator for this problem and prove novel prediction bounds for clustering. We also devise a componentwise Langevin sampler on the Stiefel manifold to compute this estimator.
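The link between a clustering and a factor on the Stiefel manifold can be illustrated with one standard encoding. This encoding is an assumption here, not the paper's stated construction. For a partition of n points into R clusters, take the n × R matrix O whose entry (i, r) equals \(1/\sqrt{n_r}\) when point i lies in cluster r (of size \(n_r\)) and 0 otherwise. Its columns are orthonormal, so O lies on the Stiefel manifold. The product \(X = OO^\top\) is a rank-R positive semidefinite matrix of the kind used in semidefinite relaxations of k-means, and writing X through its low-rank factor O is the Burer-Monteiro idea. The helper name `normalized_membership` is hypothetical.

```python
import numpy as np


def normalized_membership(labels, R):
    """Hypothetical helper: encode a clustering as an n x R matrix O with
    O[i, r] = 1/sqrt(n_r) if point i is in cluster r (of size n_r), else 0.

    Assumption for illustration: this is one standard encoding; its columns are
    orthonormal (disjoint supports, unit norm), so O lies on the Stiefel manifold.
    """
    labels = np.asarray(labels)
    n = labels.shape[0]
    O = np.zeros((n, R))
    for r in range(R):
        members = labels == r
        size = members.sum()
        if size == 0:
            raise ValueError(f"cluster {r} is empty")
        O[members, r] = 1.0 / np.sqrt(size)
    return O


labels = [0, 0, 1, 2, 1, 0]          # toy partition of n = 6 points into R = 3 clusters
O = normalized_membership(labels, R=3)
X = O @ O.T                          # n x n, PSD, rank R (Burer-Monteiro: X = O O^T)

print(np.allclose(O.T @ O, np.eye(3)))       # orthonormal columns
print(np.allclose(X @ X, X))                 # X is an orthogonal projection
print(np.allclose(X.sum(axis=1), 1.0))       # rows of X sum to one
```

Optimising or sampling over the n × R factor O instead of the n × n matrix X reduces the number of unknowns from \(n^2\) to \(nR\). The row-sum, nonnegativity and trace properties of X are those usually imposed as constraints in the k-means SDP relaxation; here they hold automatically for this particular O.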

Notes

  1.

    As the reader will be able to check, the values of \(\nu _{\max }\) and \(\nu _{\min }\) will play an essential role in the expression of the success probability of the method.

  2.

    Notice that \(t_{\min }\) needs to be sufficiently smaller than 2/e in order for the term K to become small for n sufficiently large.

  3.

    Notation-wise, we will identify the Stiefel manifold with the set of matrices whose first R columns form an orthonormal family and the remaining \(n-R\) columns are set to zero.

  4.

    This formula can be obtained using differentiation along the geodesic defined by the exponential map in the direction \(\varDelta \), for all \(\varDelta \in T_O(\mathbb {O}_{d,R})\).
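Notes 3 and 4 refer to the Stiefel manifold, its tangent space \(T_O(\mathbb{O}_{d,R})\) and the exponential map. A minimal sketch of a Langevin-type move on the Stiefel manifold follows, under stated assumptions. The Euclidean gradient and a Gaussian perturbation are projected onto the tangent space \(\{\varDelta : O^\top\varDelta + \varDelta^\top O = 0\}\), and the update is mapped back to the manifold with a QR retraction in place of the exponential map. This is a generic illustration, not the paper's componentwise sampler. The function names, the toy potential \(U(O) = -\operatorname{tr}(O^\top A O)\) and the step size are hypothetical, and no claim is made that this unadjusted scheme samples exactly from \(e^{-U}\).

```python
import numpy as np


def sym(A):
    """Symmetric part of a square matrix."""
    return 0.5 * (A + A.T)


def project_tangent(O, G):
    """Euclidean projection of an ambient d x R matrix G onto the tangent space
    T_O = {Delta : O^T Delta + Delta^T O = 0} of the Stiefel manifold at O."""
    return G - O @ sym(O.T @ G)


def qr_retraction(O, Delta):
    """Map O + Delta back onto the manifold via a reduced QR decomposition
    (a retraction used here instead of the exponential map)."""
    Q, Rfac = np.linalg.qr(O + Delta)
    # Flip column signs so diag(Rfac) is positive, making the map well defined.
    signs = np.sign(np.diag(Rfac))
    signs[signs == 0] = 1.0
    return Q * signs


def langevin_step(O, grad_U, step, rng):
    """Hypothetical unadjusted Langevin-type move targeting exp(-U):
    tangent-projected drift plus tangent-projected Gaussian noise, then retraction.
    Not the paper's componentwise sampler."""
    drift = -step * project_tangent(O, grad_U(O))
    noise = np.sqrt(2.0 * step) * project_tangent(O, rng.standard_normal(O.shape))
    return qr_retraction(O, drift + noise)


rng = np.random.default_rng(0)
d, R = 6, 2
O, _ = np.linalg.qr(rng.standard_normal((d, R)))   # random start on St(d, R)
A = rng.standard_normal((d, d))
A = A + A.T                                        # symmetric matrix for the toy potential
grad_U = lambda O: -2.0 * A @ O                    # gradient of U(O) = -tr(O^T A O)

for _ in range(100):
    O = langevin_step(O, grad_U, step=1e-2, rng=rng)

print(np.allclose(O.T @ O, np.eye(R)))  # iterates stay on the manifold
```

A QR retraction is cheaper than the matrix exponential needed for exact geodesics. Projecting both drift and noise keeps each proposal tangent to the manifold before the retraction.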

References

  • Alquier, P., Guedj, B.: An oracle inequality for quasi-Bayesian nonnegative matrix factorization. Math. Methods Stat. 26(1), 55–67 (2017). https://doi.org/10.3103/S1066530717010045

  • Arias-Castro, E., Verzelen, N.: Community detection in dense random networks. Ann. Stat. 42(3), 940–969 (2014)

  • Bandeira, A.S.: Random Laplacian matrices and convex relaxations. Found. Comput. Math. 18(2), 345–379 (2018)

  • Blum, A., Hopcroft, J., Kannan, R.: Foundations of data science. Draft book (2016)

  • Boumal, N., Voroninski, V., Bandeira, A.: The non-convex Burer-Monteiro approach works on smooth semidefinite programs. In: Advances in Neural Information Processing Systems, pp. 2757–2765 (2016)

  • Boutsidis, C., Drineas, P., Mahoney, M.W.: Unsupervised feature selection for the \(k\)-means clustering problem. In: Advances in Neural Information Processing Systems, pp. 153–161 (2009)

  • Brosse, N., Durmus, A., Moulines, E.: The promises and pitfalls of stochastic gradient Langevin dynamics. In: Advances in Neural Information Processing Systems, pp. 8278–8288 (2018)

  • Burer, S., Monteiro, R.D.C.: Local minima and convergence in low-rank semidefinite programming. Math. Program. 103(3), 427–444 (2005)

  • Burer, S., Monteiro, R.D.C.: A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Program. 95(2), 329–357 (2003)

  • Catoni, O.: Statistical Learning Theory and Stochastic Optimization. In: Picard, J. (ed.) LNM, vol. 1851. Springer, Heidelberg (2004). https://doi.org/10.1007/b99352

  • Catoni, O.: PAC-Bayesian Supervised Classification. Lecture Notes-Monograph Series, IMS (2007)

  • Chrétien, S., Dombry, C., Faivre, A.: A semi-definite programming approach to low dimensional embedding for unsupervised clustering. arXiv preprint arXiv:1606.09190 (2016)

  • Cohen, M.B., Elder, S., Musco, C., Musco, C., Persu, M.: Dimensionality reduction for k-means clustering and low rank approximation. In: Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pp. 163–172. ACM (2015)

  • Dalalyan, A.S.: Theoretical guarantees for approximate sampling from smooth and log-concave densities. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 79(3), 651–676 (2017)

  • Dalalyan, A.S., Tsybakov, A.B.: Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity. Mach. Learn. 72(1–2), 39–61 (2008)

  • Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 39(1), 1–38 (1977)

  • Durmus, A., Moulines, E.: Nonasymptotic convergence analysis for the unadjusted Langevin algorithm. Ann. Appl. Probab. 27(3), 1551–1587 (2017)

  • Edelman, A., Arias, T.A., Smith, S.T.: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20(2), 303–353 (1998)

  • Giraud, C., Verzelen, N.: Partial recovery bounds for clustering with the relaxed \(k\)-means. arXiv preprint arXiv:1807.07547 (2018)

  • Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42(6), 1115–1145 (1995)

  • Grenander, U.: Tutorial in pattern theory. Report, Division of Applied Mathematics (1983)

  • Grenander, U., Miller, M.I.: Representations of knowledge in complex systems. J. R. Stat. Soc. Ser. B (Methodol.), 549–603 (1994)

  • Guedj, B.: A primer on PAC-Bayesian learning. arXiv preprint arXiv:1901.05353 (2019)

  • Guédon, O., Vershynin, R.: Community detection in sparse networks via Grothendieck’s inequality. Probab. Theory Relat. Fields 165(3–4), 1025–1049 (2016)

  • Hastie, T., Tibshirani, R., Friedman, J.: Unsupervised learning. In: The Elements of Statistical Learning, pp. 485–585. Springer (2009)

  • McAllester, D.: Some PAC-Bayesian theorems. In: COLT, pp. 230–234 (1998)

  • McAllester, D.: PAC-Bayesian model averaging. In: COLT, pp. 164–171 (1999)

  • McLachlan, G., Peel, D.: Finite Mixture Models. Wiley, New York (2004)

  • Montanari, A., Sen, S.: Semidefinite programs on sparse random graphs and their application to community detection. arXiv preprint arXiv:1504.05910 (2015)

  • Roberts, G.O., Tweedie, R.L.: Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2(4), 341–363 (1996)

  • Royer, M.: Adaptive clustering through semidefinite programming. In: Advances in Neural Information Processing Systems, pp. 1795–1803 (2017)

  • Rudelson, M., Vershynin, R.: Hanson-Wright inequality and sub-Gaussian concentration. Electron. Commun. Probab. 18, 9 pp. (2013)

  • Shawe-Taylor, J., Williamson, R.C.: A PAC analysis of a Bayesian classifier. In: COLT, pp. 2–9 (1997)

  • Vershynin, R.: High-Dimensional Probability: An Introduction with Applications in Data Science, vol. 47. Cambridge University Press, Cambridge (2018)

  • Verzelen, N., Arias-Castro, E.: Community detection in sparse random networks. Ann. Appl. Probab. 25(6), 3465–3510 (2015)

Author information

Correspondence to Stéphane Chrétien.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Chrétien, S., Guedj, B. (2020). Revisiting Clustering as Matrix Factorisation on the Stiefel Manifold. In: Nicosia, G., et al. (eds.) Machine Learning, Optimization, and Data Science. LOD 2020. Lecture Notes in Computer Science, vol. 12565. Springer, Cham. https://doi.org/10.1007/978-3-030-64583-0_1

  • DOI: https://doi.org/10.1007/978-3-030-64583-0_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-64582-3

  • Online ISBN: 978-3-030-64583-0

  • eBook Packages: Computer Science, Computer Science (R0)
