
Multi-block Bregman proximal alternating linearized minimization and its application to orthogonal nonnegative matrix factorization

Computational Optimization and Applications 79, 681–715 (2021)

Abstract

We introduce and analyze BPALM and A-BPALM, two multi-block proximal alternating linearized minimization algorithms using Bregman distances for solving structured nonconvex problems. The objective function is the sum of a multi-block relatively smooth function (i.e., relatively smooth in each block when all other blocks are fixed) and block-separable (nonsmooth) nonconvex functions. The sequences generated by our algorithms are subsequentially convergent to critical points of the objective function, and they are globally convergent under the KL inequality assumption. Moreover, the rate of convergence is further analyzed for functions satisfying the Łojasiewicz gradient inequality. We apply this framework to orthogonal nonnegative matrix factorization (ONMF), which satisfies all of our assumptions and whose subproblems can be solved in closed form, and we report some preliminary numerical results.
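
For readers who want a concrete picture of the alternating linearized structure, the following is a minimal NumPy sketch of a plain Euclidean PALM iteration for (non-orthogonal) NMF, in which only nonnegativity is enforced and standard Lipschitz step sizes are used. It is not the BPALM/A-BPALM methods analyzed in this paper, whose Bregman proximal steps and closed-form ONMF updates are developed in the text and implemented in the repository cited in the Notes; all names below are illustrative.

import numpy as np

def palm_nmf(X, r, iters=500, seed=0):
    # Plain Euclidean PALM for min_{U>=0, V>=0} 0.5*||X - U V||_F^2.
    # Each block takes a gradient step with step size 1/L (L = block Lipschitz
    # constant) followed by projection onto the nonnegative orthant.
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, r))
    V = rng.random((r, n))
    for _ in range(iters):
        L_U = max(np.linalg.norm(V @ V.T, 2), 1e-12)         # Lipschitz constant of grad w.r.t. U
        U = np.maximum(U - ((U @ V - X) @ V.T) / L_U, 0.0)   # linearized step + projection
        L_V = max(np.linalg.norm(U.T @ U, 2), 1e-12)         # Lipschitz constant of grad w.r.t. V
        V = np.maximum(V - (U.T @ (U @ V - X)) / L_V, 0.0)
    return U, V

# Example on synthetic nonnegative low-rank data.
rng = np.random.default_rng(1)
X = rng.random((50, 10)) @ rng.random((10, 40))
U, V = palm_nmf(X, r=10)
print("relative error:", np.linalg.norm(X - U @ V) / np.linalg.norm(X))

The BPALM algorithms of this paper replace the Euclidean quadratic upper bound implicit in these steps by a Bregman distance generated by a kernel adapted to the problem, which is what allows the ONMF subproblems to be solved in closed form.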


Notes

  1. The codes are publicly available at https://github.com/MasoudAhoo/BPALM

References

  1. Ahookhosh, M.: Accelerated first-order methods for large-scale convex optimization: nearly optimal complexity under strong convexity. Math. Methods Oper. Res. 89(3), 319–353 (2019)


  2. Ahookhosh, M., Hien, L.T.K., Gillis, N., Patrinos, P.: A block inertial Bregman proximal algorithm for nonsmooth nonconvex problems with application to symmetric nonnegative matrix tri-factorization. J. Optim. Theory Appl. (2021)

  3. Ahookhosh, M., Themelis, A., Patrinos, P.: A Bregman forward-backward linesearch algorithm for nonconvex composite optimization: superlinear convergence to nonisolated local minima. SIAM J. Optim. 31(1), 653–685 (2021)


  4. Araújo, U., Saldanha, B., Galvão, R., Yoneyama, T., Chame, H., Visani, V.: The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. Chemometr. Intell. Lab. Syst. 57(2), 65–73 (2001)


  5. Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pac. J. Math. 16(1), 1–3 (1966)


  6. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Alternating proximal algorithms for weakly coupled convex minimization problems. Applications to dynamical games and PDEs. J. Convex Anal. 15(3), 485 (2008)


  7. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)


  8. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137(1), 91–129 (2013)


  9. Attouch, H., Redont, P., Soubeyran, A.: A new class of alternating proximal minimization algorithms with costs-to-move. SIAM J. Optim. 18(3), 1061–1081 (2007)


  10. Attouch, H., Soubeyran, A.: Inertia and reactivity in decision making as cognitive variational inequalities. J. Conv. Anal. 13(2), 207 (2006)


  11. Auslender, A.: Optimisation méthodes numériques. Mason, Paris (1976)


  12. Bauschke, H.H., Bolte, J., Chen, J., Teboulle, M., Wang, X.: On linear convergence of non-Euclidean gradient methods without strong convexity and Lipschitz gradient continuity. J. Optim. Theory Appl. 182, 1068–1087 (2019)


  13. Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42(2), 330–348 (2016)


  14. Bauschke, H.H., Dao, M.N., Lindstrom, S.B.: Regularizing with Bregman–Moreau envelopes. SIAM J. Optim. 28(4), 3208–3228 (2018)


  15. Beck, A., Pauwels, E., Sabach, S.: The cyclic block conditional gradient method for convex optimization problems. SIAM J. Optim. 25(4), 2024–2049 (2015)


  16. Beck, A., Sabach, S., Teboulle, M.: An alternating semiproximal method for nonconvex regularized structured total least squares problems. SIAM J. Matrix Anal. Appl. 37(3), 1129–1150 (2016)


  17. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)


  18. Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4), 2037–2060 (2013)


  19. Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, Inc., Hoboken (1989)


  20. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007)


  21. Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007)


  22. Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)


  23. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)


  24. Bolte, J., Sabach, S., Teboulle, M., Vaisbourd, Y.: First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM J. Optim. 28(3), 2131–2151 (2018)


  25. Boţ, R.I., Csetnek, E.R.: An inertial Tseng’s type proximal algorithm for nonsmooth and nonconvex optimization problems. J. Optim. Theory Appl. 171(2), 600–616 (2016)


  26. Boţ, R.I., Nguyen, D.K.: The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates. Math. Oper. Res. 45(2), 682–712 (2020)


  27. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)


  28. Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3(3), 538–543 (1993)


  29. Cichocki, A., Zdunek, R., Phan, A.H., Amari, S.I.: Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. John Wiley & Sons, Hoboken (2009)


  30. Combettes, P.L., Pesquet, J.C.: Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping. SIAM J. Optim. 25(2), 1221–1248 (2015)


  31. Van den Dries, L.: Tame Topology and o-Minimal Structures, vol. 248. Cambridge University Press, Cambridge (1998)


  32. Fercoq, O., Bianchi, P.: A coordinate-descent primal-dual algorithm with large step size and possibly nonseparable functions. SIAM J. Optim. 29(1), 100–134 (2019)


  33. Fu, X., Huang, K., Sidiropoulos, N.D., Ma, W.K.: Nonnegative matrix factorization for signal and data analytics: Identifiability, algorithms, and applications. IEEE Signal Process. Mag. 36(2), 59–80 (2019)


  34. Gillis, N.: The why and how of nonnegative matrix factorization. Regular. Optim. Kernels Support Vector Mach. 12(257), 257–291 (2014)


  35. Gillis, N., Vavasis, S.A.: Fast and robust recursive algorithms for separable nonnegative matrix factorization. IEEE Trans. Pattern Anal. Mach. Intell. 36(4), 698–714 (2013)


  36. Grippo, L., Sciandrone, M.: On the convergence of the block nonlinear Gauss-Seidel method under convex constraints. Operat. Res. Lett. 26(3), 127–136 (2000)


  37. Hanzely, F., Richtárik, P.: Fastest rates for stochastic mirror descent methods. arXiv:1803.07374 (2018)

  38. Hanzely, F., Richtárik, P., Xiao, L.: Accelerated Bregman proximal gradient methods for relatively smooth convex optimization. Comput. Optim. Appl. 22, 1–36 (2021)


  39. Kimura, K., Tanaka, Y., Kudo, M.: A fast hierarchical alternating least squares algorithm for orthogonal nonnegative matrix factorization. In: D. Phung, H. Li (eds.) Proceedings of the Sixth Asian Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 39, pp. 129–141. PMLR, Nha Trang City, Vietnam (2015). http://proceedings.mlr.press/v39/kimura14.html

  40. Kurdyka, K.: On gradients of functions definable in o-minimal structures. Annales de l’institut Fourier 48(3), 769–783 (1998)


  41. Latafat, P., Freris, N.M., Patrinos, P.: A new randomized block-coordinate primal-dual proximal algorithm for distributed optimization. IEEE Trans. Autom. Cont. 64(10), 4050–4065 (2019)

  42. Latafat, P., Themelis, A., Patrinos, P.: Block-coordinate and incremental aggregated proximal gradient methods for nonsmooth nonconvex problems. Math. Program. 1–30 (2021). arXiv:1906.10053

  43. Li, Q., Zhu, Z., Tang, G., Wakin, M.B.: Provable Bregman-divergence based methods for nonconvex and non-Lipschitz problems. arXiv preprint arXiv:1904.09712 (2019)

  44. Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels. Les équations aux dérivées partielles pp. 87–89 (1963)

  45. Łojasiewicz, S.: Sur la géométrie semi- et sous- analytique. Annales de l’institut Fourier 43(5), 1575–1595 (1993)


  46. Lu, H., Freund, R.M., Nesterov, Y.: Relatively smooth convex optimization by first-order methods, and applications. SIAM J. Optim. 28(1), 333–354 (2018)


  47. Mukkamala, M.C., Ochs, P., Pock, T., Sabach, S.: Convex-concave backtracking for inertial Bregman proximal gradient algorithms in nonconvex optimization. SIAM J. Math. Data Sci. 2(3), 658–682 (2020)


  48. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)


  49. Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)


  50. Pauca, V.P., Piper, J., Plemmons, R.J.: Nonnegative matrix factorization for spectral data analysis. Linear Algebra Appl. 416(1), 29–47 (2006)


  51. Pock, T., Sabach, S.: Inertial proximal alternating linearized minimization (iPALM) for nonconvex and nonsmooth problems. SIAM J. Imaging Sci. 9(4), 1756–1787 (2016)


  52. Pompili, F., Gillis, N., Absil, P.A., Glineur, F.: Two algorithms for orthogonal nonnegative matrix factorization with application to clustering. Neurocomputing 141, 15–25 (2014)


  53. Razaviyayn, M., Hong, M., Luo, Z.Q.: A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM J. Optim. 23(2), 1126–1153 (2013)


  54. Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1–2), 1–38 (2014)


  55. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, vol. 317. Springer Science & Business Media, Berlin (2011)


  56. Choi, S.: Algorithms for orthogonal nonnegative matrix factorization. In: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), pp. 1828–1832 (2008)

  57. Shefi, R., Teboulle, M.: On the rate of convergence of the proximal alternating linearized minimization algorithm for convex problems. EURO J. Comput. Optim. 4(1), 27–46 (2016)


  58. Tam, M.K.: Regularity properties of non-negative sparsity sets. J. Math. Anal. Appl. 447(2), 758–777 (2017)


  59. Teboulle, M.: A simplified view of first order methods for optimization. Math. Program. 170(1), 67–96 (2018)

  60. Themelis, A., Ahookhosh, M., Patrinos, P.: On the acceleration of forward-backward splitting via an inexact Newton method. In: Luke, R., Bauschke, H., Burachik, R. (eds.) Splitting Algorithms, Modern Operator Theory, and Applications, pp. 363–412. Springer, Berlin (2019)


  61. Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3), 475–494 (2001)


  62. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117(1–2), 387–423 (2009)


  63. Wang, X., Yuan, X., Zeng, S., Zhang, J., Zhou, J.: Block coordinate proximal gradient method for nonconvex optimization problems: convergence analysis. http://www.optimization-online.org/DB_HTML/2018/04/6573.html (2018)


Acknowledgements

We would like to thank the anonymous reviewers for their insightful comments that helped improve the paper; in particular, one of the reviewers gave a suggestion that led to the kernel function in Proposition 5.1. The first author is grateful to Andreas Themelis for his useful comments and discussions on the paper. MA and PP acknowledge the support of the Research Foundation Flanders (FWO) research projects G086518N and G086318N; Research Council KU Leuven C1 project No. C14/18/068; Fonds de la Recherche Scientifique - FNRS and the Fonds Wetenschappelijk Onderzoek - Vlaanderen (FWO) under EOS project no 30468160 (SeLMA). LTKH and NG also acknowledge the support of the European Research Council (ERC starting grant no 679515).

Author information


Correspondence to Masoud Ahookhosh.


Appendix

Lemma 8.1

Let all assumptions of Theorem 4.4 be valid. Then, the following assertions hold:

  (i) \(\lim _{k\rightarrow \infty }\mathbf{dist}\left( \varvec{x}^k,\omega (\varvec{x}^0)\right) =0\);

  (ii) \(\omega (\varvec{x}^0)\) is a nonempty, compact, and connected set;

  (iii) the objective function \(\varphi\) is finite and constant on \(\omega (\varvec{x}^0)\).

Proof

Lemma 8.1(i) is a direct consequence of Theorem 4.4, and Lemma 8.1(ii) and Lemma 8.1(iii) can be proved in the same way as [23, Lemma 5(iii)-(iv)]. \(\square\)

Lemma 8.2

Let all assumptions of Theorem 4.5 be satisfied. If \(\varphi (\varvec{x}^k)>\varphi ^\star\) for all \(k\in \mathbb {N}\), then there exist \(\varepsilon ,\eta >0\), \(k_0\in \mathbb {N}\), and a desingularizing function \(\psi\) such that

$$\begin{aligned} \psi '(\varphi (\varvec{x}^k)-\varphi ^\star )\mathbf{dist}(0,\partial \varphi (\varvec{x}^k))\ge 1 \quad \text { for } k\ge k_0. \end{aligned}$$
(8.1)

Proof

From Lemma 8.1(ii), the set of limit points \(\omega (\varvec{x}^0)\) of \({(\varvec{x}^k)_{{k\in \mathbb {N}}}}\) is nonempty and compact, and \(\varphi\) is finite and constant on \(\omega (\varvec{x}^0)\) by Lemma 8.1(iii). Moreover, \(\varphi (\varvec{x}^k)>\varphi ^\star\) and the sequence \({(\varphi (\varvec{x}^k))_{{k\in \mathbb {N}}}}\) is decreasing (Proposition 4.1(i)); hence, there exist \(\eta >0\) and \(k_1\in \mathbb {N}\) such that \(\varphi ^\star<\varphi (\varvec{x}^k)<\varphi ^\star +\eta\) for all \(k\ge k_1\). For \(\varepsilon >0\), Lemma 8.1(i) implies that there exists \(k_2\in \mathbb {N}\) such that \(\mathbf{dist}(\varvec{x}^k,\omega (\varvec{x}^0))<\varepsilon\) for all \(k\ge k_2\). Setting \(k_0:=\mathbf{max}\{k_1,k_2\}\), it follows from Fact 2.2 that there exist \(\varepsilon , \eta >0\) and a desingularizing function \(\psi\) such that, for any element of

$$\begin{aligned} \{\varvec{x}^k~\mid ~\mathbf{dist}(\varvec{x}^k,\omega (\varvec{x}^0))<\varepsilon \}\cap [\varphi ^\star<\varphi (\varvec{x}^k)<\varphi ^\star +\eta ] \quad \text { for } k\ge k_0, \end{aligned}$$

the inequality (8.1) is valid. \(\square\)
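
For later reference (a short computation not displayed in the text), if the desingularizing function has the Łojasiewicz form \(\psi (s)=\tfrac{\kappa }{1-\theta }s^{1-\theta }\) used in the proof of Theorem 4.7 below, then \(\psi '(s)=\kappa s^{-\theta }\) and (8.1) becomes

$$\begin{aligned} \mathbf{dist}\left( 0,\partial \varphi (\varvec{x}^k)\right) \ge \frac{1}{\kappa }\left( \varphi (\varvec{x}^k)-\varphi ^\star \right) ^{\theta }\quad \text { for } k\ge k_0, \end{aligned}$$

which makes the role of the KL exponent \(\theta\) explicit.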

We next present the proof of Theorem 4.7.

Proof of Theorem 4.7. The proof has two key parts.

In the first part, we show that there exist \(c>0\) and \(\overline{k}\in \mathbb {N}\) such that for all \(k\ge \overline{k}\) the following inequalities hold for \(i=1,\ldots ,N\):

$$\begin{aligned} \Vert x_i^k-x_i^\star \Vert \le \left\{ \begin{array}{ll} c \mathbf{max}\{1,\tfrac{\kappa }{1-\theta }\} \sqrt{{\mathcal {S}}_{k-1}} &{}~~~ \mathrm {if}\ \theta \in (0,1/2], \\ c \tfrac{\kappa }{1-\theta } {\mathcal {S}}_{k-1}^{1-\theta } &{}~~~ \mathrm {if}\ \theta \in (1/2,1). \end{array} \right. \end{aligned}$$
(8.2)

Let \(\varepsilon >0\) be as described in (4.16) and \(x^k\in \mathbf{B}(x^\star ; \varepsilon )\) for all \(k\ge \tilde{k}\), for some \(\tilde{k}\in \mathbb {N}\). By the definitions of \(a_k\) and \(b_k\) in (4.15) and using (4.14), we get \(a_{k+1}\le \tfrac{1}{2} a_k+b_k\) for all \(k\ge \tilde{k}\). Since \({(\varphi (\varvec{x}^k))_{{k\in \mathbb {N}}}}\) is nonincreasing,

$$\begin{aligned} \sum _{j=k}^\infty a_{j+1}\le \tfrac{1}{2} \sum _{j=k}^\infty (a_j-a_{j+1}+a_{j+1})+ \tfrac{\widehat{c} N}{2}\sum _{j=k}^\infty \left( \Delta _j-\Delta _{j+1}\right) = \tfrac{1}{2}\sum _{j=k}^\infty a_{j+1}+\tfrac{1}{2} a_k+\tfrac{\widehat{c} N}{2} \Delta _k. \end{aligned}$$

Together with the arithmetic–quadratic mean inequality, \(\psi ({\mathcal {S}}_{k})\le \psi ({\mathcal {S}}_{k-1})\), and Proposition 4.1(i), this leads to

$$\begin{aligned} {\begin{matrix} \sum _{j=k}^\infty a_{j+1}&{}\le a_k+\widehat{c} N\Delta _k = \sum _{i=1}^N \Vert x_i^{k}-x_i^{k-1}\Vert +\widehat{c} N\psi ({\mathcal {S}}_k) \le \sqrt{N} \sqrt{\sum _{i=1}^N \Vert x_i^{k}-x_i^{k-1}\Vert ^2}+\widehat{c} N\psi ({\mathcal {S}}_k)\\ &{}\le \sqrt{2N}\mathbf{max}\{\tfrac{1}{\sqrt{\sigma _1}},\ldots ,\tfrac{1}{\sqrt{\sigma _N}}\} \sqrt{\sum _{i=1}^N {\mathbf {D}}_{h}(\varvec{x}^{k-1,i},\varvec{x}^{k-1,i-1})} +\widehat{c} N\psi ({\mathcal {S}}_k)\\ &{}\le \sqrt{\tfrac{2N}{\rho }} \mathbf{max}\{\tfrac{1}{\sqrt{\sigma _1}},\ldots ,\tfrac{1}{\sqrt{\sigma _N}}\} \sqrt{{\mathcal {S}}_{k-1}-{\mathcal {S}}_{k}}+\widehat{c} N\psi ({\mathcal {S}}_{k-1}). \end{matrix}} \end{aligned}$$
(8.3)
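
The inequality \(\sum _{i=1}^N \Vert x_i^{k}-x_i^{k-1}\Vert \le \sqrt{N}\sqrt{\sum _{i=1}^N \Vert x_i^{k}-x_i^{k-1}\Vert ^2}\) used in (8.3) is the arithmetic–quadratic mean bound mentioned above; spelled out via Cauchy–Schwarz,

$$\begin{aligned} \sum _{i=1}^N \Vert x_i^{k}-x_i^{k-1}\Vert =\sum _{i=1}^N 1\cdot \Vert x_i^{k}-x_i^{k-1}\Vert \le \sqrt{\sum _{i=1}^N 1}\,\sqrt{\sum _{i=1}^N \Vert x_i^{k}-x_i^{k-1}\Vert ^2}=\sqrt{N} \sqrt{\sum _{i=1}^N \Vert x_i^{k}-x_i^{k-1}\Vert ^2}. \end{aligned}$$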

On the other hand, for \(i=1,\ldots ,N\), we have

$$\begin{aligned} \Vert x_i^k-x_i^\star \Vert \le \Vert x_i^{k+1}-x_i^k\Vert +\Vert x_i^{k+1}-x_i^\star \Vert \le \ldots \le \sum _{j=k}^\infty \Vert x_i^{j+1}-x_i^j\Vert . \end{aligned}$$

This inequality, together with (8.3), yields

$$\begin{aligned} \sum _{i=1}^N\Vert x_i^k-x_i^\star \Vert \le \sqrt{\tfrac{2N}{\rho }} \mathbf{max}\{\tfrac{1}{\sqrt{\sigma _1}},\ldots ,\tfrac{1}{\sqrt{\sigma _N}}\} \sqrt{{\mathcal {S}}_{k-1}-{\mathcal {S}}_{k}}+\widehat{c} N \psi ({\mathcal {S}}_{k-1}), \end{aligned}$$

leading to

$$\begin{aligned} \Vert x_i^k-x_i^\star \Vert \le c \mathbf{max}\{\sqrt{{\mathcal {S}}_{k-1}},\psi ({\mathcal {S}}_{k-1})\}\quad i=1,\ldots ,N, \end{aligned}$$
(8.4)

where \(c:=\sqrt{\tfrac{2N}{\rho }} \mathbf{max}\left\{\tfrac {1}{\sqrt{\sigma _1}},\ldots ,\tfrac {1}{\sqrt{\sigma _N}}\right\}+\widehat{c} N\) and \(\psi (s):=\frac{\kappa }{1-\theta } s^{1-\theta }\). Let us consider the nonlinear equation

$$\begin{aligned} \sqrt{{\mathcal {S}}_{k-1}}-\frac{\kappa }{1-\theta } {\mathcal {S}}_{k-1}^{1-\theta }=0, \end{aligned}$$

which has a solution at \({\mathcal {S}}_{k-1}=\left( \tfrac {(1-\theta )}{\kappa }\right) ^{\tfrac{2}{1-2\theta }}\). From the monotonicity of \({\mathcal {S}}_k\), there exists \(\hat{k}\in \mathbb {N}\) such that, for \(k\ge \hat{k}\), (8.4) holds and

$$\begin{aligned} {\mathcal {S}}_{k-1}\le \left( \frac{1-\theta }{\kappa }\right) ^{\tfrac{2}{1-2\theta }}. \end{aligned}$$
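
For \(\theta \ne \tfrac{1}{2}\), the stated root can be checked directly (a short computation omitted in the text):

$$\begin{aligned} \sqrt{s}=\frac{\kappa }{1-\theta }\, s^{1-\theta } \;\Longleftrightarrow \; s^{\theta -\frac{1}{2}}=\frac{\kappa }{1-\theta } \;\Longleftrightarrow \; s=\left( \frac{\kappa }{1-\theta }\right) ^{\frac{2}{2\theta -1}}=\left( \frac{1-\theta }{\kappa }\right) ^{\frac{2}{1-2\theta }}. \end{aligned}$$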

We now consider two cases: (a) \(\theta \in (0,1/2]\); (b) \(\theta \in (1/2,1)\). In Case (a), if \(\theta \in (0,1/2)\), then \(\psi ({\mathcal {S}}_{k-1})\le \sqrt{{\mathcal {S}}_{k-1}}\). If \(\theta =1/2\), then \(\psi ({\mathcal {S}}_{k-1})=\tfrac{\kappa }{1-\theta }\sqrt{{\mathcal {S}}_{k-1}}\), i.e.,

$$\begin{aligned} \mathbf{max}\{\sqrt{{\mathcal {S}}_{k-1}},\psi ({\mathcal {S}}_{k-1})\}=\mathbf{max}\{1,\tfrac{\kappa }{1-\theta }\} \sqrt{{\mathcal {S}}_{k-1}}. \end{aligned}$$

Therefore, it holds that \(\mathbf{max}\{\sqrt{{\mathcal {S}}_{k-1}},\psi ({\mathcal {S}}_{k-1})\}\le \mathbf{max}\{1,\tfrac{\kappa }{1-\theta }\} \sqrt{{\mathcal {S}}_{k-1}}\). In Case (b), we have that

$$\begin{aligned} \psi ({\mathcal {S}}_{k-1})\ge \sqrt{{\mathcal {S}}_{k-1}}, \end{aligned}$$

i.e., \(\mathbf{max}\{\sqrt{{\mathcal {S}}_{k-1}},\psi ({\mathcal {S}}_{k-1})\}= \tfrac{\kappa }{1-\theta } {\mathcal {S}}_{k-1}^{1-\theta }\). Then, it follows from (8.4) that (8.2) holds for all \(k\ge \overline{k}:=\mathbf{max}\{\tilde{k}, \hat{k}\}\).
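
The comparisons between \(\psi ({\mathcal {S}}_{k-1})\) and \(\sqrt{{\mathcal {S}}_{k-1}}\) used in both cases follow from the bound on \({\mathcal {S}}_{k-1}\) displayed above; for instance, for \(\theta \in (0,1/2)\),

$$\begin{aligned} {\mathcal {S}}_{k-1}\le \left( \frac{1-\theta }{\kappa }\right) ^{\frac{2}{1-2\theta }} \;\Longrightarrow \; {\mathcal {S}}_{k-1}^{\frac{1}{2}-\theta }\le \frac{1-\theta }{\kappa } \;\Longrightarrow \; \psi ({\mathcal {S}}_{k-1})=\frac{\kappa }{1-\theta }\,{\mathcal {S}}_{k-1}^{\frac{1}{2}-\theta }\,{\mathcal {S}}_{k-1}^{\frac{1}{2}}\le \sqrt{{\mathcal {S}}_{k-1}}, \end{aligned}$$

and the reverse inequality for \(\theta \in (1/2,1)\) is obtained by the same manipulation, noting that the exponent \(\tfrac{2}{1-2\theta }\) is then negative.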

In the second part of the proof, we show the assertions in the statement of the theorem. For \(({\mathcal {G}}_1^{k},\ldots ,{\mathcal {G}}_N^{k})\in \partial \varphi (\varvec{x}^k)\) as defined in Proposition 4.3, by Proposition 4.1(i), we infer

$$\begin{aligned} {\mathcal {S}}_{k-1}&-{\mathcal {S}}_{k}=\varphi (x^{k-1})-\varphi (x^{k}) \ge \rho \sum _{i=1}^N {\mathbf {D}}_{h}(x^{k-1,i},x^{k-1,i-1})\ge \frac{\rho }{2} \sum _{i=1}^N \sigma _i \Vert x_i^{k}-x_i^{k-1}\Vert ^2\\&\ge \frac{\rho }{2N}\min \{\sigma _1,\ldots ,\sigma _N\} \left( \sum _{i=1}^N \Vert x_i^{k}-x_i^{k-1}\Vert \right) ^2 \ge \frac{\rho }{2N\overline{c}^2}\min \{\sigma _1,\ldots ,\sigma _N\} \Vert ({\mathcal {G}}_1^{k},\ldots ,{\mathcal {G}}_N^{k})\Vert ^2\\&\ge \frac{\rho }{2N\overline{c}^2}\min \{\sigma _1,\ldots ,\sigma _N\}\mathbf{dist}(0,\partial \varphi (x^k))^2 \ge \frac{\rho }{2N\overline{c}^2\kappa ^2}\min \{\sigma _1,\ldots ,\sigma _N\} {\mathcal {S}}_{k-1}^{2\theta }=\widetilde{c}~ {\mathcal {S}}_{k-1}^{2\theta }, \end{aligned}$$

with \(\widetilde{c}:=\frac{\rho }{2N\overline{c}^2\kappa ^2}\min \{\sigma _1,\ldots ,\sigma _N\}\), for all \(k\ge \overline{k}\). Hence, all assumptions of Fact 4.6 hold with \(\alpha =2\theta\). Therefore, our result follows from this fact and (8.2). \(\square\)


Cite this article

Ahookhosh, M., Hien, L.T.K., Gillis, N. et al. Multi-block Bregman proximal alternating linearized minimization and its application to orthogonal nonnegative matrix factorization. Comput Optim Appl 79, 681–715 (2021). https://doi.org/10.1007/s10589-021-00286-3

