Abstract
We introduce and analyze BPALM and A-BPALM, two multi-block proximal alternating linearized minimization algorithms using Bregman distances for solving structured nonconvex problems. The objective function is the sum of a multi-block relatively smooth function (i.e., relatively smooth in each block with all other blocks fixed) and block separable (nonsmooth) nonconvex functions. The sequences generated by our algorithms are subsequentially convergent to critical points of the objective function, and they are globally convergent under the KL inequality assumption. Moreover, the rate of convergence is further analyzed for functions satisfying the Łojasiewicz gradient inequality. We apply this framework to orthogonal nonnegative matrix factorization (ONMF), which satisfies all of our assumptions and whose subproblems are solved in closed form, and we report preliminary numerical results.
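For orientation, the problem class and the basic block update described above can be sketched as follows; the symbols \(f\), \(g_i\), \(h_i\), and \(\gamma_i\) are generic notation chosen here and are not necessarily the paper's:

```latex
% Multi-block composite model: f is relatively smooth in each block x_i
% (with the other blocks fixed, w.r.t. a block kernel h_i); each g_i is
% proper, lower semicontinuous, and possibly nonconvex.
\min_{x = (x_1,\ldots,x_N)} \; \varphi(x) := f(x_1,\ldots,x_N) + \sum_{i=1}^{N} g_i(x_i)

% Generic Bregman proximal linearized update of block i, with step size
% \gamma_i and Bregman distance
% D_{h_i}(u,v) = h_i(u) - h_i(v) - \langle \nabla h_i(v),\, u - v \rangle:
x_i^{k+1} \in \operatorname*{argmin}_{u}\;
  g_i(u)
  + \bigl\langle \nabla_{x_i} f(x_1^{k+1},\ldots,x_{i-1}^{k+1}, x_i^k, \ldots, x_N^k),\, u - x_i^k \bigr\rangle
  + \tfrac{1}{\gamma_i} D_{h_i}(u, x_i^k)
```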
Notes
The code is publicly available at https://github.com/MasoudAhoo/BPALM
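The repository above contains the authors' implementation and is not reproduced here. As a rough, self-contained illustration of the alternating linearized idea in its simplest setting, the sketch below applies it to plain nonnegative matrix factorization (without the orthogonality constraint of ONMF) using the Euclidean kernel \(h=\tfrac{1}{2}\Vert \cdot \Vert ^2\), for which each Bregman proximal step reduces to a projected gradient step. The function name, step-size rule, and toy data are our own assumptions, not the paper's algorithm:

```python
import numpy as np

def bpalm_nmf_sketch(M, r, iters=500, seed=0):
    """Illustrative two-block alternating linearized minimization for
    f(X, Y) = 0.5 * ||M - X @ Y||_F**2 subject to X, Y >= 0.
    With the Euclidean kernel h = 0.5 * ||.||**2, each Bregman proximal
    step is a projected gradient step with step size 1 / L_i, where L_i
    is the Lipschitz constant of the partial gradient for block i."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    X, Y = rng.random((m, r)), rng.random((r, n))
    for _ in range(iters):
        # Block X: grad_X f = (X Y - M) Y^T, with L_X = ||Y Y^T||_2.
        L_X = np.linalg.norm(Y @ Y.T, 2) + 1e-12
        X = np.maximum(X - ((X @ Y - M) @ Y.T) / L_X, 0.0)
        # Block Y: grad_Y f = X^T (X Y - M), with L_Y = ||X^T X||_2.
        L_Y = np.linalg.norm(X.T @ X, 2) + 1e-12
        Y = np.maximum(Y - (X.T @ (X @ Y - M)) / L_Y, 0.0)
    return X, Y

# Toy data: an exactly rank-2 nonnegative matrix.
rng = np.random.default_rng(1)
M = rng.random((6, 2)) @ rng.random((2, 5))
X, Y = bpalm_nmf_sketch(M, 2)
rel_err = np.linalg.norm(M - X @ Y) / np.linalg.norm(M)
```

Each block update decreases the objective (the standard PALM descent property with \(1/L\) steps), so on this toy rank-2 instance the relative error falls well below its starting value.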
References
Ahookhosh, M.: Accelerated first-order methods for large-scale convex optimization: nearly optimal complexity under strong convexity. Math. Methods Oper. Res. 89(3), 319–353 (2019)
Ahookhosh, M., Hien, L.T.K., Gillis, N., Patrinos, P.: A block inertial Bregman proximal algorithm for nonsmooth nonconvex problems with application to symmetric nonnegative matrix tri-factorization. J. Optim. Theory Appl. (2021)
Ahookhosh, M., Themelis, A., Patrinos, P.: A Bregman forward-backward linesearch algorithm for nonconvex composite optimization: superlinear convergence to nonisolated local minima. SIAM J. Optim. 31(1), 653–685 (2021)
Araújo, U., Saldanha, B., Galvão, R., Yoneyama, T., Chame, H., Visani, V.: The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. Chemometr. Intell. Lab. Syst. 57(2), 65–73 (2001)
Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pac. J. Math. 16(1), 1–3 (1966)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Alternating proximal algorithms for weakly coupled convex minimization problems. applications to dynamical games and PDE’s. J. Convex Anal. 15(3), 485 (2008)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137(1), 91–129 (2013)
Attouch, H., Redont, P., Soubeyran, A.: A new class of alternating proximal minimization algorithms with costs-to-move. SIAM J. Optim. 18(3), 1061–1081 (2007)
Attouch, H., Soubeyran, A.: Inertia and reactivity in decision making as cognitive variational inequalities. J. Conv. Anal. 13(2), 207 (2006)
Auslender, A.: Optimisation méthodes numériques. Mason, Paris (1976)
Bauschke, H.H., Bolte, J., Chen, J., Teboulle, M., Wang, X.: On linear convergence of non-Euclidean gradient methods without strong convexity and Lipschitz gradient continuity. J. Optim. Theory Appl. 182, 1068–1087 (2019)
Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42(2), 330–348 (2016)
Bauschke, H.H., Dao, M.N., Lindstrom, S.B.: Regularizing with Bregman–Moreau envelopes. SIAM J. Optim. 28(4), 3208–3228 (2018)
Beck, A., Pauwels, E., Sabach, S.: The cyclic block conditional gradient method for convex optimization problems. SIAM J. Optim. 25(4), 2024–2049 (2015)
Beck, A., Sabach, S., Teboulle, M.: An alternating semiproximal method for nonconvex regularized structured total least squares problems. SIAM J. Matrix Anal. Appl. 37(3), 1129–1150 (2016)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4), 2037–2060 (2013)
Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, Inc., Hoboken (1989)
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007)
Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007)
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)
Bolte, J., Sabach, S., Teboulle, M., Vaisbourd, Y.: First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM J. Optim. 28(3), 2131–2151 (2018)
Boţ, R.I., Csetnek, E.R.: An inertial Tseng’s type proximal algorithm for nonsmooth and nonconvex optimization problems. J. Optim. Theory Appl. 171(2), 600–616 (2016)
Boţ, R.I., Nguyen, D.K.: The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates. Math. Oper. Res. 45(2), 682–712 (2020)
Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)
Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3(3), 538–543 (1993)
Cichocki, A., Zdunek, R., Phan, A.H., Amari, S.I.: Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. John Wiley & Sons, Hoboken (2009)
Combettes, P.L., Pesquet, J.C.: Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping. SIAM J. Optim. 25(2), 1221–1248 (2015)
Van den Dries, L.: Tame Topology and o-Minimal Structures, vol. 248. Cambridge University Press, Cambridge (1998)
Fercoq, O., Bianchi, P.: A coordinate-descent primal-dual algorithm with large step size and possibly nonseparable functions. SIAM J. Optim. 29(1), 100–134 (2019)
Fu, X., Huang, K., Sidiropoulos, N.D., Ma, W.K.: Nonnegative matrix factorization for signal and data analytics: Identifiability, algorithms, and applications. IEEE Signal Process. Mag. 36(2), 59–80 (2019)
Gillis, N.: The why and how of nonnegative matrix factorization. Regular. Optim. Kernels Support Vector Mach. 12(257), 257–291 (2014)
Gillis, N., Vavasis, S.A.: Fast and robust recursive algorithms for separable nonnegative matrix factorization. IEEE Trans. Pattern Anal. Mach. Intell. 36(4), 698–714 (2013)
Grippo, L., Sciandrone, M.: On the convergence of the block nonlinear Gauss-Seidel method under convex constraints. Oper. Res. Lett. 26(3), 127–136 (2000)
Hanzely, F., Richtárik, P.: Fastest rates for stochastic mirror descent methods. arXiv:1803.07374 (2018)
Hanzely, F., Richtárik, P., Xiao, L.: Accelerated Bregman proximal gradient methods for relatively smooth convex optimization. Comput. Optim. Appl. 1–36 (2021)
Kimura, K., Tanaka, Y., Kudo, M.: A fast hierarchical alternating least squares algorithm for orthogonal nonnegative matrix factorization. In: D. Phung, H. Li (eds.) Proceedings of the Sixth Asian Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 39, pp. 129–141. PMLR, Nha Trang City, Vietnam (2015). http://proceedings.mlr.press/v39/kimura14.html
Kurdyka, K.: On gradients of functions definable in o-minimal structures. Annales de l’institut Fourier 48(3), 769–783 (1998)
Latafat, P., Freris, N.M., Patrinos, P.: A new randomized block-coordinate primal-dual proximal algorithm for distributed optimization. IEEE Trans. Autom. Control 64(10), 4050–4065 (2019)
Latafat, P., Themelis, A., Patrinos, P.: Block-coordinate and incremental aggregated proximal gradient methods for nonsmooth nonconvex problems. Math. Program. (2021). arXiv:1906.10053
Li, Q., Zhu, Z., Tang, G., Wakin, M.B.: Provable Bregman-divergence based methods for nonconvex and non-Lipschitz problems. arXiv:1904.09712 (2019)
Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels. In: Les Équations aux Dérivées Partielles, pp. 87–89 (1963)
Łojasiewicz, S.: Sur la géométrie semi- et sous- analytique. Annales de l’institut Fourier 43(5), 1575–1595 (1993)
Lu, H., Freund, R.M., Nesterov, Y.: Relatively smooth convex optimization by first-order methods, and applications. SIAM J. Optim. 28(1), 333–354 (2018)
Mukkamala, M.C., Ochs, P., Pock, T., Sabach, S.: Convex-concave backtracking for inertial Bregman proximal gradient algorithms in nonconvex optimization. SIAM J. Math. Data Sci. 2(3), 658–682 (2020)
Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)
Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)
Pauca, V.P., Piper, J., Plemmons, R.J.: Nonnegative matrix factorization for spectral data analysis. Linear Algebra Appl. 416(1), 29–47 (2006)
Pock, T., Sabach, S.: Inertial proximal alternating linearized minimization (iPALM) for nonconvex and nonsmooth problems. SIAM J. Imaging Sci. 9(4), 1756–1787 (2016)
Pompili, F., Gillis, N., Absil, P.A., Glineur, F.: Two algorithms for orthogonal nonnegative matrix factorization with application to clustering. Neurocomputing 141, 15–25 (2014)
Razaviyayn, M., Hong, M., Luo, Z.Q.: A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM J. Optim. 23(2), 1126–1153 (2013)
Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1–2), 1–38 (2014)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, vol. 317. Springer Science & Business Media, Berlin (2011)
Choi, S.: Algorithms for orthogonal nonnegative matrix factorization. In: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), pp. 1828–1832 (2008)
Shefi, R., Teboulle, M.: On the rate of convergence of the proximal alternating linearized minimization algorithm for convex problems. EURO J. Comput. Optim. 4(1), 27–46 (2016)
Tam, M.K.: Regularity properties of non-negative sparsity sets. J. Math. Anal. Appl. 447(2), 758–777 (2017)
Teboulle, M.: A simplified view of first order methods for optimization. Math. Program. 170(1), 67–96 (2018)
Themelis, A., Ahookhosh, M., Patrinos, P.: On the acceleration of forward-backward splitting via an inexact Newton method. In: Luke, R., Bauschke, H., Burachik, R. (eds.) Splitting Algorithms, Modern Operator Theory, and Applications, pp. 363–412. Springer, Berlin (2019)
Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3), 475–494 (2001)
Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117(1–2), 387–423 (2009)
Wang, X., Yuan, X., Zeng, S., Zhang, J., Zhou, J.: Block coordinate proximal gradient method for nonconvex optimization problems: convergence analysis. http://www.optimization-online.org/DB_HTML/2018/04/6573.html (2018)
Acknowledgements
We would like to thank the anonymous reviewers for their insightful comments that helped improve the paper; in particular, one of the reviewers gave a suggestion that leads to the kernel function in Proposition 5.1. The first author is grateful to Andreas Themelis for his useful comments and discussions on the paper. MA and PP acknowledge the support by the Research Foundation Flanders (FWO) research projects G086518N and G086318N; Research Council KU Leuven C1 project No. C14/18/068; Fonds de la Recherche Scientifique - FNRS and the Fonds Wetenschappelijk Onderzoek - Vlaanderen (FWO) under EOS project no 30468160 (SeLMA). LTKH and NG also acknowledge the support by the European Research Council (ERC starting grant no 679515).
Appendix
Lemma 8.1
Let all assumptions of Theorem 4.4 be valid. Then, the following assertions hold:
(i) \(\lim _{k\rightarrow \infty }\mathrm{dist}\left( \varvec{x}^k,\omega (\varvec{x}^0)\right) =0\);
(ii) \(\omega (\varvec{x}^0)\) is a nonempty, compact, and connected set;
(iii) the objective function \(\varphi\) is finite and constant on \(\omega (\varvec{x}^0)\).
Proof
Lemma 8.1(i) is a direct consequence of Theorem 4.4, while Lemma 8.1(ii) and Lemma 8.1(iii) can be proved in the same way as [23, Lemma 5(iii)-(iv)]. \(\square\)
Lemma 8.2
Let all assumptions of Theorem 4.5 be satisfied. If \(\varphi (\varvec{x}^k)>\varphi ^\star\), then there exist \(\varepsilon ,\eta >0\) and a desingularizing function \(\psi\) such that
Proof
From Lemma 8.1(ii), the set of limit points \(\omega (\varvec{x}^0)\) of \({(\varvec{x}^k)_{{k\in \mathbb {N}}}}\) is nonempty and compact, and \(\varphi\) is finite and constant on \(\omega (\varvec{x}^0)\) due to Lemma 8.1(iii). Moreover, \(\varphi (\varvec{x}^k)>\varphi ^\star\) and the sequence \({(\varphi (\varvec{x}^k))_{{k\in \mathbb {N}}}}\) is decreasing (Proposition 4.1(i)), i.e., there exist \(\eta >0\) and \(k_1\in \mathbb {N}\) such that \(\varphi ^\star<\varphi (\varvec{x}^k)<\varphi ^\star +\eta\) for all \(k\ge k_1\). For \(\varepsilon >0\), Lemma 8.1(i) implies that there exists \(k_2\in \mathbb {N}\) such that \(\mathrm{dist}(\varvec{x}^k,\omega (\varvec{x}^0))<\varepsilon\) for all \(k\ge k_2\). Setting \(k_0:=\max \{k_1,k_2\}\) and invoking Fact 2.2, there exist \(\varepsilon , \eta >0\) and a desingularizing function \(\psi\) such that for any element in
the inequality (8.1) is valid. \(\square\)
We next present the proof of Theorem 4.7.
Proof of Theorem 4.7. The proof has two key parts.
In the first part, we show that there exist \(c>0\) and \(\overline{k}\in \mathbb {N}\) such that for all \(k\ge \overline{k}\) the following inequalities hold for \(i=1,\ldots ,N\):
Let \(\varepsilon >0\) be as described in (4.16), and let \(\tilde{k}\in \mathbb {N}\) be such that \(\varvec{x}^k\in \mathbf{B}(\varvec{x}^\star ; \varepsilon )\) for all \(k\ge \tilde{k}\). By the definitions of \(a_k\) and \(b_k\) in (4.15) and using (4.14), we get \(a_{k+1}\le \tfrac{1}{2} a_k+b_k\) for all \(k\ge \tilde{k}\). Since \({(\varphi (\varvec{x}^k))_{{k\in \mathbb {N}}}}\) is nonincreasing,
Together with the arithmetic and quadratic mean inequalities, \(\psi ({\mathcal {S}}_{k})\le \psi ({\mathcal {S}}_{k-1})\), and Proposition 4.1(i), this leads to
On the other hand, for \(i=1,\ldots ,N\), we have
This inequality, together with (8.3), yields
leading to
where \(c:=\sqrt{\tfrac{2N}{\rho }} \max \left\{ \tfrac{1}{\sqrt{\sigma _1}},\ldots ,\tfrac{1}{\sqrt{\sigma _N}}\right\} +\widehat{c} N\) and \(\psi (s):=\frac{\kappa }{1-\theta } s^{1-\theta }\). Let us consider the nonlinear equation
which has a solution at \({\mathcal {S}}_{k-1}=\left( \tfrac{1-\theta }{\kappa }\right) ^{\tfrac{2}{1-2\theta }}\). From the monotonicity of \({\mathcal {S}}_k\), there exists \(\hat{k}\in \mathbb {N}\) such that (8.4) holds for all \(k\ge \hat{k}\) and
We now consider two cases: (a) \(\theta \in (0,1/2]\); (b) \(\theta \in (1/2,1)\). In Case (a), if \(\theta \in (0,1/2)\), then \(\psi ({\mathcal {S}}_{k-1})\le \sqrt{{\mathcal {S}}_{k-1}}\). If \(\theta =1/2\), then \(\psi ({\mathcal {S}}_{k-1})=\tfrac{\kappa }{1-\theta }\sqrt{{\mathcal {S}}_{k-1}}\), i.e.,
Therefore, it holds that \(\max \{\sqrt{{\mathcal {S}}_{k-1}},\psi ({\mathcal {S}}_{k-1})\}\le \max \{1,\tfrac{\kappa }{1-\theta }\} \sqrt{{\mathcal {S}}_{k-1}}\). In Case (b), we have that
i.e., \(\max \{\sqrt{{\mathcal {S}}_{k-1}},\psi ({\mathcal {S}}_{k-1})\}= \tfrac{\kappa }{1-\theta } {\mathcal {S}}_{k-1}^{1-\theta }\). Then, it follows from (8.4) that (8.2) holds for all \(k\ge \overline{k}:=\max \{\tilde{k}, \hat{k}\}\).
In the second part of the proof, we show the assertions in the statement of the theorem. For \(({\mathcal {G}}_1^{k},\ldots ,{\mathcal {G}}_N^{k})\in \partial \varphi (\varvec{x}^k)\) as defined in Proposition 4.3, by Proposition 4.1(i), we infer
with \(\widetilde{c}:=\frac{\rho }{2N\overline{c}^2\kappa ^2}\min \{\sigma _1,\ldots ,\sigma _N\}\) and for all \(k\ge \overline{k}\). Hence, all assumptions of Fact 4.6 hold with \(\alpha =2\theta\). Therefore, the result follows from this fact and (8.2). \(\square\)
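For context, the case split above feeds into the classical rate trichotomy under the Łojasiewicz gradient inequality with exponent \(\theta\). The summary below is the standard statement from the Attouch–Bolte line of work and paraphrases, rather than quotes, Theorem 4.7:

```latex
% Convergence rates under the Łojasiewicz inequality with exponent \theta:
\theta = 0:
  \quad (x^k) \text{ converges in finitely many steps;} \\
\theta \in (0,\tfrac{1}{2}]:
  \quad \Vert x^k - x^\star \Vert \le c\, q^{k} \text{ for some } q \in (0,1)
  \text{ (linear rate);} \\
\theta \in (\tfrac{1}{2},1):
  \quad \Vert x^k - x^\star \Vert \le c\, k^{-\frac{1-\theta}{2\theta-1}}
  \text{ (sublinear rate).}
```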
Cite this article
Ahookhosh, M., Hien, L.T.K., Gillis, N. et al. Multi-block Bregman proximal alternating linearized minimization and its application to orthogonal nonnegative matrix factorization. Comput Optim Appl 79, 681–715 (2021). https://doi.org/10.1007/s10589-021-00286-3
Keywords
- Nonsmooth nonconvex optimization
- Proximal alternating linearized minimization
- Bregman distance
- Multi-block relative smoothness
- KL inequality
- Orthogonal nonnegative matrix factorization