Abstract
In this paper, we analyze the convergence of the alternating direction method of multipliers (ADMM) for minimizing a nonconvex and possibly nonsmooth objective function, \(\phi (x_0,\ldots ,x_p,y)\), subject to coupled linear equality constraints. Our ADMM updates each of the primal variables \(x_0,\ldots ,x_p,y\), followed by updating the dual variable. We separate the variable y from the \(x_i\)’s because it plays a special role in our analysis. The developed convergence guarantee covers a variety of nonconvex functions, such as piecewise linear functions, the \(\ell _q\) quasi-norm, the Schatten-q quasi-norm (\(0<q<1\)), the minimax concave penalty (MCP), and the smoothly clipped absolute deviation (SCAD) penalty. It also allows nonconvex constraints, such as compact manifolds (e.g., spherical, Stiefel, and Grassmann manifolds) and linear complementarity constraints. Moreover, the \(x_0\)-block can be almost any lower semi-continuous function. By applying our analysis, we show, for the first time, that several ADMM algorithms applied to solve nonconvex models in statistical learning, optimization on manifolds, and matrix decomposition are guaranteed to converge. Our results provide sufficient conditions for ADMM to converge on (convex or nonconvex) monotropic programs with three or more blocks, as they are special cases of our model. ADMM has been regarded as a variant of the augmented Lagrangian method (ALM). We present a simple example to illustrate how ADMM converges while ALM diverges with a bounded penalty parameter \(\beta \). This example, together with other analysis in this paper, indicates that ADMM might be a better choice than ALM for some nonconvex nonsmooth problems, because ADMM is not only easier to implement but also more likely to converge in the scenarios considered.
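As an illustrative sketch of this update order (with `argmin_x[i]`, `argmin_y`, `A[i]`, `B`, and `b` standing in for black-box augmented-Lagrangian subproblem solvers and generic problem data; none of these names come from the paper):

```python
import numpy as np

# Minimal sketch of the multi-block ADMM update order described above:
# update x_0, ..., x_p in turn, then y, then the dual variable w,
# for the constraint  sum_i A_i x_i + B y = b.
def admm(argmin_x, argmin_y, A, B, b, x, y, beta, iters=100):
    w = np.zeros_like(b)                       # dual variable
    for _ in range(iters):
        for i in range(len(x)):                # Gauss-Seidel sweep over x_0..x_p
            x[i] = argmin_x[i](x, y, w, beta)  # uses the latest x_j for j < i
        y = argmin_y(x, w, beta)               # y-block is updated last
        r = sum(A[i] @ x[i] for i in range(len(x))) + B @ y - b
        w = w + beta * r                       # dual ascent on the residual
    return x, y, w
```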
Notes
This is the best that one can hope for (except for very specific problems), since [62, Section 1] shows a convex 2-block problem on which ADMM fails to converge.
“Globally” here means regardless of where the initial point is.
A nonnegative sequence \(\{a_k\}\) induces its running best sequence \(b_k=\min \{a_i : i\le k\}\); therefore, \(\{a_k\}\) has running best rate of o(1/k) if \(b_k=o(1/k)\).
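As an illustration (a standard fact, not taken from the paper): if \(\{a_k\}\) is nonnegative and summable, then its running best sequence satisfies \(b_k = o(1/k)\). Indeed, since \(b_k\le a_i\) for every \(i\le k\),
\[ \big (k-\lceil k/2\rceil +1\big )\, b_k \;\le \; \sum _{i=\lceil k/2\rceil }^{k} a_i \;\longrightarrow \; 0 \quad (k\rightarrow \infty ), \]
because the right-hand side is part of the tail of a convergent series; as the number of terms on the left is at least k/2, this gives \(k\, b_k\rightarrow 0\).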
References
Attouch, H., Bolte, J., Svaiter, B.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2), 91–129 (2013)
Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Optimization with sparsity-inducing penalties. Found. Trends Mach. Learn. 4(1), 1–106 (2012)
Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, London (2014)
Birgin, E.G., Martínez, J.M.: Practical Augmented Lagrangian Methods for Constrained Optimization, vol. 10. SIAM, Philadelphia (2014)
Bolte, J., Daniilidis, A., Lewis, A.: The Lojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007)
Bouaziz, S., Tagliasacchi, A., Pauly, M.: Sparse iterative closest point. In: Computer graphics forum, vol. 32, pp. 113–123. Wiley Online Library (2013)
Cai, J.F., Candès, E.J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010)
Chartrand, R.: Nonconvex splitting for regularized low-rank \(+\) sparse decomposition. IEEE Trans. Signal Process. 60(11), 5810–5819 (2012)
Chartrand, R., Wohlberg, B.: A nonconvex ADMM algorithm for group sparsity with sparse groups. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6009–6013. IEEE (2013)
Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155, 57–79 (2016)
Chen, C., Yuan, X., Zeng, S., Zhang, J.: Penalty splitting methods for solving mathematical program with equilibrium constraints. Manuscript (private communication) (2016)
Conn, A.R., Gould, N.I., Toint, P.: A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM J. Numer. Anal. 28(2), 545–572 (1991)
Cottle, R., Dantzig, G.: Complementary pivot theory of mathematical programming. Linear Algebra Appl. 1, 103–125 (1968)
Daubechies, I., DeVore, R., Fornasier, M., Güntürk, C.S.: Iteratively reweighted least squares minimization for sparse recovery. Commun. Pure Appl. Math. 63(1), 1–38 (2010)
Davis, D., Yin, W.: Convergence rate analysis of several splitting schemes. In: Glowinski, R., Osher, S., Yin, W. (eds.) Splitting Methods in Communication, Imaging, Science and Engineering. Springer, New York (2016)
Davis, D., Yin, W.: Convergence rates of relaxed Peaceman-Rachford and ADMM under regularity assumptions. Math. Oper. Res. 42(3), 783–805 (2017)
Deng, W., Lai, M.J., Peng, Z., Yin, W.: Parallel multi-block ADMM with \(o (1/k)\) convergence. J. Sci. Comput. 71, 712–736 (2017)
Ding, C., Sun, D., Sun, J., Toh, K.C.: Spectral operators of matrices. Math. Program. 168(1–2), 509–531 (2018)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)
Glowinski, R.: Numerical Methods for Nonlinear Variational Problems. Springer Series in Computational Physics. Springer, New York (1984)
Glowinski, R., Marroco, A.: On the approximation, by finite elements of order one, and the resolution, by penalization-duality, of a class of nonlinear Dirichlet problems. ESAIM Math. Model. Numer. Anal. 9(R2), 41–76 (1975)
He, B., Yuan, X.: On the \(o(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Theory Appl. 4(5), 303–320 (1969)
Hong, M., Luo, Z.Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26(1), 337–364 (2016)
Hu, Y., Chi, E., Allen, G.I.: ADMM algorithmic regularization paths for sparse statistical machine learning. In: Glowinski, R., Osher, S., Yin, W. (eds.) Splitting Methods in Communication, Imaging, Science and Engineering. Springer, New York (2016)
Ivanov, M., Zlateva, N.: Abstract subdifferential calculus and semi-convex functions. Serdica Math. J. 23(1), 35–58 (1997)
Iutzeler, F., Bianchi, P., Ciblat, P., Hachem, W.: Asynchronous distributed optimization using a randomized alternating direction method of multipliers. In: 2013 IEEE 52nd Annual Conference On Decision and Control (CDC), pp. 3671–3676. IEEE (2013)
Jiang, B., Ma, S., Zhang, S.: Alternating direction method of multipliers for real and complex polynomial optimization models. Optimization 63(6), 883–898 (2014)
Knopp, K.: Infinite Sequences and Series. Courier Corporation, Chelmsford (1956)
Kryštof, V., Zajíček, L.: Differences of two semiconvex functions on the real line. Preprint (2015)
Lai, R., Osher, S.: A splitting method for orthogonality constrained problems. J. Sci. Comput. 58(2), 431–449 (2014)
Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25(4), 2434–2460 (2015)
Li, R.C., Stewart, G.: A new relative perturbation theorem for singular subspaces. Linear Algebra Appl. 313(1), 41–51 (2000)
Liavas, A.P., Sidiropoulos, N.D.: Parallel algorithms for constrained tensor factorization via the alternating direction method of multipliers. IEEE Trans. Signal Process. 63(20), 5450–5463 (2015)
Łojasiewicz, S.: Sur la géométrie semi- et sous-analytique. Ann. Inst. Fourier (Grenoble) 43(5), 1575–1595 (1993)
Lu, Z., Zhang, Y.: An augmented Lagrangian approach for sparse principal component analysis. Math. Program. 135(1–2), 149–193 (2012)
Magnússon, S., Weeraddana, P.C., Rabbat, M.G., Fischione, C.: On the convergence of alternating direction Lagrangian methods for nonconvex structured optimization problems. IEEE Trans. Control Netw. Syst. 3(3), 296–309 (2015)
Mifflin, R.: Semismooth and semiconvex functions in constrained optimization. SIAM J. Control Optim. 15(6), 959–972 (1977)
Miksik, O., Vineet, V., Pérez, P., Torr, P.H.: Distributed non-convex ADMM-inference in large-scale random fields. In: British Machine Vision Conference. BMVC (2014)
Möllenhoff, T., Strekalovskiy, E., Moeller, M., Cremers, D.: The primal-dual hybrid gradient method for semiconvex splittings. SIAM J. Imaging Sci. 8(2), 827–857 (2015)
Oymak, S., Mohan, K., Fazel, M., Hassibi, B.: A simplified approach to recovery conditions for low rank matrices. In: 2011 IEEE International Symposium on Information Theory Proceedings (ISIT), pp. 2318–2322. IEEE (2011)
Peng, Z., Xu, Y., Yan, M., Yin, W.: ARock: an algorithmic framework for asynchronous parallel coordinate updates. SIAM J. Sci. Comput. 38(5), A2851–A2879 (2016)
Poliquin, R., Rockafellar, R.: Prox-regular functions in variational analysis. Trans. Am. Math. Soc. 348(5), 1805–1838 (1996)
Powell, M.J.: A method for non-linear constraints in minimization problems. UKAEA (1967)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer Science & Business Media (2009)
Rosenberg, J., et al.: Applications of analysis on Lipschitz manifolds. In: Proceedings of Miniconferences on Harmonic Analysis and Operator Algebras (Canberra, 1987), Proceedings Centre for Mathematical Analysis, vol. 16, pp. 269–283 (1988)
Shen, Y., Wen, Z., Zhang, Y.: Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization. Optim. Methods Softw. 29(2), 239–263 (2014)
Slavakis, K., Giannakis, G., Mateos, G.: Modeling and optimization for big data analytics: (statistical) learning tools for our era of data deluge. IEEE Sig. Process. Mag. 31(5), 18–31 (2014)
Sun, D.L., Fevotte, C.: Alternating direction method of multipliers for non-negative matrix factorization with the beta-divergence. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6201–6205. IEEE (2014)
Sun, R., Luo, Z.-Q., Ye, Y.: On the expected convergence of randomly permuted ADMM. arXiv preprint arXiv:1503.06387 (2015)
Wang, F., Cao, W., Xu, Z.: Convergence of multi-block Bregman ADMM for nonconvex composite problems. arXiv preprint arXiv:1505.03063 (2015)
Wang, F., Xu, Z., Xu, H.K.: Convergence of Bregman alternating direction method with multipliers for nonconvex composite problems. arXiv preprint arXiv:1410.8625 (2014)
Wang, X., Hong, M., Ma, S., Luo, Z.Q.: Solving multiple-block separable convex minimization problems using two-block alternating direction method of multipliers. arXiv preprint arXiv:1308.5294 (2013)
Wang, Y., Zeng, J., Peng, Z., Chang, X., Xu, Z.: Linear convergence of adaptively iterative thresholding algorithm for compressed sensing. IEEE Trans. Signal Process. 63(11), 2957–2971 (2015)
Watson, G.A.: Characterization of the subdifferential of some matrix norms. Linear Algebra Appl. 170, 33–45 (1992)
Wen, Z., Peng, X., Liu, X., Sun, X., Bai, X.: Asset allocation under the basel accord risk measures. arXiv preprint arXiv:1308.1321 (2013)
Wen, Z., Yang, C., Liu, X., Marchesini, S.: Alternating direction methods for classical and ptychographic phase retrieval. Inverse Prob. 28(11), 115010 (2012)
Wen, Z., Yin, W.: A feasible method for optimization with orthogonality constraints. Math. Program. 142(1–2), 397–434 (2013)
Wikipedia: Schatten norm—Wikipedia, the free encyclopedia (2015). (Online; Accessed 18 Oct 2015)
Xu, Y., Yin, W.: A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imaging Sci. 6(3), 1758–1789 (2013)
Xu, Y., Yin, W., Wen, Z., Zhang, Y.: An alternating direction algorithm for matrix completion with nonnegative factors. Front. Math. China 7(2), 365–384 (2012)
Yan, M., Yin, W.: Self equivalence of the alternating direction method of multipliers. In: Glowinski, R., Osher, S., Yin, W. (eds.) Splitting Methods in Communication, Imaging, Science and Engineering, pp. 165–194. Springer, New York (2016)
Yang, L., Pong, T.K., Chen, X.: Alternating direction method of multipliers for nonconvex background/foreground extraction. SIAM J. Imaging Sci. 10(1), 74–110 (2017)
You, S., Peng, Q.: A non-convex alternating direction method of multipliers heuristic for optimal power flow. In: 2014 IEEE International Conference on Smart Grid Communications (SmartGridComm), pp. 788–793. IEEE (2014)
Zeng, J., Lin, S., Xu, Z.: Sparse regularization: convergence of iterative jumping thresholding algorithm. IEEE Trans. Signal Process. 64(19), 5106–5117 (2016)
Zeng, J., Peng, Z., Lin, S.: A Gauss–Seidel iterative thresholding algorithm for \(\ell_q\) regularized least squares regression. J. Comput. Appl. Math. 319, 220–235 (2017)
Zeng, J., Lin, S., Wang, Y., Xu, Z.: \(L_{1/2}\) regularization: convergence of iterative half thresholding algorithm. IEEE Trans. Signal Process. 62(9), 2317–2329 (2014)
Acknowledgements
We would like to thank Drs. Wei Shi, Ting Kei Pong, and Qing Ling for their insightful comments, and Drs. Xin Liu and Yangyang Xu for helpful discussions. We thank the three anonymous reviewers for their review and helpful comments.
Additional information
The work of W. Yin was supported in part by NSF Grants DMS-1720237 and ECCS-1462397, and ONR Grant N00014171216. The work of J. Zeng was supported in part by the NSFC Grants (61603162, 11501440, 61772246, 61603163) and the doctoral start-up foundation of Jiangxi Normal University.
Appendix
Proof of Proposition 1
The fact that convex functions and \(C^1\) regular functions are prox-regular has been proved in the literature; see, for example, [43]. Here we only prove the second part of the proposition.
(1): For functions \(r( x) = \sum _{i} |x_i|^q\) with \(0< q < 1\), the set of general subgradients of \(r(\cdot )\) is
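the following (stated here as a sketch from the standard coordinate-wise computation, under the convention that the subdifferential of \(|t|^q\) at \(t=0\) is all of \(\mathbb {R}\)):
\[ \partial r(x) = \big \{ d\in \mathbb {R}^n : d_i = q\,\mathrm {sign}(x_i)\,|x_i|^{q-1}\ \text {if}\ x_i\ne 0,\ \ d_i\in \mathbb {R}\ \text {if}\ x_i = 0 \big \}. \]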
For any two positive constants \(C>0\) and \(M>1\), take \(\gamma = \max \left\{ \frac{4({n}C^q+MC)}{c^2},q(1-q)c^{q-2}\right\} \), where
\(c \triangleq \frac{1}{3}(\frac{q}{M})^{\frac{1}{1-q}}\). The exclusion set \(S_{M}\) contains the set \(\{ x|\min _{x_i\ne 0} |x_i|\le 3c\}\). For any point \(z\in \mathbb {B}(0,C)/S_{M}\) and \(y\in \mathbb {B}(0,C)\), if \(\Vert z-y\Vert \le c\), then \(\mathrm {supp}(z) \subset \mathrm {supp}(y)\) and \(\Vert z\Vert _0 \le \Vert y\Vert _0\), where \(\mathbb {B}(0,C) \triangleq \{x| \Vert x\Vert <C\}\), \(\mathrm {supp}(z)\) denotes the index set of all non-zero elements of z and \(\Vert z\Vert _0\) denotes the cardinality of \(\mathrm {supp}(z)\). Define
Then for any \(d\in \partial r(z)\), the following chain of inequalities holds,
where (a) holds because \(\Vert y\Vert _q^q = \Vert y'\Vert _q^q + \Vert y-y'\Vert _q^q\) by the definition of \(y'\), (b) holds because r(x) is twice differentiable along the line segment connecting z and \(y'\), with second-order derivative bounded in absolute value by \(q(1-q)c^{q-2}\), and (c) holds because \(\Vert z-y\Vert \ge \Vert z-y'\Vert \). If instead \(\Vert z-y\Vert > c\), then for any \(d\in \partial r(z)\), we have
Combining (51) and (52) yields the result.
(2): We now verify that the Schatten-q quasi-norm \({\Vert \cdot \Vert }_q^q\) is restricted prox-regular. Without loss of generality, suppose \(A\in \mathbb {R}^{n\times n}\) is a square matrix.
Suppose the singular value decomposition (SVD) of A is
where \(U,V\in \mathbb {R}^{n\times n}\) are orthogonal matrices, and \(\varSigma _1\in \mathbb {R}^{K\times K}\) is diagonal whose diagonal elements are \(\sigma _i(A)\), \(i=1,\ldots ,K\). Then the general subgradient of \({\Vert A \Vert }_q^q\) [55] is
where \(D\in \mathbb {R}^{K\times K}\) is a diagonal matrix whose ith diagonal element is \( d_i = q\sigma _i(A)^{q-1}\).
Now we are going to prove that \({\Vert \cdot \Vert }_q^q\) is restricted prox-regular, i.e., for any positive parameters \(M, P>0\), there exists \(\gamma >0\) such that for any \({\Vert B \Vert }_F<P\), \({\Vert A \Vert }_F<P\), \(A\not \in S_M = \{A| \forall ~X\in \partial {\Vert A \Vert }_q^q,~{\Vert X \Vert }_F>M\}\), and \(T = U_{1} D V_{1}^T + U_{2}\varGamma V_{2}^T\in \partial {\Vert A \Vert }_q^q\) with \({\Vert T \Vert }_F\le M\), we aim to show
Let \(\epsilon _0 = \frac{1}{3}(M/q)^{1/(q - 1)}\). If \({\Vert B - A \Vert } > \epsilon _0\), we have
If \({\Vert B-A \Vert }_F<\epsilon _0\), consider the decomposition \(B = U_B \varSigma ^B V_B^T = B_1 + B_2\), where \(B_1 = U_B \varSigma ^B_1 V_B^T\), \(\varSigma ^B_1\) is the diagonal matrix keeping the entries of \(\varSigma ^B\) larger than \(\frac{1}{3}(M/q)^{1/(q - 1)}\), and \(B_2 = U_B \varSigma ^B_2 V_B^T\) with \(\varSigma ^B_2 = \varSigma ^B - \varSigma ^B_1\).
Define the set \(S' \triangleq \{T\in {\mathbb {R}}^{n \times n}|{\Vert T \Vert }_F \le P,~ \min _{\sigma _i>0} \sigma _i(T) \ge \epsilon _0\}\). Let us prove \(A, B_1\in S'\). If \(\min _{\sigma _i >0} \sigma _i(A) < (M/q)^{1/(q - 1)}\), then for any \(X\in \partial {\Vert A \Vert }_q^q\), \(X = U_1DV_1^T + U_2\varGamma V_2^T\) and
which contradicts the fact that \(A\not \in S_M\). As for \(B_1\), because \({\Vert A - B \Vert }_F\le \epsilon _0\) and \(\min _{\sigma _i >0} \sigma _i(A) \ge (M/q)^{1/(q - 1)}\), the Weyl inequalities give \(B_1\in S'\).
Define the function \(F:S'\subset \mathbb {R}^{n\times n}\rightarrow \mathbb {R}^{n\times n}\) by \(F(A) = U_1 D V_1^T\) for \(A = U_1\varSigma V_1^T\), where
with the convention \(0^{q-1} = 0\). Based on [18, Theorem 4.1] and the compactness of \(S'\), F is Lipschitz continuous on \(S'\), i.e., there exists \(L>0\) such that for any two matrices \(A, B\in S'\), \({\Vert F(A) - F(B) \Vert }_F\le L{\Vert A - B \Vert }_F\). This implies
In addition, because \({\Vert U_{2}^TU_B \Vert }_F< {\Vert B_1 - A \Vert }_F/\epsilon _0\) and \({\Vert V_{2}^TV_B \Vert }_F < {\Vert B_1 - A \Vert }_F/\epsilon _0\) (see [33]),
Furthermore, \({\Vert B_2 \Vert }_q^q - \langle T,B_2\rangle \ge 0\) and \({\Vert B_1 - A \Vert }_F \le {\Vert B - A \Vert }_F+{\Vert B - B_1 \Vert }_F\le 2{\Vert B - A \Vert }_F\); together with (56) and (57), we have
Combining (55) and (58), we obtain (54) with an appropriate \(\gamma \).
(3): We need to show that the indicator function \(\iota _S\) of a p-dimensional compact \(C^2\) manifold S is restricted prox-regular. First, by definition, the exclusion set \(S_M\) of \(\iota _S\) is empty for any \(M>0\). Since S is compact and \(C^2\), there exist finitely many \(C^2\) homeomorphisms \(h_\eta : \mathbb {R}^{p} \mapsto \mathbb {R}^n\), \(\eta \in \{1,\ldots , m\}\), and a \(\delta >0\) such that for any \(x\in S\), there exist an \(\eta \) and an \(\alpha _x\) satisfying \(x = h_\eta (\alpha _x)\in S\). Furthermore, for any \(y\in S\) with \(\Vert y - x\Vert \le \delta \), we can find an \(\alpha _y\) satisfying \(y = h_\eta (\alpha _y)\).
Note that \(\partial \iota _{S}(x)= \mathrm {Im}(J_{h_\eta }(x))^\perp \), where \(J_{h_\eta }\) is the Jacobian of \(h_\eta \). For any \(d\in \partial \iota _S(x)\), \(\Vert d\Vert \le M\) and \(\Vert x-y\Vert \le \delta \),
where \(\gamma \) and C are the Lipschitz constants of \(\nabla h_\eta \) and \( h^{-1}_\eta \), respectively. For any \(\Vert y-x\Vert \ge \delta \),
where M is the maximum of \(\Vert d\Vert \) over \(\partial \iota _S(x)\). Combining (59) and (60) shows that \(\iota _{S}\) is restricted prox-regular. \(\square \)
Proof
(Lemma 1) By the definition of H in A3(a) and the definition of \(y^k\), we have \(y^k = H(By^k)\). Therefore, \(\Vert y^{k_1} - y^{k_2}\Vert =\Vert H(By^{k_1}) - H(By^{k_2})\Vert \le {\bar{M}} \Vert By^{k_1} - By^{k_2}\Vert .\) Similarly, by the optimality of \(x^k_i\), we have \(x^k_i = F_i(A_ix_i^k)\). Therefore, \(\Vert x^{k_1}_i - x_i^{k_2}\Vert =\Vert F_i(A_ix_i^{k_1}) - F_i(A_ix_i^{k_2})\Vert \le {\bar{M}} \Vert A_ix_i^{k_1} - A_ix_i^{k_2}\Vert .\) \(\square \)
Proof
(Lemma 2) Let us first show that the y-subproblem is well defined. To this end, we show that h(y) is lower bounded by a quadratic function of By:
By A3, we know h(y) is lower bounded by h(H(By)):
Because of A5 and A3, h(H(By)) is lower bounded by a quadratic function of By:
Therefore h(y) is also lower bounded by the quadratic function:
Recall that the y-subproblem minimizes the augmented Lagrangian with respect to y; neglecting constants, it is equivalent to minimizing:
Because h(y) is lower bounded by \(-\frac{L_h{\bar{M}}^2}{2}\Vert By\Vert ^2\), when \(\beta > L_h{\bar{M}}\) we have \(P(y)\rightarrow \infty \) as \(\Vert By\Vert \rightarrow \infty \), so the y-subproblem is coercive with respect to By. Because P(y) is lower semi-continuous and \({{\mathrm{argmin}}}h(y) \ \text {s.t.} \ By = u\) has a unique solution for each u, a minimizer of P(y) exists and the y-subproblem is well defined.
As for the \(x_i\)-subproblem, \(i = 0,\ldots , p\), ignoring the constants yields
where \(u = H(-A_{<i}x^+_{<i} - A_{>i}x^k_{>i} - A_ix_i)\). The first two terms are coercive because \(A_{<i}x^+_{<i} + A_{>i}x^k_{>i} + A_ix_i + Bu = 0\) and A1 holds. The third and fourth terms are lower bounded because h is Lipschitz differentiable. Because the objective is lower semi-continuous, all the subproblems are well defined. \(\square \)
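To make the coercivity step for the y-subproblem concrete, here is a sketch with generic notation (the constants \(c_2, c_1, c_0\) and the vector v are ours, not the paper's): write \(P(y) = h(y) + \langle w^k, v + By\rangle + \frac{\beta }{2}\Vert v + By\Vert ^2\), where v collects the contribution of the already-updated x-blocks, and assume a quadratic lower bound \(h(y)\ge -\frac{c_2}{2}\Vert By\Vert ^2 - c_1\Vert By\Vert - c_0\) as above. Then the Cauchy–Schwarz inequality gives
\[ P(y) \;\ge \; \frac{\beta - c_2}{2}\,\Vert By\Vert ^2 \;-\; \big (\beta \Vert v\Vert + \Vert w^k\Vert + c_1\big )\Vert By\Vert \;+\; \frac{\beta }{2}\Vert v\Vert ^2 - \Vert w^k\Vert \,\Vert v\Vert - c_0, \]
which tends to \(+\infty \) as \(\Vert By\Vert \rightarrow \infty \) whenever \(\beta > c_2\).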
Proof
(Proposition 1) Define the augmented Lagrangian function to be
It is clear that when \(\beta =0\), \({\mathcal L}_{\beta }\) is not lower bounded for any w. We are going to show that for any \(\beta >2\), the duality gap is not zero.
On one hand, because \(\sup _{w\in \mathbb {R}} {\mathcal L}_{\beta }(x,y,w) = +\infty \) when \(x\ne y\) and \(\sup _{w\in \mathbb {R}} {\mathcal L}_{\beta }(x,y,w) = 0\) when \(x=y\), we have
On the other hand, let \(t=x-y\),
This shows the duality gap is not zero (but it goes to 0 as \(\beta \) tends to \(\infty \)).
Then let us show that ALM does not converge if \(\beta ^k\) is bounded, i.e., there exists \(\beta >0\) such that \(\beta ^k\le \beta \) for any \(k\in {\mathbb {N}}\). Without loss of generality, we assume that \(\beta ^k\) equals the constant \(\beta \) for all \(k\in {\mathbb {N}}\); this does not affect the proof. ALM consists of two steps:
1) \((x^{k+1},y^{k+1}) = \text {argmin}_{x,y} {\mathcal L}_{\beta }(x,y,w^k)\);

2) \(w^{k+1} = w^k + \tau (x^{k+1} - y^{k+1})\).
Since \((x^{k+1} - y^{k+1})\in \partial \psi (w^k)\) where \(\psi (w) = \inf _{x,y} {\mathcal L}_{\beta }(x,y,w)\), and we already know
we have
Note that when \(w^k = 0\), the optimization problem \(\inf _{x,y} {\mathcal L}_{\beta }(x,y,0)\) has two distinct minimizers, which yield two different values of \(x^{k+1} - y^{k+1}\). This shows that no matter how small \(\tau \) is, \(w^k\) will oscillate around 0 and never converge.
However, although the duality gap is not zero, ADMM still converges in this case. There are two ways to prove it. The first way is to check all the conditions in Theorem 1. Another way is to check the iterates directly. The ADMM iterates are
The second equality shows that \(w^{k} = -2y^k\); substituting this into the first and second equalities, we have
Here \(|y^{k+1}| \le \frac{\beta }{\beta -2} + \frac{2}{\beta -2}|y^k|\). Thus, after finitely many iterations, \(|y^{k}| \le 2\) (assuming \(\beta >4\)). If \(|y^k| \le 1\), the ADMM sequence obviously converges. If \(|y^k| > 1\), without loss of generality we may assume \(2>y^k>1\). Then \(x^{k+1} = 1\), which implies \(0<y^{k+1}<1\), so the ADMM sequence converges. Thus, for any initial point \(y^0\) and \(w^0\), ADMM converges. \(\square \)
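As an aside, the structural difference exploited in this example, joint minimization in ALM versus alternating minimization in ADMM, can be written generically as follows. This is only a sketch for a two-block problem \(\min _{x,y} f(x)+g(y)\) s.t. \(x - y = 0\); `argmin_xy`, `argmin_x`, and `argmin_y` are assumed black-box subproblem solvers and are not the specific functions of this example:

```python
# ALM minimizes the augmented Lagrangian jointly over (x, y) before the dual
# step; ADMM replaces the joint minimization by two alternating, typically
# much easier, minimizations, followed by the same kind of dual update.

def alm_step(argmin_xy, w, beta, tau):
    x, y = argmin_xy(w, beta)        # joint minimization over (x, y)
    w = w + tau * (x - y)            # dual update on the residual x - y
    return x, y, w

def admm_step(argmin_x, argmin_y, y, w, beta):
    x = argmin_x(y, w, beta)         # minimize over x with y fixed
    y = argmin_y(x, w, beta)         # then over y with the new x
    w = w + beta * (x - y)           # dual update on the residual x - y
    return x, y, w
```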
Proof
(Theorem 2) Similar to the proof of Theorem 1, we only need to verify P1–P4 in Proposition 2. Proof of P2: Similar to Lemmas 4 and 5, we have
Since \(B^Tw^k = - \partial _y \phi ({\mathbf {x}}^k,y^k)\) for any \(k\in {\mathbb {N}}\), we have
where \(C_1 = \sigma _{\min }(B)\), \(\sigma _{\min }(B)\) is the smallest positive singular value of B, and \(L_\phi \) is the Lipschitz constant for \(\phi \). Therefore, we have
When \(\beta > \max \{1,L_\phi {\bar{M}} + 2C_1L_\phi {\bar{M}}\}\), P2 holds.
Proof of P1: First of all, we have already shown \({\mathcal L}_\beta ({\mathbf {x}}^{k},y^k,w^k)\ge {\mathcal L}_\beta ({\mathbf {x}}^{k+1},y^{k+1},w^{k+1})\), which means \({\mathcal L}_\beta ({\mathbf {x}}^{k},y^k,w^k)\) is monotonically nonincreasing. There exists \(y'\) such that \({\mathbf {A}}{\mathbf {x}}^k + By' = 0\) and \(y' = H(By')\). In order to show \({\mathcal L}_\beta ({\mathbf {x}}^k,y^k,w^k)\) is lower bounded, we apply A1–A3 to get
for some \(d_y^k \in \partial _y \phi ({\mathbf {x}}^{k},y^k)\). This shows that \(\mathcal{L}_{\beta }({\mathbf {x}}^{k},y^k,w^k)\) is lower bounded. Viewing (72) from the opposite direction, we observe that
is upper bounded by \({\mathcal L}_\beta ({\mathbf {x}}^0,y^0,w^0)\). Then A1 ensures that \(\{{\mathbf {x}}^k,y^k\}\) is bounded. Therefore, \(w^k\) is bounded too.
Proof of P3, P4: This part is trivial as \(\phi \) is Lipschitz differentiable. Hence we omit it.
\(\square \)