Abstract
By exploiting double-penalty terms for the primal subproblem, we develop a novel relaxed augmented Lagrangian method for solving a family of convex optimization problems with equality or inequality constraints. The method is then extended to a general multi-block separable convex optimization problem, and two related primal-dual hybrid gradient algorithms are also discussed. Sublinear and linear convergence rates are established via variational characterizations of both the saddle point of the problem and the first-order optimality conditions of the involved subproblems. Extensive experiments on the linear support vector machine problem and the robust principal component analysis problem arising in machine learning indicate that our proposed algorithms perform much better than several state-of-the-art algorithms.
Data Availability
The MATLAB codes for the experiments are available at https://github.com/pzheng4218/new-ALM-in-ML.
Notes
The database can be downloaded at http://vision.ucsd.edu/~iskwak/ExtYaleDatabase/ExtYaleB.html.
References
Argyriou, A., Evgeniou, T., Pontil, M.: Multi-task feature learning. Adv. Neural Inform. Process. Syst. 19, 41–48 (2007)
Banert, S., Upadhyaya, M., Giselsson, P.: The Chambolle–Pock method converges weakly with \(\theta >1/2\) and \(\tau \sigma \Vert L\Vert ^2<4/(1+2\theta )\) (2023). arXiv:2309.03998v1
Bai, J., Li, J., Xu, F., Zhang, H.: Generalized symmetric ADMM for separable convex optimization. Comput. Optim. Appl. 70, 129–170 (2018)
Bai, J., Guo, K., Chang, X.: A family of multi-parameterized proximal point algorithms. IEEE Access 7, 164021–164028 (2019)
Bai, J., Li, J., Wu, Z.: Several variants of the primal-dual hybrid gradient algorithm with applications. Numer. Math. Theor. Meth. Appl. 13, 176–199 (2020)
Bai, J., Hager, W., Zhang, H.: An inexact accelerated stochastic ADMM for separable convex optimization. Comput. Optim. Appl. 81, 479–518 (2022)
Bai, J., Ma, Y., Sun, H., Zhang, M.: Iteration complexity analysis of a partial LQP-based alternating direction method of multipliers. Appl. Numer. Math. 165, 500–518 (2021)
Bai, J., Chang, X., Li, J., Xu, F.: Convergence revisit on generalized symmetric ADMM. Optimization 70, 149–168 (2021)
Brunton, S., Nathan Kutz, J.: Machine Learning, Dynamical Systems, and Control. Cambridge University Press, Cambridge (2019)
Cui, J., Yan, X., Pu, X., et al.: Aero-engine fault diagnosis based on dynamic PCA and improved SVM. J. Vib. Meas. Diagn. 35, 94–99 (2015)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011)
Candes, E., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58, 1–37 (2011)
Facchinei, F., Pang, J.: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer-Verlag, Berlin (2003)
Gu, G., He, B., Yuan, X.: Customized proximal point algorithms for linearly constrained convex minimization and saddle-point problems: a unified approach. Comput. Optim. Appl. 59, 135–161 (2014)
Hao, Y., Sun, J., Yang, G., Bai, J.: The application of support vector machines to gas turbine performance diagnosis. Chin. J. Aeronaut. 18, 15–19 (2005)
Hestenes, M.: Multiplier and gradient methods. J. Optim. Theory Appl. 4, 303–320 (1969)
He, B., Yuan, X.: On the \(O(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700–709 (2012)
He, B., Yuan, X., Zhang, W.: A customized proximal point algorithm for convex minimization with linear constraints. Comput. Optim. Appl. 56, 559–572 (2013)
He, B., Yuan, X.: A class of ADMM-based algorithms for three-block separable convex programming. Comput. Optim. Appl. 70, 791–826 (2018)
He, B., You, Y., Yuan, X.: On the convergence of primal-dual hybrid gradient algorithms. SIAM J. Imaging Sci. 7, 2526–2537 (2014)
He, B.: On the convergence properties of alternating direction method of multipliers. Numer. Math. J. Chin. Univ. (Chinese Series) 39, 81–96 (2017)
He, B., Yuan, X.: Balanced augmented Lagrangian method for convex programming (2021). arXiv:2108.08554v1
He, B., Xu, S., Yuan, J.: Indefinite linearized augmented Lagrangian method for convex programming with linear inequality constraints (2021). arXiv:2105.02425v1
He, H., Desai, J., Wang, K.: A primal-dual prediction-correction algorithm for saddle point optimization. J. Glob. Optim. 66, 573–583 (2016)
He, B., Ma, F., Xu, S., Yuan, X.: A generalized primal-dual algorithm with improved convergence condition for saddle point problems. SIAM J. Imaging Sci. 15, 1157–1183 (2022)
Jiang, F., Zhang, Z., He, H.: Solving saddle point problems: a landscape of primal-dual algorithm with larger stepsizes. J. Global Optim. 85, 821–846 (2023)
Jiang, F., Wu, Z., Cai, X., Zhang, H.: A first-order inexact primal-dual algorithm for a class of convex-concave saddle point problems. Numer. Algor. 88, 1109–1136 (2021)
Li, L.: Selected Applications of Convex Optimization, pp. 17–18. Tsinghua University Press, Beijing (2015)
Li, Q., Xu, Y., Zhang, N.: Two-step fixed-point proximity algorithms for multi-block separable convex problems. J. Sci. Comput. 70, 1204–1228 (2017)
Liu, Z., Li, J., Li, G., et al.: A new model for sparse and low rank matrix decomposition. J. Appl. Anal. Comput. 7, 600–616 (2017)
Ma, F., Ni, M.: A class of customized proximal point algorithms for linearly constrained convex optimization. Comput. Appl. Math. 37, 896–911 (2018)
Osher, S., Heaton, H., Fung, S.: A Hamilton–Jacobi-based proximal operator. Proc. Natl. Acad. Sci. USA 120, e2220469120 (2023)
Powell, M.: A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization, pp. 283–298. Academic Press, New York (1969)
Papadimitriou, C., Raghavan, P., Tamaki, H., Vempala, S.: Latent semantic indexing, a probabilistic analysis. J. Comput. Syst. Sci. 61, 217–235 (2000)
Robinson, S.: Some continuity properties of polyhedral multifunctions. Math. Program. Stud. 14, 206–241 (1981)
Shen, Y., Zuo, Y., Yu, A.: A partially proximal S-ADMM for separable convex optimization with linear constraints. Appl. Numer. Math. 160, 65–83 (2021)
Tao, M., Yuan, X.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21, 57–81 (2011)
Xu, S.: A dual-primal balanced augmented Lagrangian method for linearly constrained convex programming. J. Appl. Math. Comput. 69, 1015–1035 (2023)
Zhu, Y., Wu, J., Yu, G.: A fast proximal point algorithm for \(l_1\)-minimization problem in compressed sensing. Appl. Math. Comput. 270, 777–784 (2015)
Zhang, X.: Bregman Divergence and Mirror Descent, Lecture Notes, (2013) http://users.cecs.anu.edu.au/~xzhang/teaching/bregman.pdf
Zhu, M., Chan, T.F.: An Efficient Primal-dual Hybrid Gradient Algorithm for Total Variation Image Restoration, CAM Report 08–34. UCLA, Los Angeles (2008)
Acknowledgements
The authors would like to thank the editor and anonymous referees for their valuable comments and suggestions, which have significantly improved the quality of the paper.
Funding
This work was supported by Guangdong Basic and Applied Basic Research Foundation (2023A1515012405), Shaanxi Fundamental Science Research Project for Mathematics and Physics (22JSQ001), National Key Laboratory of Aircraft Configuration Design (D5150240011) and National Natural Science Foundation of China (Grants 12071398 and 52372397).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no commercial or associative interest that represents a conflict of interest in connection with this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Discussions on Two New PDHG
In this appendix, we discuss two new PDHG-type algorithms without a relaxation step for solving the following convex-concave saddle-point problem
or, equivalently, the composite problem \( \min \limits _{\mathbf{{x}}\in \mathcal{{X}}} \big \{\theta _1(\mathbf{{x}}) + \theta _2^*(-A\mathbf{{x}})\big \}, \) where \(\mathcal{{X}}\subseteq \mathcal{{R}}^n, \mathcal{{Y}}\subseteq \mathcal{{R}}^m\) are closed convex sets, both \(\theta _1: {\mathcal{{X}}}\rightarrow \mathcal{{R}}\) and \( \theta _2: {\mathcal{{Y}}}\rightarrow \mathcal{{R}}\) are convex but possibly nonsmooth functions, \(\theta _2^*\) is the conjugate function of \(\theta _2\), and \(A\in \mathcal{{R}}^{m\times n}\) is a given matrix. Many practical problems can be reformulated as special cases of (41); see, e.g., [27, Section 5]. Note that Problem (41) reduces to the dual of (1) by letting \(\theta _2=-\lambda ^\mathsf{{T}}b,\mathbf{{y}}=\lambda \) and \(\mathcal{{Y}}=\Lambda .\) Hence, the convergence results also hold for the previous P-rALM. Throughout the forthcoming discussions, the solution set of (41) is assumed to be nonempty.
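For clarity, the equivalence with the composite form can be spelled out. Absorbing the indicator function of \(\mathcal{{Y}}\) into \(\theta _2\), the definition of the conjugate gives

```latex
\min_{\mathbf{x}\in\mathcal{X}}\ \theta_1(\mathbf{x})+\theta_2^*(-A\mathbf{x})
=\min_{\mathbf{x}\in\mathcal{X}}\ \max_{\mathbf{y}\in\mathcal{Y}}
\big\{\theta_1(\mathbf{x})-\langle A\mathbf{x},\mathbf{y}\rangle-\theta_2(\mathbf{y})\big\},
```

since \(\theta _2^*(-A\mathbf{{x}})=\sup _{\mathbf{{y}}\in \mathcal{{Y}}}\{\langle -A\mathbf{{x}},\mathbf{{y}}\rangle -\theta _2(\mathbf{{y}})\}\); the sign convention here is one consistent choice, and exchanging the min and max is justified by the assumed nonemptiness of the solution set.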
The original PDHG was proposed in [41] for solving total variation (TV) image restoration problems. Extending it to problem (41), we obtain the following scheme:
where r, s are positive scalars. He et al. [20] pointed out that convergence of the above PDHG can be ensured if \(\theta _1\) is strongly convex and \(rs>\rho (A^\mathsf{{T}}A)\). To weaken these conditions (e.g., to require only that \(\theta _1\) be convex and to let the parameters r, s be independent of \(\rho (A^\mathsf{{T}}A)\)), we develop a novel PDHG (N-PDHG1) as follows.
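To make the scheme concrete, the following sketch runs the classic PDHG iteration on a hypothetical toy instance (all data below are illustrative, not from the paper), using the common sign convention \(\langle A\mathbf{{x}},\mathbf{{y}}\rangle\) for the coupling term; both subproblems then admit closed-form updates, and the condition \(rs>\rho (A^\mathsf{{T}}A)\) with a strongly convex \(\theta _1\) is satisfied.

```python
import numpy as np

# Toy saddle-point instance:  min_x max_y  0.5||x-b||^2 + <Ax, y> - 0.5||y||^2,
# i.e. theta_1(x) = 0.5||x-b||^2 (strongly convex), theta_2(y) = 0.5||y||^2.
# Each PDHG step solves a proximal subproblem with a closed-form answer.
rng = np.random.default_rng(0)
m, n = 5, 8
A = rng.standard_normal((m, n)) / 3.0
b = rng.standard_normal(n)

# Choose r = s = 2*||A||, so that r*s = 4*rho(A^T A) > rho(A^T A).
r = s = 2.0 * np.sqrt(np.linalg.norm(A.T @ A, 2))
x, y = np.zeros(n), np.zeros(m)
for _ in range(2000):
    # x-step: argmin_x 0.5||x-b||^2 + <A^T y^k, x> + (r/2)||x - x^k||^2
    x = (b + r * x - A.T @ y) / (1.0 + r)
    # y-step: argmax_y <A x^{k+1}, y> - 0.5||y||^2 - (s/2)||y - y^k||^2
    y = (s * y + A @ x) / (1.0 + s)

# The unique saddle point satisfies x* = (I + A^T A)^{-1} b and y* = A x*.
x_star = np.linalg.solve(np.eye(n) + A.T @ A, b)
print(np.linalg.norm(x - x_star) < 1e-8)  # True
```

On this strongly convex toy problem the iterates contract linearly, matching the convergence regime described in [20]; the point of the sketch is only the shape of the two proximal updates, not the general algorithm of this paper.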
Another related algorithm, called N-PDHG2, modifies only the final subproblem of PDHG; its framework is described in the next box. A similar quadratic term was adopted in [24] to solve a special case of (41). Observe that N-PDHG1 is closely connected to P-rALM, since the first-order optimality conditions of their subproblems can be reformulated as similar variational inequalities with the same block matrix H; see (7) and the forthcoming (42). In fact, their \(\mathbf{{x}}\)-subproblems share the same proximal term. Another observation is that N-PDHG2 is obtained from N-PDHG1 by simply modifying the proximal parameters, so that one of its subproblems can be solved directly by a proximity operator.
1.1 Sublinear Convergence Under General Convex Assumption
Due to the close similarity between the two algorithms, we analyze only the convergence properties of N-PDHG1 under general convex assumptions; the analysis for N-PDHG2 is analogous and is sketched at the end of this subsection. For convenience, we denote \(\mathcal{{U}}:=\mathcal{{X}}\times \mathcal{{Y}}\) and
Lemma 5.1
The sequence \(\{ \mathbf{{u}}^k \}\) generated by N-PDHG1 satisfies
for any \(\mathbf{{u}}\in \mathcal{{U}}\), where H is given by (8). Moreover, we have
Proof. According to the first-order optimality condition of the \(\mathbf{{x}}\)-subproblem in N-PDHG1, we have \(\mathbf{{x}}^{k+1}\in \mathcal{{X}}\) and
that is,
Similarly, we have \(\mathbf{{y}}^{k+1}\in \mathcal{{Y}}\) and
that is,
Combine the inequalities (45)–(47) and the structure of H given by (8) to have
which together with the property \(\left\langle \mathbf{{u}}-\mathbf{{u}}^{k+1}, M( \mathbf{{u}}-\mathbf{{u}}^{k+1}) \right\rangle =0\) confirms (42). Then, the inequality (43) is obtained by applying (42) and the identity in (21). \(\blacksquare \)
Now, we discuss the global convergence and sublinear convergence rate of N-PDHG1. Let \(\mathbf{{u}}^*=(\mathbf{{x}}^*;\mathbf{{y}}^*)\in \mathcal{{U}}\) be a solution point of the problem (41). Then, it holds
namely,
So, finding a solution point of (41) amounts to finding \(\mathbf{{u}}^*\in \mathcal{{U}}\) such that
Setting \(\mathbf{{u}}:=\mathbf{{u}}^*\) in (43) together with (48) gives
that is, the sequence generated by N-PDHG1 is contractive, and thus N-PDHG1 converges globally. This inequality, together with the analysis of P-rALM, indicates that N-PDHG1 with a relaxation step also converges, and its sublinear convergence rate can be established by following the proof for P-rALM. Note that the convergence of N-PDHG1 does not need the strong convexity of \(\theta _1\) and allows more flexibility in choosing the proximal parameter r.
Finally, it follows readily from the first-order optimality conditions of the subproblems in N-PDHG2 that
for any \(\mathbf{{u}}\in \mathcal{{U}}\), where
and \(\widetilde{H}\) is positive definite for any \(r>0\) and \(Q\succ \mathbf{{0}}\). So, N-PDHG2 also converges globally with a sublinear convergence rate. This matrix \(\widetilde{H}\) is the one discussed in Sect. 1 and reduces to that in [22] when \(Q=\delta \mathbf{{I}}\) for any \(\delta >0\).
1.2 Linear Convergence Under Strong Convexity Assumption
The linear convergence rate of N-PDHG1 will be investigated in this subsection under the following assumptions:
-
(a1)
The matrix A has full row rank and \(\mathcal{{X}}=\mathcal{{R}}^n;\)
-
(a2)
The function \(\theta _1\) is strongly convex with modulus \(\nu >0\) and \(\nabla \theta _1\) is Lipschitz continuous with constant \(L_{\theta _1}>0\).
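As a simple illustration (not taken from the paper), a quadratic objective satisfies (a2):

```latex
\theta_1(\mathbf{x})=\tfrac{1}{2}\,\mathbf{x}^{\mathsf{T}}P\mathbf{x}+\mathbf{q}^{\mathsf{T}}\mathbf{x},
\qquad P=P^{\mathsf{T}}\succ\mathbf{0},
```

which is strongly convex with modulus \(\nu =\lambda _{\min }(P)\), and whose gradient \(\nabla \theta _1(\mathbf{{x}})=P\mathbf{{x}}+\mathbf{{q}}\) is Lipschitz continuous with constant \(L_{\theta _1}=\lambda _{\max }(P)\); assumption (a1) additionally requires \(m\le n\) and \({\text {rank}}(A)=m\).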
From the second part of (a2) and the first-order optimality condition of \(\mathbf{{x}}^{k+1}\)-subproblem in N-PDHG1, we have
Combining this equation with the first part of (a2), it holds that
which implies that \(\frac{\nu }{2}\left\| \mathbf{{x}}-\mathbf{{x}}^{k+1} \right\| ^2\) will be added to the right-hand side of (42), and finally
Note that the equation (50) can be equivalently rewritten as
Besides, the solution \((\mathbf{{x}}^*;\mathbf{{y}}^*)\) satisfies
Combining the equations (52)–(53) with (a1)–(a2), we obtain
where \(\sigma _A>0\) denotes the smallest eigenvalue of \(AA^\mathsf{{T}}\) due to (a1). So, we have
By the structure of H and Young's inequality, it holds that
where \( \delta _0 \in ( r\Vert A^\mathsf{{T}}A\Vert , \Vert rA^\mathsf{{T}}A+Q\Vert ^2)\) exists for proper choices of r and Q. Now, let
Then, combining the above inequalities (51) and (54)-(55), we can deduce
From the definition of \(\delta ^k\), it holds that
which finally ensures the following Q-linear convergence rate:
The above analysis also indicates that our proposed P-rALM for solving the problem (1) converges Q-linearly under similar assumptions, namely that \(\theta \) is strongly convex, its gradient \(\nabla \theta \) is Lipschitz continuous, the matrix A has full row rank, and \(\mathcal{{X}}=\mathcal{{R}}^n\).
1.3 Linear Convergence Under the Error Bound Condition
In this subsection, we use \(\partial f(x)\) to denote the subdifferential of the convex function f at x. A mapping f is said to be a piecewise linear multifunction if its graph \(Gr(f):=\{(x,y)\mid y\in f(x)\}\) is a union of finitely many polyhedra. The projection operator \(\mathcal{{P}}_{\mathcal{{C}}}(x)\) is nonexpansive, i.e.,
Given \(H\succ \mathbf{{0}}\), we define \({\text {dist}}_{H}(x, \mathcal{{C}}):=\min \limits _{z\in \mathcal{{C}}}\Vert x-z\Vert _H\). When \(H=\mathbf{{I}}\), we simply denote it by \({\text {dist}}(x, \mathcal{{C}})\). For any \(\mathbf{{u}}\in \mathcal{{U}}\) and \(\alpha >0\), we define
where \(\xi _{\mathbf{{x}}}\in \partial \theta _1(\mathbf{{x}}),\xi _{\mathbf{{y}}}\in \partial \theta _2(\mathbf{{y}}).\) Note that a point
is a solution of (41) if and only if \( e_{\mathcal{{U}}}(\mathbf{{u}}^*,\alpha )=\mathbf{{0}}\). Different from the assumptions (a1)-(a2), we next investigate the linear convergence rate of N-PDHG1 under an error bound condition in terms of the mapping \(e_{\mathcal{{U}}}(\mathbf{{u}},1)\):
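The nonexpansiveness of \(\mathcal{{P}}_{\mathcal{{C}}}\) and the attainment of \({\text {dist}}(x,\mathcal{{C}})\) at the projection can be checked numerically on a toy box constraint (a hypothetical sketch; the set and data are illustrative only):

```python
import numpy as np

# C = [-1, 1]^n is a box, so the Euclidean projection is a componentwise clip.
rng = np.random.default_rng(1)
n = 6
proj = lambda v: np.clip(v, -1.0, 1.0)

# Nonexpansiveness: ||P_C(x) - P_C(y)|| <= ||x - y|| on random pairs.
nonexpansive = True
for _ in range(100):
    x, y = rng.standard_normal(n) * 3, rng.standard_normal(n) * 3
    if np.linalg.norm(proj(x) - proj(y)) > np.linalg.norm(x - y) + 1e-12:
        nonexpansive = False
print(nonexpansive)  # True

# dist(x, C) = min_{z in C} ||x - z|| is attained at z = P_C(x):
# no random feasible point can beat the projection.
x = rng.standard_normal(n) * 3
d = np.linalg.norm(x - proj(x))
feasible = np.clip(rng.standard_normal((200, n)), -1.0, 1.0)
beaten = any(np.linalg.norm(x - z) < d - 1e-12 for z in feasible)
print(beaten)  # False
```

This is the Euclidean case \(H=\mathbf{{I}}\); for a general \(H\succ \mathbf{{0}}\) the same statements hold with the norm \(\Vert \cdot \Vert _H\) and the corresponding H-projection.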
-
(a3)
Assume that there exists a constant \(\zeta >0\) such that
$$\begin{aligned} {\text {dist}}\left( \mathbf{{u}},\mathcal{{U}}^*\right) \le \zeta {\text {dist}}\left( \textbf{0},e_{\mathcal{{U}}}(\mathbf{{u}},1)\right) , ~~\forall \mathbf{{u}}\in \mathcal{{U}}. \end{aligned}$$(59)
The condition (59) is generally weaker than the strong convexity assumption and hence can be satisfied by problems with non-strongly convex objective functions. Note that if the subdifferentials \(\partial \theta _1(\mathbf{{x}})\) and \(\partial \theta _2(\mathbf{{y}})\) are piecewise linear multifunctions and the constraint sets \(\mathcal{{X}}, \mathcal{{Y}}\) are polyhedral, then both \(\mathcal{{P}}_{\mathcal{{X}}}\) and \(\mathcal{{P}}_{\mathcal{{Y}}}\) are piecewise linear multifunctions by [13, Prop. 4.1.4], and hence \(e_{\mathcal{{U}}}(\mathbf{{u}},\alpha )\) is also a piecewise linear multifunction. By Robinson's continuity property [35] for polyhedral multifunctions, the assumption (a3) then holds automatically. For convenience of the subsequent analysis, we denote
It is easy to check that \(\mathcal{{Q}}\) is symmetric positive definite because \(\Vert \mathbf{{u}}\Vert ^2_{\mathcal{{Q}}}>0\) for any \(\mathbf{{u}}\ne \mathbf{{0}}\). By equivalent expressions for the first-order optimality conditions (44) and (46) together with the structure of \(\mathcal{{Q}}\), we have the following estimate of the distance between \(\mathbf{{0}}\) and \(e_{\mathcal{{U}}}(\mathbf{{u}}^{k+1},1)\), whose proof is similar to that in [8, Sec. 2.2].
Lemma 5.2
Let \(\mathcal{{Q}}\) be given in (60). Then, the iterates generated by N-PDHG1 satisfy
Proof. The first-order optimality condition in (44) implies
Combine it with the definition of \({\text {dist}}_H(\cdot ,\cdot )\) and the property in (57) to obtain
where \(\mathcal{{Q}}_1={\text {diag}}\left( (rA^\mathsf{{T}}A+Q)^\mathsf{{T}}(rA^\mathsf{{T}}A+Q),AA^\mathsf{{T}}\right) .\) Similarly, we have from (46) that
and
where \(\mathcal{{Q}}_2={\text {diag}}\left( A^\mathsf{{T}}A,\frac{1}{r}\mathbf{{I}}\right) .\) The inequalities (62)–(63) immediately ensure (61) due to the relation \(\mathcal{{Q}}=\mathcal{{Q}}_1+\mathcal{{Q}}_2.\) \(\blacksquare \)
Based on Lemma 5.2 and the conclusion (49), we next establish a global linear convergence rate of N-PDHG1, where \(\lambda _{\min }(H)\) and \( \lambda _{\max }(H)\) denote the smallest and largest eigenvalues of the positive definite matrix H, respectively.
Theorem 5.1
Let \(\mathcal{{Q}}\) be given in (60). Then, there exists a constant \(\zeta >0\) such that the iterates generated by N-PDHG1 satisfy
where the constant \( \hat{\zeta }= \frac{\lambda _{\min }(H)}{2\zeta ^2\lambda _{\max }(\mathcal{{Q}})\lambda _{\max }(H)}>0. \)
Proof. Because \( \mathcal{{U}}^*\) is a closed convex set, there exists a \(\mathbf{{u}}^*_k\in \mathcal{{U}}^*\) satisfying
By the condition (59) and Lemma 5.2, there exists a constant \(\zeta >0\) such that
By the definition of \({\text {dist}}_{H}(\cdot ,\cdot )\), we have
Combine (66)–(67) and (49) to have
Rearranging the above inequality confirms (64). \(\blacksquare \)
Corollary 5.1
Let \(\hat{\zeta }>0\) be given in Theorem 5.1 and the sequence \(\{\mathbf{{u}}^k\}\) be generated by N-PDHG1. Then, there exists a point \(\mathbf{{u}}^\infty \in \mathcal{{U}}^*\) such that
where
Proof. Let \({\mathbf{{u}}^*}\in \mathcal{{U}}^*\) be such that (65) holds and let
Then, it follows from (49) that \( \left\| \mathbf{{u}}^{k+1}-{\mathbf{{u}}^*}\right\| _{H}\le \left\| \mathbf{{u}}^k-{\mathbf{{u}}^*}\right\| _{H} \) which further implies
where the final inequality follows from (64). Because the sequence \(\{\mathbf{{u}}^k\} \) generated by N-PDHG1 converges to a \( \mathbf{{u}}^\infty \in \mathcal{{U}}^*\), we have from (69) that \( \mathbf{{u}}^\infty =\mathbf{{u}}^k+\sum _{j=k}^{\infty }\mathbf{{d}}^j \), which by (70) indicates
So, the inequality (68) holds; that is, \(\mathbf{{u}}^k\) converges to \(\mathbf{{u}}^\infty \) R-linearly. \(\blacksquare \)
Remark 5.1
Consider the following general saddle-point problem
or, equivalently, the composite problem \( \min \limits _{\mathbf{{x}}\in \mathcal{{X}}} \big \{f(\mathbf{{x}}) + \theta _1(\mathbf{{x}}) + \theta _2^*(-A\mathbf{{x}})\big \},\) where \(f: {\mathcal{{X}}}\rightarrow \mathcal{{R}}\) is a smooth convex function whose gradient is Lipschitz continuous with constant \(L_f\), and the remaining notation has the same meaning as before. For this problem, similarly to Case 2 in Sect. 2.3, we can develop the following iterative scheme
Its global convergence and linear convergence rate can also be established by the above analysis.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bai, J., Jia, L. & Peng, Z. A New Insight on Augmented Lagrangian Method with Applications in Machine Learning. J Sci Comput 99, 53 (2024). https://doi.org/10.1007/s10915-024-02518-0