Abstract
By exploiting double-penalty terms for the primal subproblem, we develop a novel relaxed augmented Lagrangian method for solving a family of convex optimization problems with equality or inequality constraints. The method is then extended to a general multi-block separable convex optimization problem, and two related primal-dual hybrid gradient algorithms are also discussed. Sublinear and linear convergence rates are established via variational characterizations of both the saddle point of the problem and the first-order optimality conditions of the involved subproblems. Extensive experiments on the linear support vector machine problem and the robust principal component analysis problem arising in machine learning indicate that our proposed algorithms perform much better than several state-of-the-art algorithms.
Data Availability
The MATLAB codes for the experiments are available at https://github.com/pzheng4218/new-ALM-in-ML.
Notes
The database can be downloaded at http://vision.ucsd.edu/~iskwak/ExtYaleDatabase/ExtYaleB.html.
References
Argyriou, A., Evgeniou, T., Pontil, M.: Multi-task feature learning. Adv. Neural Inform. Process. Syst. 19, 41–48 (2007)
Banert, S., Upadhyaya, M., Giselsson, P.: The Chambolle–Pock method converges weakly with \(\theta >1/2\) and \(\tau \sigma \Vert L\Vert ^2<4/(1+2\theta )\) (2023). arXiv:2309.03998v1
Bai, J., Li, J., Xu, F., Zhang, H.: Generalized symmetric ADMM for separable convex optimization. Comput. Optim. Appl. 70, 129–170 (2018)
Bai, J., Guo, K., Chang, X.: A family of multi-parameterized proximal point algorithms. IEEE Access 7, 164021–164028 (2019)
Bai, J., Li, J., Wu, Z.: Several variants of the primal-dual hybrid gradient algorithm with applications. Numer. Math. Theor. Meth. Appl. 13, 176–199 (2020)
Bai, J., Hager, W., Zhang, H.: An inexact accelerated stochastic ADMM for separable convex optimization. Comput. Optim. Appl. 81, 479–518 (2022)
Bai, J., Ma, Y., Sun, H., Zhang, M.: Iteration complexity analysis of a partial LQP-based alternating direction method of multipliers. Appl. Numer. Math. 165, 500–518 (2021)
Bai, J., Chang, X., Li, J., Xu, F.: Convergence revisit on generalized symmetric ADMM. Optimization 70, 149–168 (2021)
Brunton, S., Nathan Kutz, J.: Machine Learning, Dynamical Systems, and Control. Cambridge University Press, Cambridge (2019)
Cui, J., Yan, X., Pu, X., et al.: Aero-engine fault diagnosis based on dynamic PCA and improved SVM. J. Vib. Meas. Diagn. 35, 94–99 (2015)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011)
Candes, E., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58, 1–37 (2011)
Facchinei, F., Pang, J.: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer-Verlag, Berlin (2003)
Gu, G., He, B., Yuan, X.: Customized proximal point algorithms for linearly constrained convex minimization and saddle-point problems: a unified approach. Comput. Optim. Appl. 59, 135–161 (2014)
Hao, Y., Sun, J., Yang, G., Bai, J.: The application of support vector machines to gas turbine performance diagnosis. Chin. J. Aeronaut. 18, 15–19 (2005)
Hestenes, M.: Multiplier and gradient methods. J. Optim. Theory Appl. 4, 303–320 (1969)
He, B., Yuan, X.: On the \(O(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700–709 (2012)
He, B., Yuan, X., Zhang, W.: A customized proximal point algorithm for convex minimization with linear constraints. Comput. Optim. Appl. 56, 559–572 (2013)
He, B., Yuan, X.: A class of ADMM-based algorithms for three-block separable convex programming. Comput. Optim. Appl. 70, 791–826 (2018)
He, B., You, Y., Yuan, X.: On the convergence of primal-dual hybrid gradient algorithms. SIAM J. Imaging Sci. 7, 2526–2537 (2014)
He, B.: On the convergence properties of alternating direction method of multipliers. Numer. Math. J. Chin. Univ. (Chinese Series) 39, 81–96 (2017)
He, B., Yuan, X.: Balanced augmented Lagrangian method for convex programming (2021). arXiv:2108.08554v1
He, B., Xu, S., Yuan, J.: Indefinite linearized augmented Lagrangian method for convex programming with linear inequality constraints (2021). arXiv:2105.02425v1
He, H., Desai, J., Wang, K.: A primal-dual prediction-correction algorithm for saddle point optimization. J. Glob. Optim. 66, 573–583 (2016)
He, B., Ma, F., Xu, S., Yuan, X.: A generalized primal-dual algorithm with improved convergence condition for saddle point problems. SIAM J. Imaging Sci. 15, 1157–1183 (2022)
Jiang, F., Zhang, Z., He, H.: Solving saddle point problems: a landscape of primal-dual algorithm with larger stepsizes. J. Global Optim. 85, 821–846 (2023)
Jiang, F., Wu, Z., Cai, X., Zhang, H.: A first-order inexact primal-dual algorithm for a class of convex-concave saddle point problems. Numer. Algor. 88, 1109–1136 (2021)
Li, L.: Selected Applications of Convex Optimization, pp. 17–18. Tsinghua University Press, Beijing (2015)
Li, Q., Xu, Y., Zhang, N.: Two-step fixed-point proximity algorithms for multi-block separable convex problems. J. Sci. Comput. 70, 1204–1228 (2017)
Liu, Z., Li, J., Li, G., et al.: A new model for sparse and low rank matrix decomposition. J. Appl. Anal. Comput. 7, 600–616 (2017)
Ma, F., Ni, M.: A class of customized proximal point algorithms for linearly constrained convex optimization. Comput. Appl. Math. 37, 896–911 (2018)
Osher, S., Heaton, H., Fung, S.: A Hamilton–Jacobi-based proximal operator. Proc. Natl. Acad. Sci. USA 120, e2220469120 (2023)
Powell, M.: A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization, pp. 283–298. Academic Press, New York (1969)
Papadimitriou, C., Raghavan, P., Tamaki, H., Vempala, S.: Latent semantic indexing, a probabilistic analysis. J. Comput. Syst. Sci. 61, 217–235 (2000)
Robinson, S.: Some continuity properties of polyhedral multifunctions. Math. Program. Stud. 14, 206–241 (1981)
Shen, Y., Zuo, Y., Yu, A.: A partially proximal S-ADMM for separable convex optimization with linear constraints. Appl. Numer. Math. 160, 65–83 (2021)
Tao, M., Yuan, X.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21, 57–81 (2011)
Xu, S.: A dual-primal balanced augmented Lagrangian method for linearly constrained convex programming. J. Appl. Math. Comput. 69, 1015–1035 (2023)
Zhu, Y., Wu, J., Yu, G.: A fast proximal point algorithm for \(l_1\)-minimization problem in compressed sensing. Appl. Math. Comput. 270, 777–784 (2015)
Zhang, X.: Bregman Divergence and Mirror Descent, Lecture Notes, (2013) http://users.cecs.anu.edu.au/~xzhang/teaching/bregman.pdf
Zhu, M., Chan, T.F.: An Efficient Primal-dual Hybrid Gradient Algorithm for Total Variation Image Restoration, CAM Report 08–34. UCLA, Los Angeles (2008)
Acknowledgements
The authors would like to thank the editor and anonymous referees for their valuable comments and suggestions, which have significantly improved the quality of the paper.
Funding
This work was supported by Guangdong Basic and Applied Basic Research Foundation (2023A1515012405), Shaanxi Fundamental Science Research Project for Mathematics and Physics (22JSQ001), National Key Laboratory of Aircraft Configuration Design (D5150240011) and National Natural Science Foundation of China (Grants 12071398 and 52372397).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no commercial or associative interest that represents a conflict of interest in connection with this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Discussions on Two New PDHG
In this appendix, we discuss two new PDHG-type algorithms without a relaxation step for solving the following convex-concave saddle-point problem
or, equivalently, the composite problem \( \min \limits _{\mathbf{{x}}\in \mathcal{{X}}} \big \{\theta _1(\mathbf{{x}}) + \theta _2^*(-A\mathbf{{x}})\big \}, \) where \(\mathcal{{X}}\subseteq \mathcal{{R}}^n, \mathcal{{Y}}\subseteq \mathcal{{R}}^m\) are closed convex sets, both \(\theta _1: {\mathcal{{X}}}\rightarrow \mathcal{{R}}\) and \( \theta _2: {\mathcal{{Y}}}\rightarrow \mathcal{{R}}\) are convex but possibly nonsmooth functions, \(\theta _2^*\) is the conjugate function of \(\theta _2\), and \(A\in \mathcal{{R}}^{m\times n}\) is a given matrix. Many practical problems can be reformulated as special cases of (41); see, e.g., [27, Section 5]. Note that Problem (41) reduces to the dual of (1) by letting \(\theta _2=-\lambda ^\mathsf{{T}}b,\mathbf{{y}}=\lambda \) and \(\mathcal{{Y}}=\Lambda .\) Hence, the convergence results also hold for the previous P-rALM. Throughout the forthcoming discussions, the solution set of (41) is assumed to be nonempty.
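For clarity, the equivalence with the composite form can be spelled out. Absorbing the indicator function of \(\mathcal{{Y}}\) into \(\theta _2\), the definition of the conjugate gives

```latex
\min_{\mathbf{x}\in\mathcal{X}}\ \theta_1(\mathbf{x})+\theta_2^*(-A\mathbf{x})
=\min_{\mathbf{x}\in\mathcal{X}}\ \max_{\mathbf{y}\in\mathcal{Y}}
\big\{\theta_1(\mathbf{x})-\langle A\mathbf{x},\mathbf{y}\rangle-\theta_2(\mathbf{y})\big\},
```

since \(\theta _2^*(-A\mathbf{{x}})=\sup _{\mathbf{{y}}\in \mathcal{{Y}}}\{\langle -A\mathbf{{x}},\mathbf{{y}}\rangle -\theta _2(\mathbf{{y}})\}\); the sign convention here is one consistent choice, and exchanging the min and max is justified by the assumed nonemptiness of the solution set.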
The original PDHG was proposed in [41] for solving total variation (TV) image restoration problems. Extending it to problem (41), we obtain the following scheme:
where r, s are positive scalars. He et al. [20] pointed out that convergence of the above PDHG can be ensured if \(\theta _1\) is strongly convex and \(rs>\rho (A^\mathsf{{T}}A)\). To weaken these conditions (e.g., to require only that \(\theta _1\) be convex and to let the parameters r, s be independent of \(\rho (A^\mathsf{{T}}A)\)), we develop a novel PDHG (N-PDHG1) as follows.
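To make the scheme concrete, the following sketch runs the classic PDHG iteration on a hypothetical toy instance (all data below are illustrative, not from the paper), using the common sign convention \(\langle A\mathbf{{x}},\mathbf{{y}}\rangle\) for the coupling term; both subproblems then admit closed-form updates, and the condition \(rs>\rho (A^\mathsf{{T}}A)\) with a strongly convex \(\theta _1\) is satisfied.

```python
import numpy as np

# Toy saddle-point instance:  min_x max_y  0.5||x-b||^2 + <Ax, y> - 0.5||y||^2,
# i.e. theta_1(x) = 0.5||x-b||^2 (strongly convex), theta_2(y) = 0.5||y||^2.
# Each PDHG step solves a proximal subproblem with a closed-form answer.
rng = np.random.default_rng(0)
m, n = 5, 8
A = rng.standard_normal((m, n)) / 3.0
b = rng.standard_normal(n)

# Choose r = s = 2*||A||, so that r*s = 4*rho(A^T A) > rho(A^T A).
r = s = 2.0 * np.sqrt(np.linalg.norm(A.T @ A, 2))
x, y = np.zeros(n), np.zeros(m)
for _ in range(2000):
    # x-step: argmin_x 0.5||x-b||^2 + <A^T y^k, x> + (r/2)||x - x^k||^2
    x = (b + r * x - A.T @ y) / (1.0 + r)
    # y-step: argmax_y <A x^{k+1}, y> - 0.5||y||^2 - (s/2)||y - y^k||^2
    y = (s * y + A @ x) / (1.0 + s)

# The unique saddle point satisfies x* = (I + A^T A)^{-1} b and y* = A x*.
x_star = np.linalg.solve(np.eye(n) + A.T @ A, b)
print(np.linalg.norm(x - x_star) < 1e-8)  # True
```

On this strongly convex toy problem the iterates contract linearly, matching the convergence regime described in [20]; the point of the sketch is only the shape of the two proximal updates, not the general algorithm of this paper.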
Another related algorithm, called N-PDHG2, modifies only the final subproblem of PDHG; its framework is described in the next box. A similar quadratic term was adopted in [24] to solve a special case of (41). Observe that N-PDHG1 is closely connected to P-rALM, since the first-order optimality conditions of their subproblems can be reformulated as similar variational inequalities with the same block matrix H; see (7) and the forthcoming (42). In fact, their \(\mathbf{{x}}\)-subproblems share the same proximal term. Another observation is that N-PDHG2 is obtained from N-PDHG1 by simply modifying the proximal parameters, so that one of its subproblems can be solved directly by a proximity operator.
1.1 Sublinear Convergence Under General Convex Assumption
Due to the close similarity between the two algorithms, we analyze only the convergence properties of N-PDHG1 under general convex assumptions; the analysis for N-PDHG2 is analogous and is sketched at the end of this subsection. For convenience, we denote \(\mathcal{{U}}:=\mathcal{{X}}\times \mathcal{{Y}}\) and
Lemma 5.1
The sequence \(\{ \mathbf{{u}}^k \}\) generated by N-PDHG1 satisfies
for any \(\mathbf{{u}}\in \mathcal{{U}}\), where H is given by (8). Moreover, we have
Proof. According to the first-order optimality condition of the \(\mathbf{{x}}\)-subproblem in N-PDHG1, we have \(\mathbf{{x}}^{k+1}\in \mathcal{{X}}\) and
that is,
Similarly, we have \(\mathbf{{y}}^{k+1}\in \mathcal{{Y}}\) and
that is,
Combine the inequalities (45)–(47) and the structure of H given by (8) to have
which together with the property \(\left\langle \mathbf{{u}}-\mathbf{{u}}^{k+1}, M( \mathbf{{u}}-\mathbf{{u}}^{k+1}) \right\rangle =0\) confirms (42). Then, the inequality (43) is obtained by applying (42) and the identity in (21). \(\blacksquare \)
Now, we discuss the global convergence and sublinear convergence rate of N-PDHG1. Let \(\mathbf{{u}}^*=(\mathbf{{x}}^*;\mathbf{{y}}^*)\in \mathcal{{U}}\) be a solution point of the problem (41). Then, it holds
namely,
So, finding a solution point of (41) amounts to finding \(\mathbf{{u}}^*\in \mathcal{{U}}\) such that
Setting \(\mathbf{{u}}:=\mathbf{{u}}^*\) in (43) together with (48) gives
that is, the sequence generated by N-PDHG1 is contractive, and thus N-PDHG1 converges globally. This inequality, together with the analysis of P-rALM, indicates that N-PDHG1 with a relaxation step also converges, and its sublinear convergence rate can be established by following the proof for P-rALM. Note that the convergence of N-PDHG1 does not need the strong convexity of \(\theta _1\) and allows more flexibility in choosing the proximal parameter r.
Finally, it follows readily from the first-order optimality conditions of the subproblems in N-PDHG2 that
for any \(\mathbf{{u}}\in \mathcal{{U}}\), where
and \(\widetilde{H}\) is positive definite for any \(r>0\) and \(Q\succ \mathbf{{0}}\). So, N-PDHG2 also converges globally with a sublinear convergence rate. This matrix \(\widetilde{H}\) is the one discussed in Sect. 1 and reduces to that in [22] when \(Q=\delta \mathbf{{I}}\) for any \(\delta >0\).
1.2 Linear Convergence Under Strong Convexity Assumption
The linear convergence rate of N-PDHG1 will be investigated in this subsection under the following assumptions:
-
(a1)
The matrix A has full row rank and \(\mathcal{{X}}=\mathcal{{R}}^n;\)
-
(a2)
The function \(\theta _1\) is strongly convex with modulus \(\nu >0\) and \(\nabla \theta _1\) is Lipschitz continuous with constant \(L_{\theta _1}>0\).
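As a simple illustration (not taken from the paper), a quadratic objective satisfies (a2):

```latex
\theta_1(\mathbf{x})=\tfrac{1}{2}\,\mathbf{x}^{\mathsf{T}}P\mathbf{x}+\mathbf{q}^{\mathsf{T}}\mathbf{x},
\qquad P=P^{\mathsf{T}}\succ\mathbf{0},
```

which is strongly convex with modulus \(\nu =\lambda _{\min }(P)\), and whose gradient \(\nabla \theta _1(\mathbf{{x}})=P\mathbf{{x}}+\mathbf{{q}}\) is Lipschitz continuous with constant \(L_{\theta _1}=\lambda _{\max }(P)\); assumption (a1) additionally requires \(m\le n\) and \({\text {rank}}(A)=m\).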
From the second part of (a2) and the first-order optimality condition of \(\mathbf{{x}}^{k+1}\)-subproblem in N-PDHG1, we have
Combining this equation with the first part of (a2), it holds that
which implies that \(\frac{\nu }{2}\left\| \mathbf{{x}}-\mathbf{{x}}^{k+1} \right\| ^2\) will be added to the right-hand side of (42), and finally
Note that the equation (50) can be equivalently rewritten as
Besides, the solution \((\mathbf{{x}}^*;\mathbf{{y}}^*)\) satisfies
Combining the equations (52)–(53) with (a1)–(a2), we obtain
where \(\sigma _A>0\) denotes the smallest eigenvalue of \(AA^\mathsf{{T}}\) due to (a1). So, we have
By the structure of H and Young's inequality, it holds that
where \( \delta _0 \in ( r\Vert A^\mathsf{{T}}A\Vert , \Vert rA^\mathsf{{T}}A+Q\Vert ^2)\) exists for proper choices of r and Q. Now, let
Then, combining the above inequalities (51) and (54)-(55), we can deduce
From the definition of \(\delta ^k\), it holds that
which finally ensures the following Q-linear convergence rate:
The above analysis also indicates that our proposed P-rALM for solving the problem (1) converges Q-linearly under similar assumptions, namely that \(\theta \) is strongly convex, its gradient \(\nabla \theta \) is Lipschitz continuous, the matrix A has full row rank, and \(\mathcal{{X}}=\mathcal{{R}}^n\).
1.3 Linear Convergence Under the Error Bound Condition
In this subsection, we use \(\partial f(x)\) to denote the subdifferential of the convex function f at x. A mapping f is said to be a piecewise linear multifunction if its graph \(Gr(f):=\{(x,y)\mid y\in f(x)\}\) is a union of finitely many polyhedra. The projection operator \(\mathcal{{P}}_{\mathcal{{C}}}(x)\) is nonexpansive, i.e.,
Given \(H\succ \mathbf{{0}}\), we define \({\text {dist}}_{H}(x, \mathcal{{C}}):=\min \limits _{z\in \mathcal{{C}}}\Vert x-z\Vert _H\). When \(H=\mathbf{{I}}\), we simply denote it by \({\text {dist}}(x, \mathcal{{C}})\). For any \(\mathbf{{u}}\in \mathcal{{U}}\) and \(\alpha >0\), we define
where \(\xi _{\mathbf{{x}}}\in \partial \theta _1(\mathbf{{x}}),\xi _{\mathbf{{y}}}\in \partial \theta _2(\mathbf{{y}}).\) Note that a point
is a solution of (41) if and only if \( e_{\mathcal{{U}}}(\mathbf{{u}}^*,\alpha )=\mathbf{{0}}\). Different from the assumptions (a1)-(a2), we next investigate the linear convergence rate of N-PDHG1 under an error bound condition in terms of the mapping \(e_{\mathcal{{U}}}(\mathbf{{u}},1)\):
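The nonexpansiveness of \(\mathcal{{P}}_{\mathcal{{C}}}\) and the attainment of \({\text {dist}}(x,\mathcal{{C}})\) at the projection can be checked numerically on a toy box constraint (a hypothetical sketch; the set and data are illustrative only):

```python
import numpy as np

# C = [-1, 1]^n is a box, so the Euclidean projection is a componentwise clip.
rng = np.random.default_rng(1)
n = 6
proj = lambda v: np.clip(v, -1.0, 1.0)

# Nonexpansiveness: ||P_C(x) - P_C(y)|| <= ||x - y|| on random pairs.
nonexpansive = True
for _ in range(100):
    x, y = rng.standard_normal(n) * 3, rng.standard_normal(n) * 3
    if np.linalg.norm(proj(x) - proj(y)) > np.linalg.norm(x - y) + 1e-12:
        nonexpansive = False
print(nonexpansive)  # True

# dist(x, C) = min_{z in C} ||x - z|| is attained at z = P_C(x):
# no random feasible point can beat the projection.
x = rng.standard_normal(n) * 3
d = np.linalg.norm(x - proj(x))
feasible = np.clip(rng.standard_normal((200, n)), -1.0, 1.0)
beaten = any(np.linalg.norm(x - z) < d - 1e-12 for z in feasible)
print(beaten)  # False
```

This is the Euclidean case \(H=\mathbf{{I}}\); for a general \(H\succ \mathbf{{0}}\) the same statements hold with the norm \(\Vert \cdot \Vert _H\) and the corresponding H-projection.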
-
(a3)
Assume that there exists a constant \(\zeta >0\) such that
$$\begin{aligned} {\text {dist}}\left( \mathbf{{u}},\mathcal{{U}}^*\right) \le \zeta {\text {dist}}\left( \textbf{0},e_{\mathcal{{U}}}(\mathbf{{u}},1)\right) , ~~\forall \mathbf{{u}}\in \mathcal{{U}}. \end{aligned}$$(59)
The condition (59) is generally weaker than the strong convexity assumption and hence can be satisfied by problems with non-strongly convex objective functions. Note that if the subdifferentials \(\partial \theta _1(\mathbf{{x}})\) and \(\partial \theta _2(\mathbf{{y}})\) are piecewise linear multifunctions and the constraint sets \(\mathcal{{X}}, \mathcal{{Y}}\) are polyhedral, then both \(\mathcal{{P}}_{\mathcal{{X}}}\) and \(\mathcal{{P}}_{\mathcal{{Y}}}\) are piecewise linear multifunctions by [13, Prop. 4.1.4], and hence \(e_{\mathcal{{U}}}(\mathbf{{u}},\alpha )\) is also a piecewise linear multifunction. By Robinson's continuity property [35] for polyhedral multifunctions, the assumption (a3) then holds automatically. For convenience of the subsequent analysis, we denote
It is easy to check that \(\mathcal{{Q}}\) is symmetric positive definite because \(\Vert \mathbf{{u}}\Vert ^2_{\mathcal{{Q}}}>0\) for any \(\mathbf{{u}}\ne \mathbf{{0}}\). By equivalent expressions for the first-order optimality conditions (44) and (46) together with the structure of \(\mathcal{{Q}}\), we have the following estimate of the distance between \(\mathbf{{0}}\) and \(e_{\mathcal{{U}}}(\mathbf{{u}}^{k+1},1)\), whose proof is similar to that in [8, Sec. 2.2].
Lemma 5.2
Let \(\mathcal{{Q}}\) be given in (60). Then, the iterates generated by N-PDHG1 satisfy
Proof. The first-order optimality condition in (44) implies
Combine it with the definition of \({\text {dist}}_H(\cdot ,\cdot )\) and the property in (57) to obtain
where \(\mathcal{{Q}}_1={\text {diag}}\left( (rA^\mathsf{{T}}A+Q)^\mathsf{{T}}(rA^\mathsf{{T}}A+Q),AA^\mathsf{{T}}\right) .\) Similarly, we have from (46) that
and
where \(\mathcal{{Q}}_2={\text {diag}}\left( A^\mathsf{{T}}A,\frac{1}{r}\mathbf{{I}}\right) .\) The inequalities (62)–(63) immediately ensure (61) due to the relation \(\mathcal{{Q}}=\mathcal{{Q}}_1+\mathcal{{Q}}_2.\) \(\blacksquare \)
Based on Lemma 5.2 and the conclusion (49), we next establish a global linear convergence rate of N-PDHG1, where \(\lambda _{\min }(H)\) and \( \lambda _{\max }(H)\) denote the smallest and largest eigenvalues of the positive definite matrix H, respectively.
Theorem 5.1
Let \(\mathcal{{Q}}\) be given in (60). Then, there exists a constant \(\zeta >0\) such that the iterates generated by N-PDHG1 satisfy
where the constant \( \hat{\zeta }= \frac{\lambda _{\min }(H)}{2\zeta ^2\lambda _{\max }(\mathcal{{Q}})\lambda _{\max }(H)}>0. \)
Proof. Because \( \mathcal{{U}}^*\) is a closed convex set, there exists a \(\mathbf{{u}}^*_k\in \mathcal{{U}}^*\) satisfying
By the condition (59) and Lemma 5.2, there exists a constant \(\zeta >0\) such that
By the definition of \({\text {dist}}_{H}(\cdot ,\cdot )\), we have
Combine (66)–(67) and (49) to have
Rearranging the above inequality confirms (64). \(\blacksquare \)
Corollary 5.1
Let \(\hat{\zeta }>0\) be given in Theorem 5.1 and the sequence \(\{\mathbf{{u}}^k\}\) be generated by N-PDHG1. Then, there exists a point \(\mathbf{{u}}^\infty \in \mathcal{{U}}^*\) such that
where
Proof. Let \({\mathbf{{u}}^*}\in \mathcal{{U}}^*\) be such that (65) holds and let
Then, it follows from (49) that \( \left\| \mathbf{{u}}^{k+1}-{\mathbf{{u}}^*}\right\| _{H}\le \left\| \mathbf{{u}}^k-{\mathbf{{u}}^*}\right\| _{H} \) which further implies
where the final inequality follows from (64). Because the sequence \(\{\mathbf{{u}}^k\} \) generated by N-PDHG1 converges to a \( \mathbf{{u}}^\infty \in \mathcal{{U}}^*\), we have from (69) that \( \mathbf{{u}}^\infty =\mathbf{{u}}^k+\sum _{j=k}^{\infty }\mathbf{{d}}^j \), which by (70) indicates
So, the inequality (68) holds; that is, \(\mathbf{{u}}^k\) converges to \(\mathbf{{u}}^\infty \) R-linearly. \(\blacksquare \)
Remark 5.1
Consider the following general saddle-point problem
or, equivalently, the composite problem \( \min \limits _{\mathbf{{x}}\in \mathcal{{X}}} \big \{f(\mathbf{{x}}) + \theta _1(\mathbf{{x}}) + \theta _2^*(-A\mathbf{{x}})\big \},\) where \(f: {\mathcal{{X}}}\rightarrow \mathcal{{R}}\) is a smooth convex function whose gradient is Lipschitz continuous with constant \(L_f\), and the remaining notation has the same meaning as before. For this problem, similarly to Case 2 in Sect. 2.3, we can develop the following iterative scheme
Its global convergence and linear convergence rate can also be established by the above analysis.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bai, J., Jia, L. & Peng, Z. A New Insight on Augmented Lagrangian Method with Applications in Machine Learning. J Sci Comput 99, 53 (2024). https://doi.org/10.1007/s10915-024-02518-0