Abstract
In this paper, we propose a perturbed proximal primal–dual algorithm (PProx-PDA) for an important class of linearly constrained optimization problems, whose objective is the sum of smooth (possibly nonconvex) and convex (possibly nonsmooth) functions. This family of problems can be used to model many statistical and engineering applications, such as high-dimensional subspace estimation and distributed machine learning. The proposed method is of the Uzawa type, in which a primal gradient descent step is performed followed by an (approximate) dual gradient ascent step. One distinctive feature of the proposed algorithm is that the primal and dual steps are both perturbed appropriately using past iterates, so that a number of asymptotic convergence and rate-of-convergence results (to first-order stationary solutions) can be obtained. Finally, we conduct extensive numerical experiments to validate the effectiveness of the proposed algorithm.
References
Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: Proceedings of the 33rd International Conference on Machine Learning, ICML, pp. 699–707 (2016)
Ames, B., Hong, M.: Alternating directions method of multipliers for l1-penalized zero variance discriminant analysis and principal component analysis. Comput. Optim. Appl. 64(3), 725–754 (2016)
Andreani, R., Haeser, G., Martínez, J.M.: On sequential optimality conditions for smooth constrained optimization. Optimization 60(5), 627–641 (2011)
Antoniadis, A., Gijbels, I., Nikolova, M.: Penalized likelihood regression for generalized linear models with non-quadratic penalties. Ann. Inst. Stat. Math. 63(3), 585–615 (2009)
Arrow, K.J., Hurwicz, L., Uzawa, H.: Studies in Linear and Non-linear Programming. Stanford University Press, Palo Alto (1958)
Asteris, M., Papailiopoulos, D., Dimakis, A.: Nonnegative sparse PCA with provable guarantees. In: Proceedings of the 31st International Conference on Machine Learning (ICML), vol. 32, pp. 1728–1736 (2014)
Aybat, N.S., Hamedani, E.Y.: A primal–dual method for conic constrained distributed optimization problems. In: Advances in Neural Information Processing Systems (NIPS), pp. 5049–5057 (2016)
Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Method. Academic Press, Cambridge (1982)
Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods, 2nd edn. Athena Scientific, Belmont (1997)
Bianchi, P., Jakubowicz, J.: Convergence of a multi-agent projected stochastic gradient algorithm for non-convex optimization. IEEE Trans. Autom. Control 58(2), 391–405 (2013)
Birgin, E., Martínez, J.: Practical Augmented Lagrangian Methods for Constrained Optimization. Society for Industrial and Applied Mathematics, Philadelphia (2014)
Björnson, E., Jorswieck, E.: Optimal resource allocation in coordinated multi-cell systems. Found. Trends Commun. Inf. Theory 9, 113–381 (2013)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Burachik, R.S., Kaya, C.Y., Mammadov, M.: An inexact modified subgradient algorithm for nonconvex optimization. Comput. Optim. Appl. 45(1), 1–24 (2008)
Chung, F.R.K.: Spectral Graph Theory. The American Mathematical Society, Providence (1997)
Cressie, N.: Statistics for Spatial Data. Wiley, Hoboken (2015)
Curtis, F.E., Gould, N.I.M., Jiang, H., Robinson, D.P.: Adaptive augmented Lagrangian methods: algorithms and practical numerical experience. Optim. Methods Softw. 31(1), 157–186 (2016)
D’Aspremont, A., Ghaoui, L.E., Jordan, M.I., Lanckriet, G.R.G.: A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49(3), 434–448 (2007)
Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)
Dutta, J., Deb, K., Tulshyan, R., Arora, R.: Approximate KKT points and a proximity measure for termination. J. Glob. Optim. 56(4), 1463–1499 (2013)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001)
Fernández, D., Solodov, M.V.: Local convergence of exact and inexact augmented Lagrangian methods under the second-order sufficient optimality condition. SIAM J. Optim. 22(2), 384–407 (2012)
Fleiss, J.L., Levin, B., Paik, M.C.: Statistical Methods for Rates and Proportions. Wiley, Hoboken (2003)
Forero, P.A., Cano, A., Giannakis, G.B.: Distributed clustering using wireless sensor networks. IEEE J. Sel. Top. Signal Proces. 5(4), 707–724 (2011)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2, 17–40 (1976)
Giannakis, G.B., Ling, Q., Mateos, G., Schizas, I.D., Zhu, H.: Decentralized learning for wireless communications and networking. In: Splitting Methods in Communication and Imaging. Springer, New York (2015)
Glowinski, R., Marroco, A.: Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires. Revue Française d'Automatique, Informatique et Recherche Opérationnelle 9, 41–76 (1975)
Gu, Q., Wang, Z., Liu, H.: Sparse PCA with oracle property. In: Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS), pp. 1529–1537 (2014)
Haeser, G., Melo, V.: On sequential optimality conditions for smooth constrained optimization. Preprint (2013)
Hajinezhad, D., Chang, T.H., Wang, X., Shi, Q., Hong, M.: Nonnegative matrix factorization using ADMM: algorithm and convergence analysis. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4742–4746 (2016)
Hajinezhad, D., Hong, M.: Nonconvex alternating direction method of multipliers for distributed sparse principal component analysis. In: IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE (2015)
Hajinezhad, D., Hong, M., Garcia, A.: Zeroth order nonconvex multi-agent optimization over networks. arXiv preprint arXiv:1710.09997 (2017)
Hajinezhad, D., Hong, M., Zhao, T., Wang, Z.: NESTT: A nonconvex primal–dual splitting method for distributed and stochastic optimization. In: Advances in Neural Information Processing Systems (NIPS), pp. 3215–3223 (2016)
Hajinezhad, D., Shi, Q.: Alternating direction method of multipliers for a class of nonconvex bilinear optimization: convergence analysis and applications. J. Glob. Optim. 70, 1–28 (2018)
Hamdi, A., Mishra, S.K.: Decomposition Methods Based on Augmented Lagrangians: A Survey, pp. 175–203. Springer, New York (2011)
Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Appl. 4, 303–320 (1969)
Hong, M., Hajinezhad, D., Zhao, M.M.: Prox-PDA: the proximal primal-dual algorithm for fast distributed nonconvex optimization and learning over networks. In: Proceedings of the 34th International Conference on Machine Learning (ICML), (70), pp. 1529–1538 (2017)
Hong, M., Luo, Z.Q.: On the linear convergence of the alternating direction method of multipliers. Math. Program. 162(1), 165–199 (2017)
Hong, M., Luo, Z.Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26(1), 337–364 (2016)
Houska, B., Frasch, J., Diehl, M.: An augmented Lagrangian based algorithm for distributed nonconvex optimization. SIAM J. Optim. 26(2), 1101–1127 (2016)
Koppel, A., Sadler, B.M., Ribeiro, A.: Proximity without consensus in online multi-agent optimization. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3726–3730 (2016)
Koshal, J., Nedić, A., Shanbhag, Y.V.: Multiuser optimization: distributed algorithms and error analysis. SIAM J. Optim. 21(3), 1046–1081 (2011)
Lan, G., Monteiro, R.D.C.: Iteration-complexity of first-order augmented Lagrangian methods for convex programming. Math. Program. 155(1), 511–547 (2015)
Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25(4), 2434–2460 (2015)
Liao, W., Hong, M., Farmanbar, H., Luo, Z.: Semi-asynchronous routing for large scale hierarchical networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2894–2898 (2015)
Liavas, A.P., Sidiropoulos, N.D.: Parallel algorithms for constrained tensor factorization via alternating direction method of multipliers. IEEE Trans. Signal Process. 63(20), 5450–5463 (2015)
Liu, Y.F., Liu, X., Ma, S.: On the non-ergodic convergence rate of an inexact augmented Lagrangian framework for composite convex programming. arXiv preprint arXiv:1603.05738 (2016)
Lobel, I., Ozdaglar, A.: Distributed subgradient methods for convex optimization over random networks. IEEE Trans. Autom. Control 56(6), 1291–1306 (2011)
Lorenzo, P.D., Scutari, G.: NEXT: in-network nonconvex optimization. IEEE Trans. Signal Inf. Process Over Netw. 2(2), 120–136 (2016)
Lu, Z., Zhang, Y.: Sparse approximation via penalty decomposition methods. SIAM J. Optim. 23(4), 2448–2478 (2013)
Mateos, G., Bazerque, J.A., Giannakis, G.B.: Distributed sparse linear regression. IEEE Trans. Signal Process. 58(10), 5262–5276 (2010)
Gonçalves, M.L.N., Melo, J.G., Monteiro, R.D.C.: Convergence rate bounds for a proximal ADMM with over-relaxation stepsize parameter for solving nonconvex linearly constrained problems. arXiv preprint arXiv:1702.01850 (2017)
Nedić, A., Olshevsky, A.: Distributed optimization over time-varying directed graphs. IEEE Trans. Autom. Control 60(3), 601–615 (2015)
Nedić, A., Ozdaglar, A.: Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control 54(1), 48–61 (2009)
Nedić, A., Ozdaglar, A., Parrilo, P.A.: Constrained consensus and optimization in multi-agent networks. IEEE Trans. Autom. Control 55(4), 922–938 (2010)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Springer, Berlin (2004)
Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, Berlin (1999)
Powell, M.J.D.: An efficient method for nonlinear constraints in minimization problems. In: Optimization, pp. 283–298. Academic Press (1969)
Razaviyayn, M., Hong, M., Luo, Z.Q.: A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM J. Optim. 23(2), 1126–1153 (2013)
Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976)
Ruszczyński, A.: Nonlinear Optimization. Princeton University Press, Princeton (2011)
Schizas, I., Ribeiro, A., Giannakis, G.: Consensus in ad hoc WSNs with noisy links—part I: distributed estimation of deterministic signals. IEEE Trans. Signal Process. 56(1), 350–364 (2008)
Scutari, G., Facchinei, F., Song, P., Palomar, D.P., Pang, J.S.: Decomposition by partial linearization: parallel optimization of multi-agent systems. IEEE Trans. Signal Process. 63(3), 641–656 (2014)
Shi, W., Ling, Q., Wu, G., Yin, W.: EXTRA: an exact first-order algorithm for decentralized consensus optimization. SIAM J. Optim. 25(2), 944–966 (2015)
Sun, Y., Scutari, G., Palomar, D.: Distributed nonconvex multiagent optimization over time-varying networks. In: 50th Asilomar Conference on Signals, Systems and Computers, pp. 788–794 (2016)
Tsitsiklis, J.: Problems in decentralized decision making and computation. Ph.D. thesis, Massachusetts Institute of Technology (1984)
Vu, V.Q., Cho, J., Lei, J., Rohe, K.: Fantope projection and selection: a near-optimal convex relaxation of sparse PCA. In: Advances in Neural Information Processing Systems (NIPS), pp. 2670–2678 (2013)
Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 78(1), 29–63 (2019)
Wright, S.J.: Implementing proximal point methods for linear programming. J. Optim. Theory Appl. 65(3), 531–554 (1990)
Wen, Z., Yang, C., Liu, X., Marchesini, S.: Alternating direction methods for classical and ptychographic phase retrieval. Inverse Probl. 28(11), 1–18 (2012)
Yildiz, M.E., Scaglione, A.: Coding with side information for rate-constrained consensus. IEEE Trans. Signal Process. 56(8), 3753–3764 (2008)
Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38(2), 894–942 (2010)
Zhang, Y.: Convergence of a class of stationary iterative methods for saddle point problems. Preprint (2010)
Zhu, H., Cano, A., Giannakis, G.: Distributed consensus-based demodulation: algorithms and error analysis. IEEE Trans. Wirel. Commun. 9(6), 2044–2054 (2010)
Acknowledgements
The authors would like to thank Dr. Quanquan Gu, who provided us with the codes of [29]. The authors would also like to thank Dr. Gesualdo Scutari for helpful discussions about the numerical results.
This work was completed when Davood Hajinezhad was a Ph.D. student at Iowa State University. Mingyi Hong is supported by NSF Grant CMMI-1727757 and AFOSR Grant 15RT0767.
Appendices
Appendix A
In this section, we justify Assumption [B4], which imposes the boundedness of the sequence of dual variables. Throughout this section we will assume that Assumptions A and [B1]–[B3] hold. First, we prove that when \(\Vert \lambda ^{r+1}\Vert \rightarrow \infty \), we have \(\lim \inf _{r\rightarrow \infty } \frac{\beta ^{r+1}\Vert x^{r+1}-x^r\Vert }{\Vert \lambda ^{r+1}\Vert }= 0\). Using Assumption [B3] we have the following identity
Assume the contrary, that there exists \(c_1>0\) such that
Then from (74), it is easy to show that when r is large enough, the potential function P is decreasing.
As in Lemma 3, it is relatively easy to show that the potential function is lower and upper bounded (the proof is included in Lemmas 10–11 in the online version). The lower boundedness of the potential function, together with the fact that it is decreasing, implies that (75) holds true, which further implies that \(\frac{1}{\beta ^{r+1}}\Vert \lambda ^{r+1}\Vert ^2\rightarrow 0\) according to (83). Examining the definition of the potential function in (73) and using the choice of c in (70), we conclude that the term \(\frac{\beta ^{r+1}\rho ^{r+1}}{2}\Vert x^{r+1}-x^r\Vert ^2\) in the potential function is bounded. Therefore, there exists \(D_1\) such that
It follows that \(c_1\Vert \lambda ^{r+1}\Vert ^2\) is also upper bounded. This contradicts our assumption that \(\Vert \lambda ^{r}\Vert \rightarrow \infty \).
Next, we make use of a constraint qualification to argue the boundedness of the dual variables. The technique used in the proof is relatively standard; see the recent works [21, 51]. Assume that the so-called Robinson's condition is satisfied for problem (1) at \({\hat{x}}\) [62, Chap. 3]. This means \(\{A d_x\mid d_x\in {\mathcal {T}}_{X}(\hat{x})\}={\mathbb {R}}^M,\) where \(d_x\) denotes a tangent direction to the convex set X, and \({\mathcal {T}}_{X}(\hat{x})\) is the tangent cone to the feasible set X at the point \(\hat{x}\). Utilizing this assumption, we will prove that the dual variables are bounded.
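As a minimal illustration, consider the simplest special case in which \(\hat{x}\) lies in the interior of X: then \({\mathcal {T}}_{X}(\hat{x})={\mathbb {R}}^N\), and Robinson's condition holds if and only if A has full row rank. The short sketch below checks exactly this; the matrix A is purely illustrative and is not taken from the paper.

```python
# Minimal sketch (illustrative A): with x-hat in the interior of X we have
# T_X(x-hat) = R^N, so Robinson's condition {A d_x : d_x in T_X(x-hat)} = R^M
# holds if and only if A has full row rank.
import numpy as np

A = np.array([[1.0, -1.0,  0.0],
              [0.0,  1.0, -1.0]])   # A in R^{M x N}, M = 2, N = 3
M = A.shape[0]

robinson_interior_case = (np.linalg.matrix_rank(A) == M)
print("Robinson's condition (interior-point case):", robinson_interior_case)
```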
Lemma 6
Suppose Robinson's condition holds for problem (1). Then the sequence of dual variables \(\{\lambda ^r\}\) generated by (67b) is bounded.
Proof
We argue by contradiction. Suppose that the dual variable sequence is not bounded, i.e.,
From the optimality condition of \(x^{r+1}\) we have for all \(x\in X\)
Note that \(\lim \inf _{r\rightarrow \infty } \frac{\beta ^{r+1}\Vert x^{r+1}-x^r\Vert }{\Vert \lambda ^{r+1}\Vert } =0\), so the following holds:
Let us define a new bounded sequence as \(\mu ^r = \lambda ^r/\Vert \lambda ^r\Vert \), \(r=1,2, \ldots \). Let \((x^*, \mu ^*)\) be an accumulation point of \(\{x^{r+1}, \mu ^{r+1}\}\), and assume that Robinson's condition holds at \(x^*\). Dividing both sides of the above inequality by \(\Vert \lambda ^{r+1}\Vert \), we obtain, for all \(x\in X\),
Taking the limit, passing to a subsequence if necessary, and utilizing the assumption that \(\Vert \lambda ^{r+1}\Vert \rightarrow \infty \) together with the compactness of X, we obtain
Utilizing Robinson's condition, we know that there exist \(x\in X\) and a scaling constant \(c>0\) such that \(cA(x-x^*) = - \mu ^*\), which combined with the above relation yields \(-\frac{1}{c}\Vert \mu ^*\Vert ^2\ge 0\). Therefore we must have \(\mu ^* = 0\). However, this contradicts the fact that \(\Vert \mu ^*\Vert =1\). Therefore, we conclude that \(\{\lambda ^r\}\) is a bounded sequence.\(\square \)
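For clarity, the last step can be summarized compactly, assuming the limiting relation above takes the variational-inequality form \(\langle A^{\top }\mu ^*, x - x^*\rangle \ge 0\) for all \(x\in X\) (consistent with the optimality condition of \(x^{r+1}\)):
\[
0 \;\le \; \big \langle A^{\top }\mu ^*,\, x - x^*\big \rangle \;=\; \tfrac{1}{c}\,\big \langle \mu ^*,\, cA(x-x^*)\big \rangle \;=\; -\tfrac{1}{c}\,\Vert \mu ^*\Vert ^2 \;\le \; 0, \qquad \text{hence } \mu ^* = 0.
\]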
Appendix B
We show how the sufficient conditions developed in “Appendix A” can be applied to the problems discussed in Sect. 1.2. We will focus on the partial consensus problem (10).
To proceed, we note that Robinson's condition reduces to the well-known Mangasarian–Fromovitz constraint qualification (MFCQ) if we set \(X={\mathbb {R}}^{N}\) and write the inequality constraints explicitly as \(g(x)\le 0\) [62, Lemma 3.16]. To state the MFCQ, consider the following system
where \(p_i:{\mathbb {R}}^N\rightarrow {\mathbb {R}}\) and \(g_j:{\mathbb {R}}^N\rightarrow {\mathbb {R}}\) are all continuously differentiable functions. For a given feasible solution \({\hat{y}}\) let us use \({\mathcal {A}}({\hat{y}})\) to denote the indices for active inequality constraints, that is
Let us define
Then the MFCQ holds for system (86) at the point \({\hat{y}}\) if the following hold: 1) the rows of the Jacobian matrix of p at \({\hat{y}}\), denoted by \(\nabla p({\hat{y}})\), are linearly independent; 2) there exists a vector \(d_y\in {\mathbb {R}}^N\) satisfying the conditions displayed below.
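These conditions take the standard MFCQ form: \(d_y\) must lie in the null space of the equality-constraint Jacobian while being a strict descent direction for every active inequality constraint, that is,
\[
\nabla p({\hat{y}})\, d_y = \mathbf{0}, \qquad \big \langle \nabla g_j({\hat{y}}),\, d_y \big \rangle < 0 \quad \text{for all } j \in {\mathcal {A}}({\hat{y}}).
\]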
See [62, Lemma 3.17] for more details. In the following, we show that MFCQ holds true for problem (10) at any point (x, z) that satisfies \(z\in Z\). Comparing the constraint set of this problem with system (86), we have the following specifications. The optimization variable is \(y=[x;z]\), where \(x\in {\mathbb {R}}^N\) stacks all \(x_i\in {\mathbb {R}}\) from the N nodes (here we assume \(x_i\in {\mathbb {R}}\) only for ease of presentation). Also, \(z\in {\mathbb {R}}^E\) stacks all \(z_{e}\in {\mathbb {R}}\) for \(e\in {\mathcal {E}}\). The equality constraint is written as \(p(y)=[A,-I]y =0\), where \(A\in {\mathbb {R}}^{E\times N}\) and \(I\) is the \(E\times E\) identity matrix. Finally, for the inequality constraints we have \(g_{e}(y)= |z_{e}| - \xi \), and the active set is given by \({\mathcal {A}}(y):={\mathcal {A}}^+(y)\cup {\mathcal {A}}^-(y)\), where
Without loss of generality we assume \(\xi =1\). To show that MFCQ holds, consider a solution \({\hat{y}}:=({\hat{x}},{\hat{z}})\). First observe that the Jacobian of the equality constraint is \(\nabla p({\hat{y}})= [A,-I]\), which has full row rank. In order to verify the second condition, we need to find a vector \(d_y:=[d_x;d_z]\in {\mathbb {R}}^{N+E}\) such that
where \([d_z]_e\) denotes the eth component of the vector \(d_z\). Let us denote the all-one vector and the all-zero vector by \(\mathbf{1}\) and \(\mathbf{0}\), respectively. To proceed, let us consider two different cases:
Case 1 For the vector \({\hat{z}}\in {\mathbb {R}}^E\) we have \({\hat{z}}\ne \mathbf{1}\) and \({\hat{z}}\ne -\mathbf{1}\). Let us take
First we show that \(d_z\in \text {col}(A)\). Note that for our problem, when the graph is connected, the null space of A (which is the incidence matrix of the graph) is spanned by the vector \({\mathbf {1}}\) [16]. Using this fact and \(\mathbf{1}^Td_z = {\hat{z}}^T\mathbf{1} - \mathbf{1}^T{\hat{z}}=0\), the equation \(Ad_x=d_z\) is solvable for \(d_x\). Second, for \(e\in {\mathcal {A}}^+({\hat{y}})\) we have \({\hat{z}}_e=1\). Therefore, \([d_z]_e=\left[ \frac{1}{E}({\hat{z}}^T\mathbf{1})\mathbf{1} - {\hat{z}}\right] _e<0\), because \(\frac{1}{E}{\hat{z}}^T\mathbf{1}<1\) from the fact that every entry of \({\hat{z}}\) is at most 1 and \({\hat{z}}\ne \mathbf{1}\). Condition (96b) is verified. Using a similar argument we can verify condition (96c).
Case 2 Suppose \({\hat{z}}=\mathbf{1}\) (resp. \({\hat{z}}=-\mathbf{1}\)). Since \({\hat{z}}\in \text {null}(A)\), let us set \(d_x=\mathbf{0}\) and \(d_z = -{\hat{z}}\) (resp. \(d_z = {\hat{z}}\)). First, we have \(Ad_x=d_z\). Second, for \(e\in {\mathcal {A}}^+({\hat{y}})\) we have \([d_z]_e<0\); similarly, \([d_z]_e>0\) for \(e\in {\mathcal {A}}^-({\hat{y}})\). All conditions (96a)–(96c) are verified. The above argument shows that MFCQ holds true for the sequence \(\{(x^r,z^r)\}\) generated by the PProx-PDA algorithm, since the algorithm guarantees that \(z^r\in Z\) at every iteration.
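The Case 1 construction above can also be checked numerically. The following is a minimal sketch; the path graph, the choice \(\xi =1\), and the particular \({\hat{z}}\) are illustrative assumptions and are not taken from the experiments in the paper.

```python
# Minimal numerical check of the Case 1 construction (toy instance).
import numpy as np

# Oriented incidence matrix A in R^{E x N} of the path graph 1-2-3-4:
# row e for edge (i, j) has +1 in column i and -1 in column j.
edges = [(0, 1), (1, 2), (2, 3)]
N, E = 4, len(edges)
A = np.zeros((E, N))
for e, (i, j) in enumerate(edges):
    A[e, i], A[e, j] = 1.0, -1.0

# Jacobian of the equality constraint p(y) = [A, -I] y has full row rank E.
J = np.hstack([A, -np.eye(E)])
assert np.linalg.matrix_rank(J) == E

# Case 1: z-hat in [-1, 1]^E with z-hat != 1 and z-hat != -1 (xi = 1),
# and one active constraint z-hat_e = +1.
z_hat = np.array([1.0, 0.3, -0.5])
ones = np.ones(E)
d_z = (z_hat @ ones) / E * ones - z_hat   # d_z = (1/E)(z-hat^T 1) 1 - z-hat

# 1^T d_z = 0, and (for this graph) A d_x = d_z is solvable for d_x.
assert abs(ones @ d_z) < 1e-12
d_x, *_ = np.linalg.lstsq(A, d_z, rcond=None)
assert np.linalg.norm(A @ d_x - d_z) < 1e-10

# Sign conditions on the active sets: [d_z]_e < 0 where z-hat_e = +1,
# and [d_z]_e > 0 where z-hat_e = -1.
assert all(d_z[e] < 0 for e in range(E) if z_hat[e] == 1.0)
assert all(d_z[e] > 0 for e in range(E) if z_hat[e] == -1.0)
print("Case 1 construction verified on this toy instance.")
```

On this toy instance the least-squares solve recovers \(d_x\) exactly because the graph is a tree, so A has full row rank and \(\text {col}(A)={\mathbb {R}}^E\).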