Perturbed proximal primal–dual algorithm for nonconvex nonsmooth optimization

  • Full Length Paper
  • Series B
  • Mathematical Programming

Abstract

In this paper, we propose a perturbed proximal primal–dual algorithm (PProx-PDA) for an important class of linearly constrained optimization problems, whose objective is the sum of smooth (possibly nonconvex) and convex (possibly nonsmooth) functions. This family of problems can be used to model many statistical and engineering applications, such as high-dimensional subspace estimation and distributed machine learning. The proposed method is of the Uzawa type, in which a primal gradient descent step is performed followed by an (approximate) dual gradient ascent step. One distinctive feature of the proposed algorithm is that the primal and dual steps are both perturbed appropriately using past iterates so that a number of asymptotic convergence and rate of convergence results (to first-order stationary solutions) can be obtained. Finally, we conduct extensive numerical experiments to validate the effectiveness of the proposed algorithm.
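
As a rough illustration of the iteration pattern described above (a minimal sketch under our own assumptions, not the paper's exact PProx-PDA updates or parameter rules), the following Python snippet runs an Uzawa-type loop for \(\min _x f(x)+h(x)\) subject to \(Ax=b\), with \(h\) taken to be an \(\ell _1\) penalty; the damping factor \(1-\gamma \), the step sizes, and the toy data are illustrative choices.

    import numpy as np

    def soft_threshold(v, tau):
        # Proximal operator of tau * ||.||_1, standing in for the convex nonsmooth term h.
        return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

    def perturbed_primal_dual(grad_f, A, b, n, alpha=1e-3, sigma=1.0, gamma=0.05,
                              mu=0.1, iters=2000):
        # Illustrative Uzawa-type loop: a proximal gradient (primal) step on a
        # perturbed Lagrangian, followed by a perturbed (approximate) dual ascent step.
        # The (1 - gamma) damping and the step sizes alpha, sigma are assumptions.
        x, lam = np.zeros(n), np.zeros(A.shape[0])
        for _ in range(iters):
            g = grad_f(x) + A.T @ ((1.0 - gamma) * lam)       # smooth gradient + perturbed dual term
            x = soft_threshold(x - alpha * g, alpha * mu)     # primal proximal gradient step
            lam = (1.0 - gamma) * lam + sigma * (A @ x - b)   # perturbed dual ascent step
        return x, lam

    # Toy usage: smooth part f(x) = 0.5 * ||C x - d||^2 with a random linear constraint A x = b.
    rng = np.random.default_rng(0)
    C, d = rng.standard_normal((20, 10)), rng.standard_normal(20)
    A, b = rng.standard_normal((3, 10)), np.zeros(3)
    x, lam = perturbed_primal_dual(lambda x: C.T @ (C @ x - d), A, b, n=10)
    print("constraint violation ||Ax - b||:", np.linalg.norm(A @ x - b))

Because both the primal and the dual steps are damped by \(1-\gamma \), the loop settles at an approximately feasible point; the paper's analysis makes precise how such perturbed updates yield convergence (and convergence rates) to first-order stationary solutions.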

Notes

  1. http://people.ece.umn.edu/~mhong/PProx_PDA.pdf.

References

  1. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: Proceedings of the 33rd International Conference on Machine Learning, ICML, pp. 699–707 (2016)

  2. Ames, B., Hong, M.: Alternating directions method of multipliers for l1-penalized zero variance discriminant analysis and principal component analysis. Comput. Optim. Appl. 64(3), 725–754 (2016)

  3. Andreani, R., Haeser, G., Martínez, J.M.: On sequential optimality conditions for smooth constrained optimization. Optimization 60(5), 627–641 (2011)

  4. Antoniadis, A., Gijbels, I., Nikolova, M.: Penalized likelihood regression for generalized linear models with non-quadratic penalties. Ann. Inst. Stat. Math. 63(3), 585–615 (2009)

  5. Arrow, K.J., Hurwicz, L., Uzawa, H.: Studies in Linear and Non-linear Programming. Stanford University Press, Palo Alto (1958)

  6. Asteris, M., Papailiopoulos, D., Dimakis, A.: Nonnegative sparse PCA with provable guarantees. In: Proceedings of the 31st International Conference on Machine Learning (ICML), vol. 32, pp. 1728–1736 (2014)

  7. Aybat, N.S., Hamedani, E.Y.: A primal–dual method for conic constrained distributed optimization problems. Adv. Neural Inf. Process. Syst. (NIPS) 5049–5057 (2016)

  8. Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Method. Academic Press, Cambridge (1982)

  9. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)

  10. Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods, 2nd edn. Athena Scientific, Belmont (1997)

  11. Bianchi, P., Jakubowicz, J.: Convergence of a multi-agent projected stochastic gradient algorithm for non-convex optimization. IEEE Trans. Autom. Control 58(2), 391–405 (2013)

  12. Birgin, E., Martínez, J.: Practical Augmented Lagrangian Methods for Constrained Optimization. Society for Industrial and Applied Mathematics, Philadelphia (2014)

  13. Björnson, E., Jorswieck, E.: Optimal resource allocation in coordinated multi-cell systems. Found. Trends Commun. Inf. Theory 9, 113–381 (2013)

  14. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  15. Burachik, R.S., Kaya, C.Y., Mammadov, M.: An inexact modified subgradient algorithm for nonconvex optimization. Comput. Optim. Appl. 45(1), 1–24 (2008)

  16. Chung, F.R.K.: Spectral Graph Theory. The American Mathematical Society, Providence (1997)

  17. Cressie, N.: Statistics for Spatial Data. Wiley, Hoboken (2015)

  18. Curtis, F.E., Gould, N.I.M., Jiang, H., Robinson, D.P.: Adaptive augmented Lagrangian methods: algorithms and practical numerical experience. Optim. Methods Softw. 31(1), 157–186 (2016)

  19. D’Aspremont, A., Ghaoui, L.E., Jordan, M.I., Lanckriet, G.R.G.: A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49(3), 434–448 (2007)

  20. Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)

  21. Dutta, J., Deb, K., Tulshyan, R., Arora, R.: Approximate KKT points and a proximity measure for termination. J. Glob. Optim. 56(4), 1463–1499 (2013)

  22. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001)

  23. Fernández, D., Solodov, M.V.: Local convergence of exact and inexact augmented Lagrangian methods under the second-order sufficient optimality condition. SIAM J. Optim. 22(2), 384–407 (2012)

  24. Fleiss, J.L., Levin, B., Paik, M.C.: Statistical Methods for Rates and Proportions. Wiley, Hoboken (2003)

  25. Forero, P.A., Cano, A., Giannakis, G.B.: Distributed clustering using wireless sensor networks. IEEE J. Sel. Top. Signal Proces. 5(4), 707–724 (2011)

  26. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2, 17–40 (1976)

  27. Giannakis, G.B., Ling, Q., Mateos, G., Schizas, I.D., Zhu, H.: Decentralized learning for wireless communications and networking. In: Splitting Methods in Communication and Imaging. Springer, New York (2015)

  28. Glowinski, R., Marroco, A.: Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires. Revue Française d’Automatique, Informatique et Recherche Opérationnelle 9, 41–76 (1975)

  29. Gu, Q., Wang, Z., Liu, H.: Sparse PCA with oracle property. In: Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS), pp. 1529–1537 (2014)

  30. Haeser, G., Melo, V.: On sequential optimality conditions for smooth constrained optimization. Preprint (2013)

  31. Hajinezhad, D., Chang, T.H., Wang, X., Shi, Q., Hong, M.: Nonnegative matrix factorization using ADMM: algorithm and convergence analysis. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4742–4746 (2016)

  32. Hajinezhad, D., Hong, M.: Nonconvex alternating direction method of multipliers for distributed sparse principal component analysis. In: IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE (2015)

  33. Hajinezhad, D., Hong, M., Garcia, A.: Zeroth order nonconvex multi-agent optimization over networks. arXiv preprint arXiv:1710.09997 (2017)

  34. Hajinezhad, D., Hong, M., Zhao, T., Wang, Z.: NESTT: A nonconvex primal–dual splitting method for distributed and stochastic optimization. In: Advances in Neural Information Processing Systems (NIPS), pp. 3215–3223 (2016)

  35. Hajinezhad, D., Shi, Q.: Alternating direction method of multipliers for a class of nonconvex bilinear optimization: convergence analysis and applications. J. Glob. Optim. 70, 1–28 (2018)

  36. Hamdi, A., Mishra, S.K.: Decomposition Methods Based on Augmented Lagrangians: A Survey, pp. 175–203. Springer, New York (2011)

  37. Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Theory Appl. 4, 303–320 (1969)

  38. Hong, M., Hajinezhad, D., Zhao, M.M.: Prox-PDA: the proximal primal-dual algorithm for fast distributed nonconvex optimization and learning over networks. In: Proceedings of the 34th International Conference on Machine Learning (ICML), (70), pp. 1529–1538 (2017)

  39. Hong, M., Luo, Z.Q.: On the linear convergence of the alternating direction method of multipliers. Math. Program. 162(1), 165–199 (2017)

  40. Hong, M., Luo, Z.Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26(1), 337–364 (2016)

  41. Houska, B., Frasch, J., Diehl, M.: An augmented Lagrangian based algorithm for distributed nonconvex optimization. SIAM J. Optim. 26(2), 1101–1127 (2016)

  42. Koppel, A., Sadler, B.M., Ribeiro, A.: Proximity without consensus in online multi-agent optimization. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3726–3730 (2016)

  43. Koshal, J., Nedić, A., Shanbhag, Y.V.: Multiuser optimization: distributed algorithms and error analysis. SIAM J. Optim. 21(3), 1046–1081 (2011)

  44. Lan, G., Monteiro, R.D.C.: Iteration-complexity of first-order augmented Lagrangian methods for convex programming. Math. Program. 155(1), 511–547 (2015)

  45. Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25(4), 2434–2460 (2015)

  46. Liao, W., Hong, M., Farmanbar, H., Luo, Z.: Semi-asynchronous routing for large scale hierarchical networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2894–2898 (2015)

  47. Liavas, A.P., Sidiropoulos, N.D.: Parallel algorithms for constrained tensor factorization via alternating direction method of multipliers. IEEE Trans. Signal Process. 63(20), 5450–5463 (2015)

  48. Liu, Y.F., Liu, X., Ma, S.: On the non-ergodic convergence rate of an inexact augmented Lagrangian framework for composite convex programming. arXiv preprint arXiv:1603.05738 (2016)

  49. Lobel, I., Ozdaglar, A.: Distributed subgradient methods for convex optimization over random networks. IEEE Trans. Autom. Control 56(6), 1291–1306 (2011)

  50. Lorenzo, P.D., Scutari, G.: NEXT: in-network nonconvex optimization. IEEE Trans. Signal Inf. Process Over Netw. 2(2), 120–136 (2016)

  51. Lu, Z., Zhang, Y.: Sparse approximation via penalty decomposition methods. SIAM J. Optim. 23(4), 2448–2478 (2013)

  52. Mateos, G., Bazerque, J.A., Giannakis, G.B.: Distributed sparse linear regression. IEEE Trans. Signal Process. 58(10), 5262–5276 (2010)

  53. Gonçalves, M.L.N., Melo, J.G., Monteiro, R.D.C.: Convergence rate bounds for a proximal ADMM with over-relaxation stepsize parameter for solving nonconvex linearly constrained problems. arXiv preprint arXiv:1702.01850 (2017)

  54. Nedić, A., Olshevsky, A.: Distributed optimization over time-varying directed graphs. IEEE Trans. Autom. Control 60(3), 601–615 (2015)

  55. Nedić, A., Ozdaglar, A.: Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control 54(1), 48–61 (2009)

  56. Nedić, A., Ozdaglar, A., Parrilo, P.A.: Constrained consensus and optimization in multi-agent networks. IEEE Trans. Autom. Control 55(4), 922–938 (2010)

  57. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Springer, Berlin (2004)

  58. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, Berlin (1999)

  59. Powell, M.J.D.: An efficient method for nonlinear constraints in minimization problems. In: Optimization, pp. 283–298. Academic Press (1969)

  60. Razaviyayn, M., Hong, M., Luo, Z.Q.: A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM J. Optim. 23(2), 1126–1153 (2013)

  61. Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976)

  62. Ruszczyński, A.: Nonlinear Optimization. Princeton University, Princeton (2011)

  63. Schizas, I., Ribeiro, A., Giannakis, G.: Consensus in ad hoc WSNs with noisy links—part I: distributed estimation of deterministic signals. IEEE Trans. Signal Process. 56(1), 350–364 (2008)

  64. Scutari, G., Facchinei, F., Song, P., Palomar, D.P., Pang, J.S.: Decomposition by partial linearization: parallel optimization of multi-agent systems. IEEE Trans. Signal Process. 63(3), 641–656 (2014)

  65. Shi, W., Ling, Q., Wu, G., Yin, W.: EXTRA: an exact first-order algorithm for decentralized consensus optimization. SIAM J. Optim. 25(2), 944–966 (2014)

  66. Sun, Y., Scutari, G., Palomar, D.: Distributed nonconvex multiagent optimization over time-varying networks. In: 50th Asilomar Conference on Signals, Systems and Computers, pp. 788–794 (2016)

  67. Tsitsiklis, J.: Problems in decentralized decision making and computation. Ph.D. thesis, Massachusetts Institute of Technology (1984)

  68. Vu, V.Q., Cho, J., Lei, J., Rohe, K.: Fantope projection and selection: a near-optimal convex relaxation of sparse PCA. In: Advances in Neural Information Processing Systems (NIPS), pp. 2670–2678 (2013)

  69. Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 78(1), 29–63 (2019)

  70. Wright, S.J.: Implementing proximal point methods for linear programming. J. Optim. Theory Appl. 65(3), 531–554 (1990)

  71. Wen, Z., Yang, C., Liu, X., Marchesini, S.: Alternating direction methods for classical and ptychographic phase retrieval. Inverse Probl. 28(11), 1–18 (2012)

  72. Yildiz, M.E., Scaglione, A.: Coding with side information for rate-constrained consensus. IEEE Trans. Signal Process. 56(8), 3753–3764 (2008)

  73. Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38(2), 894–942 (2010)

  74. Zhang, Y.: Convergence of a class of stationary iterative methods for saddle point problems. Preprint (2010)

  75. Zhu, H., Cano, A., Giannakis, G.: Distributed consensus-based demodulation: algorithms and error analysis. IEEE Trans. Wirel. Commun. 9(6), 2044–2054 (2010)

Acknowledgements

The authors would like to thank Dr. Quanquan Gu, who provided us with the codes of [29]. The authors would also like to thank Dr. Gesualdo Scutari for helpful discussions about the numerical results.

Author information

Corresponding author

Correspondence to Mingyi Hong.

Additional information

This work was completed when Davood Hajinezhad was a Ph.D. student at Iowa State University. Mingyi Hong is supported by NSF Grant CMMI-1727757 and AFOSR Grant 15RT0767.

Appendices

Appendix A

In this section, we justify Assumption [B4], which imposes the boundedness of the sequence of dual variables. Throughout this section we assume that Assumptions A and [B1]–[B3] hold. First, we prove that when \(\Vert \lambda ^{r+1}\Vert \rightarrow \infty \), we have \(\lim \inf _{r\rightarrow \infty } \frac{\beta ^{r+1}\Vert x^{r+1}-x^r\Vert }{\Vert \lambda ^{r+1}\Vert }= 0\). Using Assumption [B3] we have the following identity

$$\begin{aligned} \frac{\beta ^{r+1}\rho ^{r+1}}{2}\Vert x^{r+1}-x^r\Vert ^2 = \frac{(\beta ^{r+1})^2}{2 c_0}\Vert x^{r+1}-x^r\Vert ^2. \end{aligned}$$
(82)

Assume the contrary, i.e., that there exists \(c_1>0\) such that, for all sufficiently large r,

$$\begin{aligned} \beta ^{r+1}\Vert x^{r+1}-x^r\Vert ^2\ge \frac{c_1}{\beta ^{r+1}}\Vert \lambda ^{r+1}\Vert ^2. \end{aligned}$$
(83)
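
For completeness, here is how (83) arises from negating the claimed lim inf property (this intermediate step is our own elaboration, not the paper's wording): if the lim inf were strictly positive, then there would exist \(c>0\) such that

$$\begin{aligned} \beta ^{r+1}\Vert x^{r+1}-x^r\Vert \ge c\, \Vert \lambda ^{r+1}\Vert \quad \text {for all sufficiently large } r, \end{aligned}$$

and squaring both sides and then dividing by \(\beta ^{r+1}\) yields (83) with \(c_1 = c^2\).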

Then, from (74), it is easy to show that when r is large enough, the potential function P is decreasing.

As in Lemma 3, it is relatively easy to show that the potential function is bounded from below and from above (the proof is included in Lemmas 10–11 in the online version). The lower boundedness of the potential function, together with the fact that it is eventually decreasing, implies that (75) holds true, which, in view of (83), further implies that \(\frac{1}{\beta ^{r+1}}\Vert \lambda ^{r+1}\Vert ^2\rightarrow 0\). Examining the definition of the potential function in (73) and using the choice of c in (70), we conclude that the term \(\frac{\beta ^{r+1}\rho ^{r+1}}{2}\Vert x^{r+1}-x^r\Vert ^2\) in the potential function is bounded. Therefore, there exists \(D_1>0\) such that

$$\begin{aligned} \beta ^{r+1}\Vert x^{r+1}-x^r\Vert \le D_1. \end{aligned}$$
(84)

Combining (83) and (84), it follows that \(c_1\Vert \lambda ^{r+1}\Vert ^2\le \left( \beta ^{r+1}\Vert x^{r+1}-x^r\Vert \right) ^2\le D_1^2\), so \(\Vert \lambda ^{r+1}\Vert \) is also upper bounded. This contradicts our assumption that \(\Vert \lambda ^{r}\Vert \rightarrow \infty \).

Next, we make use of a constraint qualification to argue the boundedness of the dual variables. The technique used in the proof is relatively standard; see the recent works [21, 51]. Assume that the so-called Robinson's condition is satisfied for problem (1) at \({\hat{x}}\) [62, Chap. 3]. This means that \(\{A d_x\mid d_x\in {\mathcal {T}}_{X}(\hat{x})\}={\mathbb {R}}^M,\) where \(d_x\) denotes a tangent direction of the convex set X, and \({\mathcal {T}}_{X}(\hat{x})\) is the tangent cone of the feasible set X at the point \(\hat{x}\). Utilizing this assumption, we will prove that the dual variable is bounded.
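
As a simple sanity check of what this condition requires (our own illustration, not part of the paper's argument): when \(X={\mathbb {R}}^N\), the tangent cone \({\mathcal {T}}_{X}(\hat{x})\) is all of \({\mathbb {R}}^N\), so the condition \(\{A d_x\mid d_x\in {\mathcal {T}}_{X}(\hat{x})\}={\mathbb {R}}^M\) simply asks that A be surjective, i.e., \(\mathrm {rank}(A)=M\). The matrix below is an arbitrary example.

    import numpy as np

    # Robinson's condition in the special case X = R^N: the tangent cone is all of
    # R^N, so {A d_x : d_x in T_X(x_hat)} = range(A), and the condition holds iff
    # rank(A) = M (i.e., A is surjective).
    M, N = 3, 5
    rng = np.random.default_rng(0)
    A = rng.standard_normal((M, N))   # arbitrary example matrix (an assumption)
    print("Robinson's condition holds (X = R^N):", np.linalg.matrix_rank(A) == M)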

Lemma 6

Suppose that Robinson's condition holds for problem (1). Then the sequence of dual variables \(\{\lambda ^r\}\) generated by (67b) is bounded.

Proof

We argue by contradiction. Suppose that the dual variable sequence is not bounded, i.e.,

$$\begin{aligned} \Vert \lambda ^r\Vert \rightarrow \infty . \end{aligned}$$
(85)

From the optimality condition of \(x^{r+1}\) we have for all \(x\in X\)

$$\begin{aligned} \langle \nabla f(x^{r})+\xi ^{r+1}+A^T \lambda ^{r+1} + \beta ^{r+1}B^TB (x^{r+1}-x^r), x-x^{r+1} \rangle \ge 0. \end{aligned}$$

Note that \(\lim \inf _{r\rightarrow \infty } \frac{\beta ^{r+1}\Vert x^{r+1}-x^r\Vert }{\Vert \lambda ^{r+1}\Vert } =0\), so the following holds:

$$\begin{aligned} \lim \inf _{r\rightarrow \infty } \frac{\beta ^{r+1}\Vert B^T B(x^{r+1}-x^r)\Vert }{\Vert \lambda ^{r+1}\Vert } =0. \end{aligned}$$

Let us define a new bounded sequence as \(\mu ^r = \lambda ^r/\Vert \lambda ^r\Vert \), \(r=1,2, \ldots \). Let \((x^*, \mu ^*)\) be an accumulation point of \(\{x^{r+1}, \mu ^{r+1}\}\). Assume that Robinson's condition holds at \(x^*\). Dividing both sides of the above inequality by \(\Vert \lambda ^{r+1}\Vert \), we obtain for all \(x\in X\)

$$\begin{aligned}&\langle \nabla f(x^{r})/\Vert \lambda ^{r+1}\Vert +\xi ^{r+1}/\Vert \lambda ^{r+1}\Vert +A^T \mu ^{r+1}\nonumber \\&\quad + \beta ^{r+1} B^T B (x^{r+1}-x^r)/\Vert \lambda ^{r+1}\Vert , x-x^{r+1} \rangle \ge 0. \end{aligned}$$

Taking the limit, passing to a subsequence if necessary, and utilizing the assumption that \(\Vert \lambda ^{r+1}\Vert \rightarrow \infty \) together with the fact that X is a compact set, we obtain

$$\begin{aligned} \langle A^T \mu ^*, x-x^{*} \rangle \ge 0,\; \forall ~x\in X. \end{aligned}$$

Utilizing Robinson's condition, we know that there exist \(x\in X\) and a scaling constant \(c>0\) such that \(cA(x-x^*) = - \mu ^*\), which, combined with the above relation, yields \(-\frac{1}{c}\Vert \mu ^*\Vert ^2\ge 0\). Therefore we must have \(\mu ^* = 0\). However, this contradicts the fact that \(\Vert \mu ^*\Vert =1\). Therefore, we conclude that \(\{\lambda ^r\}\) is a bounded sequence.\(\square \)

Appendix B

We show how the sufficient conditions developed in “Appendix A” can be applied to the problems discussed in Sect. 1.2. We will focus on the partial consensus problem (10).

To proceed, we note that Robinson's condition reduces to the well-known Mangasarian–Fromovitz constraint qualification (MFCQ) if we set \(X={\mathbb {R}}^{N}\) and write out the inequality constraints explicitly as \(g(x)\le 0\) [62, Lemma 3.16]. To state the MFCQ, consider the following system

$$\begin{aligned} \quad&p_i(y)=0,\; i=1,\ldots , M\nonumber \\&g_j(y)\le 0,\; j=1,\ldots , P \end{aligned}$$
(86)

where \(p_i:{\mathbb {R}}^N\rightarrow {\mathbb {R}}\) and \(g_j:{\mathbb {R}}^N\rightarrow {\mathbb {R}}\) are all continuously differentiable functions. For a given feasible solution \({\hat{y}}\) let us use \({\mathcal {A}}({\hat{y}})\) to denote the indices for active inequality constraints, that is

$$\begin{aligned} {\mathcal {A}}({\hat{y}}):=\{1\le j\le P ~\mid ~ g_j({\hat{y}})=0\}. \end{aligned}$$
(87)

Let us define

$$\begin{aligned} p(y):=[p_1(y);p_2(y);\cdots ; p_M(y)],\quad g(y):=[g_1(y);g_2(y);\cdots ; g_P(y)]. \end{aligned}$$

Then the MFCQ holds for system (86) at a point \({\hat{y}}\) if: (1) the rows of the Jacobian matrix of p(y), denoted by \(\nabla p({\hat{y}})\), are linearly independent; and (2) there exists a vector \(d_y\in {\mathbb {R}}^N\) such that

$$\begin{aligned} \nabla p({\hat{y}})d_y = 0, \quad \nabla g_j({\hat{y}})^Td_y<0, \; \forall ~ j\in {\mathcal {A}}({\hat{y}}). \end{aligned}$$
(88)

See [62, Lemma 3.17] for more details. In the following, we show that the MFCQ holds true for problem (10) at any point \((x, z)\) that satisfies \(z\in Z\). Comparing the constraint set of this problem with system (86), we have the following specifications. The optimization variable is \(y=[x;z]\), where \(x\in {\mathbb {R}}^N\) stacks all \(x_i\in {\mathbb {R}}\) from the N nodes (here we assume \(x_i\in {\mathbb {R}}\) only for ease of presentation). Also, \(z\in {\mathbb {R}}^E\) stacks all \(z_{e}\in {\mathbb {R}}\) for \(e\in {\mathcal {E}}\). The equality constraint is written as \(p(y)=[A,-I]y =0\), where \(A\in {\mathbb {R}}^{E\times N}\) and I is the \(E\times E\) identity matrix. Finally, for the inequality constraints we have \(g_{e}(y)= |z_{e}| - \xi \), and the active set is given by \({\mathcal {A}}(y):={\mathcal {A}}^+(y)\cup {\mathcal {A}}^-(y)\), where

$$\begin{aligned} {\mathcal {A}}^+(y)=\{e\in {\mathcal {E}}\mid z_{e} = \xi \}, \quad {\mathcal {A}}^-(y)=\{e\in {\mathcal {E}}\mid z_{e} = -\xi \}. \end{aligned}$$

Without loss of generality we assume \(\xi =1\). To show that the MFCQ holds, consider a solution \({\hat{y}}:=({\hat{x}},{\hat{z}})\). First observe that the Jacobian of the equality constraint is \(\nabla p({\hat{y}})= [A,-I]\), which has full row rank. In order to verify the second condition, we need to find a vector \(d_y:=[d_x;d_z]\in {\mathbb {R}}^{N+E}\) such that

$$\begin{aligned}&Ad_x=d_z,\end{aligned}$$
(89a)
$$\begin{aligned}&[d_z]_{e}<0\quad \text {for} ~e\in {\mathcal {A}}^+({\hat{y}}) \end{aligned}$$
(89b)
$$\begin{aligned}&[d_z]_{e}>0\quad \text {for} ~e\in {\mathcal {A}}^-({\hat{y}}) \end{aligned}$$
(89c)

where \([d_z]_e\) denotes the eth component of the vector \(d_z\). Let us denote the all-one vector and the all-zero vector by \(\mathbf{1}\) and \(\mathbf{0}\), respectively. To proceed, let us consider two different cases:

Case 1 For the vector \({\hat{z}}\in {\mathbb {R}}^E\) we have \({\hat{z}}\ne \mathbf{1}\) and \({\hat{z}}\ne -\mathbf{1}\). Let us take

$$\begin{aligned} d_z=\frac{1}{E}({\hat{z}}^T\mathbf{1})\mathbf{1} - {\hat{z}}. \end{aligned}$$

First, we can show that \(d_z\in \text {col}(A)\). Note that for our problem, when the graph is connected, the null space of A (which is the incidence matrix of the graph) is spanned by the vector \({\mathbf {1}}\) [16]. Using this fact, we have \(\mathbf{1}^Td_z = {\hat{z}}^T\mathbf{1} - \mathbf{1}^T{\hat{z}}=0\); therefore, \(Ad_x=d_z\) holds true for some \(d_x\). Second, for \(e\in {\mathcal {A}}^+({\hat{y}})\) we have \({\hat{z}}_e=1\). Therefore, we can check that \([d_z]_e=\left[ \frac{1}{E}({\hat{z}}^T\mathbf{1})\mathbf{1} - {\hat{z}}\right] _e<0\), because \(\frac{1}{E}{\hat{z}}^T\mathbf{1}<1\) from the fact that \({\hat{z}}\in Z\) and \({\hat{z}}\ne \mathbf{1}\). Condition (89b) is verified. Using a similar argument, we can verify condition (89c).

Case 2 Suppose we have \({\hat{z}}=\mathbf{1}\) (resp. \({\hat{z}}=-\mathbf{1}\)). Since \(A{\hat{x}}={\hat{z}}\), let us set \(d_x=-{\hat{x}}\) and \(d_z = -{\hat{z}}\) (resp. \(d_x={\hat{x}}\) and \(d_z = {\hat{z}}\)). First, we have \(Ad_x=d_z\). Second, for \(e\in {\mathcal {A}}^+({\hat{y}})\) we have \([d_z]_e<0\). Similarly, we have \([d_z]_e>0\) for \(e\in {\mathcal {A}}^-({\hat{y}})\). All conditions (89a)–(89c) are verified. The above argument shows that the MFCQ holds true along the sequence \(\{(x^r,z^r)\}\) generated by the PProx-PDA algorithm, since the algorithm always guarantees that \(z^r\in Z\).
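
To make the verification above concrete, the following small numerical check (our own illustration, not part of the paper) runs through the construction on a path graph with N = 4 nodes and E = 3 edges, using \(\xi =1\); the particular feasible point below is an arbitrary choice that falls under Case 1.

    import numpy as np

    # Path graph on N = 4 nodes and E = 3 edges; A is the signed edge-node incidence
    # matrix, so the equality constraint reads A x - z = 0, i.e., p(y) = [A, -I] y = 0.
    N, E, xi = 4, 3, 1.0
    A = np.array([[1., -1., 0., 0.],
                  [0., 1., -1., 0.],
                  [0., 0., 1., -1.]])

    # An arbitrary feasible point (x_hat, z_hat) with |z_e| <= xi (Case 1: z_hat != +-1).
    x_hat = np.array([0., -1., -1.3, -0.3])
    z_hat = A @ x_hat                                  # approximately [1.0, 0.3, -1.0]
    assert np.all(np.abs(z_hat) <= xi + 1e-12)

    # Condition 1: the Jacobian of the equality constraint, [A, -I], has full row rank.
    Jp = np.hstack([A, -np.eye(E)])
    print("rank([A, -I]) == E:", np.linalg.matrix_rank(Jp) == E)

    # Condition 2, Case 1 construction: d_z = (1/E)(z_hat^T 1) 1 - z_hat, then find d_x
    # with A d_x = d_z (least squares; the residual should be numerically zero).
    d_z = z_hat.mean() * np.ones(E) - z_hat
    d_x = np.linalg.lstsq(A, d_z, rcond=None)[0]
    print("A d_x = d_z:", np.linalg.norm(A @ d_x - d_z) < 1e-10)

    # Sign conditions (89b)-(89c) on the active inequality constraints |z_e| <= xi.
    act_plus, act_minus = np.isclose(z_hat, xi), np.isclose(z_hat, -xi)
    print("d_z < 0 on A^+:", bool(np.all(d_z[act_plus] < 0)))
    print("d_z > 0 on A^-:", bool(np.all(d_z[act_minus] > 0)))

For this tree graph, rank(A) = E, so \(Ad_x=d_z\) is solvable for any right-hand side; for general connected graphs, Case 1 above argues solvability via the orthogonality \(\mathbf{1}^Td_z=0\).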

Cite this article

Hajinezhad, D., Hong, M. Perturbed proximal primal–dual algorithm for nonconvex nonsmooth optimization. Math. Program. 176, 207–245 (2019). https://doi.org/10.1007/s10107-019-01365-4
