Abstract
For the zero-norm regularized problem, we verify that the penalty problem of its equivalent MPEC reformulation is a global exact penalty, which yields a family of equivalent surrogates. For a subfamily of these surrogates, the critical point set is shown to coincide with the set of d-stationary (directional stationary) points, and any critical point whose nonzero components are not too small is a strongly local optimal solution of both the surrogate problem and the zero-norm regularized problem. We also develop a proximal majorization-minimization (MM) method for solving the DC (difference of convex functions) surrogates and provide its global and linear convergence analysis. For the limit of the generated sequence, a statistical error bound is established under a mild condition, which guarantees its good quality from a statistical perspective. Numerical comparisons with ADMM applied to the DC surrogate and APG applied to its partially smoothed form indicate that our proximal MM method, armed with an inexact dual PPA plus the semismooth Newton method (PMMSN for short), is remarkably superior to ADMM and APG in terms of solution quality and CPU time.
Data availability
The data used to form the test problems in Subsection 5.4 are freely available at https://www.csie.ntu.edu.tw.
References
Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116, 5–16 (2009)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)
Belloni, A., Chernozhukov, V.: Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98, 791–806 (2011)
Bi, S.J., Liu, X.L., Pan, S.H.: Exact penalty decomposition method for zero-norm minimization based on MPEC formulation. SIAM J. Sci. Comput. 36, A1451–A1477 (2014)
Bian, W., Chen, X.J.: A smoothing proximal gradient algorithm for nonsmooth convex regression with cardinality penalty. SIAM J. Numer. Anal. 58, 858–883 (2020)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)
Bot, R.I., Nguyen, D.K.: The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates. Math. Oper. Res. 45, 682–712 (2020)
Bradley, P.S., Mangasarian, O.L.: Feature selection via concave minimization and support vector machines. In: Proceeding of ICML (1998)
Bruckstein, A.M., Donoho, D.L., Elad, M.: From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Rev. 51, 34–81 (2009)
Cao, S.S., Huo, X.M., Pang, J.S.: A unifying framework of high-dimensional sparse estimation with difference-of-convex (DC) regularizations, arXiv:1812.07130 (2018)
Chartrand, R.: Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Process. Lett. 14, 707–710 (2007)
Chen, X.J., Xu, F.M., Ye, Y.Y.: Lower bound theory of nonzero entries in solutions of \(\ell _2\)-\(\ell _p\) minimization. SIAM J. Sci. Comput. 32, 2832–2852 (2010)
Clarke, F.H.: Optimization and Nonsmooth Analysis. John Wiley and Sons, New York (1983)
Cui, Y., Chang, T.H., Hong, M., Pang, J.S.: A study of piecewise linear-quadratic programs. J. Optim. Theory Appl. 186, 523–553 (2020)
Cui, Y., Pang, J.S.: Modern Nonconvex Nondifferentiable Optimization. Society for Industrial and Applied Mathematics, Philadelphia (2022)
Cui, Y., Sun, D.F., Toh, K.C.: On the R-superlinear convergence of the KKT residuals generated by the augmented Lagrangian method for convex composite conic programming. Math. Program. 178, 381–415 (2019)
Donoho, D.L., Stark, P.B.: Uncertainty principles and signal recovery. SIAM J. Appl. Math. 49, 906–931 (1989)
Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52, 1289–1306 (2006)
Dontchev, A.L., Rockafellar, R.T.: Implicit Functions and Solution Mappings: A View from Variational Analysis. Springer Monographs in Mathematics. Springer, New York (2009)
Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer, New York (2003)
Fan, J.Q., Li, R.Z.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
Fan, J.Q., Xue, L.Z., Zou, H.: Strong oracle optimality of folded concave penalized estimation. Ann. Stat. 42, 819–849 (2014)
Feng, M.B., Mitchell, J.E., Pang, J.S., Shen, X., Wächter, A.: Complementarity formulations of \(\ell _0\)-norm optimization problems. Pac. J. Optim. 14, 273–305 (2018)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2, 17–40 (1976)
Gotoh, J.Y., Takeda, A., Tono, K.: DC formulations and algorithms for sparse optimization problems. Math. Program. 169, 141–176 (2018)
Gu, Y.W., Fan, J., Kong, L.C., Ma, S.Q., Zou, H.: ADMM for high-dimensional sparse penalized quantile regression. Technometrics 60, 319–331 (2018)
Hiriart-Urruty, J.B., Strodiot, J.J., Nguyen, V.H.: Generalized Hessian matrix and second-order optimality conditions for problems with \(C^{1,1}\) data. Appl. Math. Optim. 11, 43–56 (1984)
Huang, J., Jiao, Y., Liu, Y., Lu, X.: A constructive approach to \(L_0\) penalized regression. J. Mach. Learn. Res. 19, 1–37 (2018)
Ioffe, A.D., Outrata, J.V.: On metric and calmness qualification conditions in subdifferential calculus. Set-Valued Anal. 16, 199–227 (2008)
Le, H.Y.: Generalized subdifferentials of the rank function. Optim. Lett. 7, 731–743 (2013)
Le Thi, H.A., Le, H.M., Pham Dinh, T.: Feature selection in machine learning: an exact penalty approach using a difference of convex function algorithm. Mach. Learn. 101, 163–186 (2015)
Le Thi, H.A., Pham Dinh, T.: DC programming and DCA: thirty years of developments. Math. Program. 169, 5–68 (2018)
Li, G.Y., Pong, T.K.: Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. 18, 1199–1232 (2018)
Liu, Y.L., Bi, S.J., Pan, S.H.: Equivalent Lipschitz surrogates for zero-norm and rank optimization problems. J. Global Optim. 72, 679–704 (2018)
Lu, Z.: Iterative hard thresholding methods for \(\ell _0\) regularized convex cone programming. Math. Program. 147, 125–154 (2014)
Loh, P.L., Wainwright, M.J.: Regularized M-estimators with nonconvexity: statistical and algorithmic theory for local optima. J. Mach. Learn. Res. 16, 559–616 (2015)
Mangasarian, O.L.: Machine learning via polyhedral concave minimization. In: Fischer, H., Riedmueller, B., Schaeffler, S. (eds.) Applied Mathematics and Parallel Computing-Festschrift for Klaus Ritter, pp. 175–188. Physica-Verlag, Heidelberg (1996)
Mifflin, R.: Semismooth and semiconvex functions in constrained optimization. SIAM J. Control. Optim. 15, 959–972 (1977)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Soviet Math. Dokl. 27, 372–376 (1983)
Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York (1970)
Pan, S.H., Liu, Y.L.: Subregularity of subdifferential mappings relative to the critical set and KL property of exponent 1/2, arXiv:1812.00558v3
Pang, J.S., Razaviyayn, M., Alvarado, A.: Computing B-stationary points of nonsmooth DC programs. Math. Oper. Res. 42, 95–118 (2017)
Pham Dinh, T., Le Thi, H.A.: Convex analysis approach to DC programming: theory, algorithms and applications. Acta Math. Vietnamica 22, 289–355 (1997)
Qi, L.Q., Sun, J.: A nonsmooth version of Newton’s method. Math. Program. 58, 353–367 (1993)
Qian, Y.T., Pan, S.H., Liu, Y.L.: Calmness of partial perturbation to composite rank constraint systems and its applications, arXiv:2102.10373v2 (2021)
Raskutti, G., Wainwright, M.J.: Restricted eigenvalue properties for correlated Gaussian designs. J. Mach. Learn. Res. 11, 2241–2259 (2010)
Rinaldi, F., Schoen, F., Sciandrone, M.: Concave programming for minimizing the zero-norm over polyhedral sets. Comput. Optim. Appl. 46, 467–486 (2010)
Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1, 97–116 (1976)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Cham (1998)
Robinson, S.M.: Some continuity properties of polyhedral multifunctions. Math. Program. Study 14, 206–214 (1981)
Soubies, E., Blanc-Féraud, L., Aubert, G.: A unified view of exact continuous penalties for \(\ell _2\)-\(\ell _0\) minimization. SIAM J. Optim. 27, 2034–2060 (2017)
Tang, P.P., Wang, C.J., Sun, D.F., Toh, K.C.: A sparse semismooth Newton based proximal majorization-minimization algorithm for nonconvex square-root-loss regression problems. J. Mach. Learn. Res. 21, 1–38 (2020)
Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. Roy. Stat. Soc. B 58, 267–288 (1996)
Wang, L., Wu, Y.C., Li, R.Z.: Quantile regression for analyzing heterogeneity in ultra high dimension. J. Am. Stat. Assoc. 107, 214–222 (2012)
Wang, Y., Yin, W.T., Zeng, J.S.: Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 78, 29–63 (2019)
Weston, J., Elisseeff, A., Schölkopf, B., Tipping, M.: Use of the zero norm with linear models and kernel methods. J. Mach. Learn. Res. 3, 1439–1461 (2003)
Wen, B., Chen, X.J., Pong, T.K.: Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J. Optim. 27, 124–145 (2017)
Wright, J., Ma, Y.: Dense error correction via \(\ell _1\)-minimization. IEEE Trans. Inf. Theory 56, 3540–3560 (2010)
Wu, F., Bian, W.: Accelerated iterative hard thresholding algorithm for \(\ell _0\) regularized regression problem. J. Global Optim. 76, 819–840 (2020)
Wu, F., Bian, W., Xue, X.P.: Smoothing fast iterative hard thresholding algorithm for \(\ell _0\) regularized nonsmooth convex regression problem, arXiv:2104.13107v1
Ye, J.J., Zhu, D.L.: Optimality conditions for bilevel programming problems. Optimization 33, 9–27 (1995)
Ye, J.J., Zhu, D.L., Zhu, Q.J.: Exact penalization and necessary optimality conditions for generalized bilevel programming problems. SIAM J. Optim. 7, 481–507 (1997)
Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
Zhang, X., Zhang, X.Q.: A new proximal iterative hard thresholding method with extrapolation for \(\ell _0\) minimization. J. Sci. Comput. 79, 809–826 (2019)
Zhao, X.Y., Sun, D.F., Toh, K.C.: A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM J. Optim. 20, 1737–1765 (2010)
Acknowledgements
The first two authors would like to express their sincere thanks to Prof. Kim-Chuan Toh from the National University of Singapore for helpful suggestions on the implementation of Algorithm A.1 during a visit to SCUT, and to Prof. Liping Zhu from Renmin University of China for helpful discussions on Theorem 5.
Funding
The funding was provided by the National Natural Science Foundation of China under project No. 11971177 and the Hong Kong Research Grants Council under grant No. 15304019.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest related to this work.
Appendices
Appendix A: Proof of Proposition 3
The following two technical lemmas are needed for the proof of Proposition 3.
Lemma 6
Fix any \(\nu >0\) and \(\mu >0\). Let \(h_{\nu }(x){:}{=}\nu \Vert x\Vert _0\) for \(x\in {\mathbb {R}}^p\). If \(\vartheta \) is regular and strictly continuous relative to \(\textrm{dom}\vartheta \), then for any \(x\in \textrm{dom}f\) and \(\zeta \in {\mathbb {R}}^p\),
Proof
Fix any \(x\in \textrm{dom}f\) and \(\zeta \in {\mathbb {R}}^p\). Since \(\vartheta \) is strictly continuous relative to \(\textrm{dom}\vartheta \), by the expression of \(f_{\!\mu }\) in (4), the function \(f_{\!\mu }\) can be rewritten as \({\widetilde{f}}_{\!\mu }+\delta _{\textrm{dom}f}\) where \({\widetilde{f}}_{\!\mu }\) is a finite strictly continuous function on \({\mathbb {R}}^p\). Clearly, \({\widetilde{f}}_{\!\mu }\) is regular by the regularity of \(\vartheta \), and \(\delta _{\textrm{dom}f}\) is also regular by the polyhedrality of \(\textrm{dom}\vartheta \). By invoking [50, Exercise 10.10] and the first inclusion of [50, Corollary 10.9], it is not hard to obtain that
Since \(\textrm{epi}\,h_{\nu }\) is a union of finitely many polyhedral sets and \(\textrm{dom}f\) is polyhedral, from [29, Page 213] it follows that \(\partial (\delta _{\textrm{dom}f}+h_{\nu })(x)\subseteq {\mathcal {N}}_{\textrm{dom}f}(x)+\partial h_{\nu }(x)\) and \(\partial ^{\infty }(\delta _{\textrm{dom}f}+h_{\nu })(x)\subseteq {\mathcal {N}}_{\textrm{dom}f}(x)+\partial ^{\infty }h_{\nu }(x)\). The first inclusion, along with the first inclusion of [50, Corollary 10.9] and the regularity of \(\delta _{\textrm{dom}f}\) and \(h_{\nu }\), implies that
The regularity of \(h_{\nu }\) is implied by [30, Theorem 1]. The last two equations imply the first equality in (A1). By the strict continuity of \({\widetilde{f}}_{\!\mu }\) and [50, Corollary 10.9],
where the second inequality in (A3) is due to \(\partial ^{\infty }(\delta _{\textrm{dom}f}\!+\!h_{\nu })({\overline{x}})\subseteq {\mathcal {N}}_{\textrm{dom}f}({\overline{x}})+\partial ^{\infty }h_{\nu }({\overline{x}})\) and [50, Exercise 8.23], and the first equality in (A3) is due to the regularity of \({\widetilde{f}}_{\!\mu },h_{\nu }\) and \(\textrm{dom}f\). Note that \({\widehat{d}}\Theta _{\nu ,\mu }({\overline{x}})(\zeta )\ge d\Theta _{\nu ,\mu }({\overline{x}})(\zeta )\). From the last two inequalities, we obtain the second equality in (A1). The proof is completed. \(\square \)
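For orientation, we recall the explicit formula from [30, Theorem 1] for the regular subdifferential of the zero-norm entering (A1) (quoted again in the proof of Proposition 3(iv) below), together with a two-dimensional example:
\[
\widehat{\partial }\Vert x\Vert _0=\{v\in {\mathbb {R}}^p\ \vert \ v_i=0\ \text{for } i\in \textrm{supp}(x)\},\qquad \text{e.g. }\ \widehat{\partial }\Vert (1,0)\Vert _0=\{0\}\times {\mathbb {R}}.
\]
Since this set is invariant under positive scaling, \(\widehat{\partial }h_{\nu }(x)=\nu \,\widehat{\partial }\Vert x\Vert _0=\widehat{\partial }\Vert x\Vert _0\).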
Lemma 7
Pick any \(\phi \in \!{\mathscr {L}}_{\sigma ,\gamma }\). The associated function \(g_{\rho }\) for any \(\rho >0\) is continuously differentiable on \({\mathbb {R}}^p\).
Proof
Recall that \(\psi ^*\) is a finite nondecreasing convex function on \({\mathbb {R}}\). Since \(\phi \) is strongly convex on \([0,1]\) with modulus \(\sigma \), by [49, Theorem 26.3] and [50, Proposition 12.60], \(\psi ^*\) is smooth on \({\mathbb {R}}\) and \((\psi ^*)'\) is Lipschitz continuous with constant \(1/\sigma \). Thus, by the expression of \(g_{\rho }\), it suffices to argue that \(h(t){:}{=}\rho ^{-1}\psi ^*(\rho \vert t\vert )\) for \(t\in {\mathbb {R}}\) is continuously differentiable at \(t=0\). Indeed, by the assumption on \(\phi \), it is easy to verify that \(\psi ^*(s)=0\) for all \(s\in [0,\gamma ]\). Then, \(h(t)=0\) for all \(\vert t\vert \le \gamma /\rho \). Together with \(h(0)=0\), \(h\) is differentiable at \(t=0\) with \(h'(0)=0\). \(\square \)
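To make the last step fully explicit, here is the short computation behind \(h'(0)=0\) and the continuity of \(h'\) at the origin; it uses only the fact \(\psi ^*(s)=0\) for \(s\in [0,\gamma ]\) established above:
\[
h(t)=\rho ^{-1}\psi ^*(\rho \vert t\vert )=0\ \ \text{for }\vert t\vert \le \gamma /\rho \quad \Longrightarrow \quad h'(0)=\lim _{t\rightarrow 0}\frac{h(t)-h(0)}{t}=0,
\]
while for \(t\ne 0\) the chain rule gives \(h'(t)=(\psi ^*)'(\rho \vert t\vert )\,\textrm{sign}(t)\), which vanishes for \(0<\vert t\vert <\gamma /\rho \) and hence tends to \(h'(0)=0\) as \(t\rightarrow 0\).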
Proof of Proposition 3
(i) Since the range of \(\partial \psi ^*\) is contained in \(\textrm{dom}\partial \psi =[0,1]\), for any \(x\in {\mathbb {R}}^p\) it holds that \(\Vert x\Vert _1-g_{\rho }(x)\ge 0\). Together with the nonnegativity and coerciveness of \(f_{\!\mu }\), it follows that \(\Theta _{\!\rho ,\nu ,\mu }\) is nonnegative and coercive. Fix any \(x\in \textrm{dom}f\). From Lemma 7 and [50, Exercise 8.8], \({\widehat{\partial }}\Theta _{\!\rho ,\nu ,\mu }(x)=\partial \Theta _{\!\rho ,\nu ,\mu }(x) =\partial (f_{\!\mu }\!+\!\rho \nu \Vert \cdot \Vert _1)(x)\!-\!\rho \nu \nabla g_{\rho }(x)\). Recall that \([\textrm{Im}(A)-b]\cap \textrm{dom}\vartheta \ne \emptyset \). By the convexity of \(\vartheta \) and [49, Theorem 23.9],
The characterization on the regular and limiting subdifferentials of \(\Theta _{\!\rho ,\nu ,\mu }\) then holds.
(ii) By the definition of d-stationary point for DC program (see [42, Section 3]), a point \(x\in \textrm{dom}f\) is a d-stationary point of (14) iff \(\rho \nu \nabla \!g_{\rho }(x)\in \partial (f_{\!\mu }\!+\!\rho \nu \Vert \cdot \Vert _1)(x)\), which by part (i) is equivalent to saying that \(x\in \textrm{dom}f\) is a limiting critical point of \(\Theta _{\!\rho ,\nu ,\mu }\).
(iii) By [50, Theorem 13.24 (c)], it suffices to argue that \(d^2\Theta _{\!\rho ,\nu ,\mu }({\overline{x}}\vert 0)(\zeta )>0\) for all \(\zeta \ne 0\). Fix any \(\zeta \in {\mathbb {R}}^p\backslash \{0\}\). Let \(\varphi _{\!\rho ,\lambda }(x)\!{:}{=}\lambda [\Vert x\Vert _1\!-\!g_{\rho }(x)]\) with \(\lambda =\rho \nu \) for \(x\in {\mathbb {R}}^p\). Clearly, \(\varphi _{\!\rho ,\lambda }\) is Lipschitz continuous and regular by the smoothness of \(g_{\rho }\). Note that \(\Theta _{\!\rho ,\nu ,\mu }\!=\!f_{\!\mu }+\varphi _{\!\rho ,\lambda }\). By invoking [50, Proposition 13.19], it follows that
Recall that \(f_{\!\mu }\) is strongly convex with modulus \(\mu \). By Definition 3, we have that
Fix any \(v\in \partial \varphi _{\!\rho ,\lambda }({\overline{x}})\). Since \(\varphi _{\!\rho ,\lambda }\) is Lipschitz and directionally differentiable,
By [50, Proposition 13.5], \(d^2\varphi _{\!\rho ,\lambda }({\overline{x}}\vert v)(\zeta )=+\infty \) when \(d\varphi _{\!\rho ,\lambda }({\overline{x}})(\zeta )>\langle v,\zeta \rangle \). This, together with (A4)-(A5), implies that \(d^2\Theta _{\!\rho ,\nu ,\mu }({\overline{x}}\vert 0)(\zeta )>0\), so it suffices to consider the case \(\varphi _{\!\rho ,\lambda }'({\overline{x}};\zeta )=\langle v,\zeta \rangle \). In this case, from Definition 3 it follows that
where the second equality is because \(\Vert {\overline{x}}+\tau \zeta '\Vert _1-\Vert {\overline{x}}\Vert _1-\tau (\Vert \cdot \Vert _1)'({\overline{x}},\zeta ')=0\) for any \(\tau >0\) small enough. Let \(h(t){:}{=}\rho ^{-1}\psi ^*(\rho \vert t\vert )\) for \(t\in {\mathbb {R}}\). Clearly, \(g_{\rho }(z)=\sum _{i=1}^ph(z_i)\) for \(z\in {\mathbb {R}}^p\). When \(i\notin \textrm{supp}({\overline{x}})\), from the proof of Lemma 7, for all \(\tau >0\) small enough, we have \(h({\overline{x}}_i\!+\!\tau \zeta _i')-h({\overline{x}}_i)-\tau h'({\overline{x}}_i)\zeta _i'=0\). When \(i\in \textrm{supp}({\overline{x}})\), by noting that \(\psi ^*(s)=s-\phi (1)\) for all \(s\ge \phi _{+}'(1)\) and using the assumption \(\vert {\overline{x}}\vert _\textrm{nz}\ge {\phi _{+}'(1)}/{\rho }\),
for all sufficiently small \(\tau >0\). This means that, for all \(\tau >0\) small enough,
By combining this with (A6), we obtain from (A4)-(A5) that \(d^2\Theta _{\!\rho ,\nu ,\mu }({\overline{x}}\vert 0)(\zeta )>0\).
(iv) By Lemma 6, \(\widehat{\textrm{crit}}\,\Theta _{\nu ,\mu }=\textrm{crit}\,\Theta _{\nu ,\mu }\). We next argue that \({\overline{x}}\in \widehat{\textrm{crit}}\,\Theta _{\nu ,\mu }\). Since \({\overline{x}}\in \textrm{crit}\,\Theta _{\!\rho ,\nu ,\mu }\), from part (i) it follows that
where \([w_{\rho }({\overline{x}})]_i=(\psi ^*)'(\rho \vert {\overline{x}}_i\vert )\) for \(i=1,2,\ldots ,p\). By the definition of \(\psi ^*\), it is easy to deduce that \(\psi ^*(s)=s-\phi (1)\) for all \(s\ge \phi _{+}'(1)\). Together with \(\vert {\overline{x}}\vert _{\textrm{nz}}\ge \phi _{+}'(1)/\rho \), it holds that \([w_{\rho }({\overline{x}})]_i=(\psi ^*)'(\rho \vert {\overline{x}}_i\vert )=1\) for all \(i\in \textrm{supp}({\overline{x}})\). From [30, Theorem 1], we know that \({\widehat{\partial }}\Vert {\overline{x}}\Vert _0=\{v\in {\mathbb {R}}^p\,\vert \,v_i=0\ \textrm{for}\ i\in \textrm{supp}({\overline{x}})\}\). This means that
From the last two equations, \(0\in A^{{\mathbb {T}}}\partial \vartheta (A{\overline{x}}\!-b)+\mu {\overline{x}} +{\widehat{\partial }}\Vert {\overline{x}}\Vert _0={\widehat{\partial }}\Theta _{\nu ,\mu }({\overline{x}})\), where the equality is by Lemma 6. This means that \({\overline{x}}\in \widehat{\textrm{crit}}\,\Theta _{\nu ,\mu }\). For the rest, it suffices to argue that every point in \(\textrm{crit}\,\Theta _{\nu ,\mu }\) is a strongly local optimal solution of (1). Pick any \({\overline{x}}\in \textrm{crit}\,\Theta _{\nu ,\mu }\). We only need to argue that \(d^2\Theta _{\nu ,\mu }({\overline{x}}\vert 0)(\zeta )>0\) for all \(\zeta \ne 0\). Fix any \(0\ne \zeta \in {\mathbb {R}}^p\). By combining Lemma 6 with [50, Proposition 13.19], it holds that
where \(h_{\nu }\) is the same as in Lemma 6. Fix any \(v\in \partial h_{\nu }({\overline{x}})\). Let \({\overline{J}}\!{:}{=}\{1,\ldots ,p\}\backslash \textrm{supp}({\overline{x}})\). Then, \(\langle v,\zeta \rangle =\langle v_{{\overline{J}}},\zeta _{{\overline{J}}}\rangle \). A simple calculation yields \( dh_{\nu }({\overline{x}})(\zeta )={\textstyle \sum _{i\in {\overline{J}}}}\,\delta _{\{0\}}(\zeta _i). \) This means that \(dh_{\nu }({\overline{x}})(\zeta )\ge \langle v,\zeta \rangle \). When \(dh_{\nu }({\overline{x}})(\zeta )>\langle v,\zeta \rangle \), by [50, Proposition 13.5] we have \(d^2h_{\nu }({\overline{x}}\vert v)(\zeta )=+\infty \). This along with (A5) and (A7) means that \(d^2\Theta _{\nu ,\mu }({\overline{x}}\vert 0)(\zeta )>0\), so it suffices to consider the case \(dh_{\nu }({\overline{x}})(\zeta )=\langle v,\zeta \rangle \). For this case, from \(dh_{\nu }({\overline{x}})(\zeta )={\textstyle \sum _{i\in {\overline{J}}}}\,\delta _{\{0\}}(\zeta _i)\), we have \(\zeta _{{\overline{J}}}=0\). Consequently,
This along with (A5) and (A7) implies that \(d^2\Theta _{\nu ,\mu }({\overline{x}}\vert 0)(\zeta )>0\). \(\square \)
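For completeness, we record a short verification of the fact \(\psi ^*(s)=s-\phi (1)\) for all \(s\ge \phi _{+}'(1)\), which was used in parts (iii) and (iv). Here we assume, as our reading of the construction, that \(\psi =\phi +\delta _{[0,1]}\), so that
\[
\psi ^*(s)=\sup _{t\in [0,1]}\{st-\phi (t)\}.
\]
For \(s\ge \phi _{+}'(1)\), the convexity of \(\phi \) yields \(\phi _{+}'(t)\le \phi _{+}'(1)\le s\) for all \(t\in [0,1)\), so \(t\mapsto st-\phi (t)\) is nondecreasing on \([0,1]\) and the supremum is attained at \(t=1\), giving \(\psi ^*(s)=s-\phi (1)\).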
Appendix B: Proof of results in Sect. 4.2
In this section, let \(x^*\) be the true vector in model (21), and for each \(k\in {\mathbb {N}}\) write
By Assumption 2 and [50, Theorem 10.49], for any \({\overline{t}}\in {\mathbb {R}}\) we have \(\partial (\theta ^2)({\overline{t}})=2D^*\theta ({\overline{t}})(\theta ({\overline{t}}))\) where \(D^*\theta ({\overline{t}})\!:{\mathbb {R}}\rightrightarrows {\mathbb {R}}\) is the coderivative of \(\theta \) at \({\overline{t}}\). Together with [50, Proposition 9.24(b)], \(D^*\theta ({\overline{t}})(\theta ({\overline{t}}))=\partial (\theta ({\overline{t}})\theta )({\overline{t}})\). Thus,
By using (B9) and the above notation, we can establish the following lemma.
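Before stating it, note as a sanity check on (B9) that in the smooth case the coderivative formula reduces to the classical chain rule: if \(\theta \) is continuously differentiable at \({\overline{t}}\), then \(D^*\theta ({\overline{t}})(u)=\{\theta '({\overline{t}})u\}\) for every \(u\in {\mathbb {R}}\), so
\[
\partial (\theta ^2)({\overline{t}})=2D^*\theta ({\overline{t}})(\theta ({\overline{t}}))=\{2\,\theta ({\overline{t}})\,\theta '({\overline{t}})\},
\]
the usual derivative of \(t\mapsto \theta (t)^2\).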
Lemma 8
Suppose that for a certain \(k\ge 1\) there exists an index set \(S^{k-1}\supseteq S^*\) satisfying \(\min _{i\in (S^{k-1})^c}v_i^{k-1}\ge {1}/{2}\). Let \({\mathcal {I}}{:}{=}\{i\in \{1,\ldots ,n\}\ \vert \ \varpi _i\ne 0\}\). Then, when \(\lambda \ge 16{\widetilde{\tau }}n^{-1}\!|\!\Vert \!A_{{\mathcal {I}}\cdot }\!|\!\Vert _1+8\Vert \xi ^k\Vert _\infty \), it holds that \(\Vert \Delta x^{k}_{(S^{k-1})^c}\Vert _1\le 3\Vert \Delta x^{k}_{S^{k-1}}\Vert _1\).
Proof
From \(x^*\in \textrm{dom}f\) and the definition of \(x^k\) in Step 2, it is not difficult to obtain
where the strong convexity of the objective function of (19) is used. After a suitable rearrangement for the last inequality, we obtain
For each \(k\in {\mathbb {N}}\), let \({\mathcal {J}}_k\!{:}{=}\big \{i\notin {\mathcal {I}}\,\vert \, y_i^{k}\ne 0\big \}\). By the expression of \(\vartheta \) and \(\varpi =b-Ax^*\),
where the inequality holds because \(\theta (y_i)\le {\widetilde{\tau }}\Vert y\Vert _\infty \) for \(i=1,\ldots ,n\), which is implied by \(\theta (0)=0\) and (22). Fix any \(\eta _i\in \partial (\theta ^2)(\varpi _i)\). Since \(\theta ^2\) is strongly convex with modulus \(\tau \), we have
Along with (B9), for each \(i\in {\mathcal {J}}_k\), \(\eta _i=0\) and \(\theta ^2(y^{k}_i)-\theta ^2(\varpi _i)\ge \frac{\tau }{2}(y^{k}_i-\varpi _i)^2\), and consequently,
For each \(i\in {\mathcal {I}}\), write \({\widetilde{y}}_i^{k}\!{:}{=}\frac{\eta _i}{\theta (y_i^{k})+\theta (\varpi _i)}\). From (B9) and (22), it is not hard to obtain \(\vert {\widetilde{y}}_i^{k}\vert \le 2{\widetilde{\tau }}\) for all \(i\in {\mathcal {I}}\). Together with (B12), \(\varpi =b-Ax^*\) and \(\theta (y^{k}_i)\le {\widetilde{\tau }}\Vert y^k\Vert _\infty \),
Substituting the last two inequalities into (B11) and using the definition of f yields
Write \(\Upsilon ^k{:}{=}\frac{\tau \Vert A(x^k\!- x^*)\Vert ^2}{2n{\widetilde{\tau }}(\Vert y^{k}\Vert _{\infty }+\Vert \varpi \Vert _\infty )}\). By combining this inequality and (B10), we get
Since \(S^{k-1}\supseteq S^*\) and \(v_i^{k-1}\in [0.5,1]\) for \(i\in (S^{k-1})^{c}\), from the last inequality we have
From the nonnegativity of the left hand side and the given assumption on \(\lambda \), we have
This implies that the desired result holds. The proof is completed. \(\square \)
By invoking (B13) and Lemma 8, we can obtain the following conclusion.
Lemma 9
Suppose that \(A^{{\mathbb {T}}}A/n\) satisfies the RE condition of parameter \(\kappa >0\) on \({\mathcal {C}}(S^*)\), and that for some \(k\ge 1\) there is an index set \(S^{k-1}\supseteq S^*\) with \(\vert S^{k-1}\vert \le 1.5s^*\) such that \(\min _{i\in (S^{k-1})^c}v_i^{k-1}\ge \frac{1}{2}\). Let \({\mathcal {I}}{:}{=}\{i\ \vert \ \varpi _i\ne 0\}\). If \(\lambda \) is chosen such that \(16{\widetilde{\tau }}n^{-1}\!|\!\Vert \!A_{{\mathcal {I}}\cdot }\!|\!\Vert _1+8\Vert \xi ^k\Vert _\infty \le \lambda <\frac{2\mu {\widetilde{\tau }}\Vert \varpi \Vert _\infty +\tau \kappa -4{\widetilde{\tau }}\Vert A\Vert _{\infty } (2{\widetilde{\tau }}n^{-1}|\!\Vert A_{{\mathcal {I}}\cdot }|\!\Vert _1+\Vert \xi ^k\Vert _\infty )\vert S^{k-1}\vert }{4{\widetilde{\tau }}\Vert A\Vert _{\infty }\Vert v_{\!S^*}^{k-1}\Vert _{\infty }\vert S^{k-1}\vert }\),
Proof
Note that \(\Vert y^k\Vert _\infty +\Vert \varpi \Vert _\infty =\Vert \varpi -\!A\Delta x^k\Vert _{\infty }+\Vert \varpi \Vert _\infty \le \Vert A\Delta x^k\Vert _{\infty }+2\Vert \varpi \Vert _\infty \). Then
Together with inequality (B13) and \(v_i^{k-1}\in [0.5,1]\) for \(i\in (S^{k-1})^{c}\), it follows that
where the second inequality is due to \(\lambda \ge 16{\widetilde{\tau }}n^{-1}\!|\!\Vert \!A_{{\mathcal {I}}\cdot }\!|\!\Vert _1+8\Vert \xi ^k\Vert _\infty \). By Lemma 8, \(\Vert \Delta x^{k}_{(S^{k-1})^c}\Vert _1\le 3\Vert \Delta x^{k}_{\!S^{k-1}}\Vert _1\), which means that \(\Delta x^{k}\in {\mathcal {C}}(S^*)\). From the assumption on \(\frac{1}{n}A^{{\mathbb {T}}}A\), we have \(\Vert A\Delta x^k\Vert ^2\ge 2n\kappa \Vert \Delta x^k\Vert ^2\). Then, it holds that
Multiplying both sides of this inequality by \({\widetilde{\tau }}(\Vert A\Delta x^k\Vert _{\infty }+2\Vert \varpi \Vert _\infty )\) yields that
Note that \(\Vert A\Delta x^k\Vert _{\infty }\le \Vert A\Vert _{\infty }\Vert \Delta x^k\Vert _1\). Together with \(\Vert \Delta x^{k}_{(S^{k-1})^c}\Vert _1\le 3\Vert \Delta x^{k}_{S^{k-1}}\Vert _1\), this yields \(\Vert A\Delta x^k\Vert _{\infty }\le 4\Vert A\Vert _{\infty }\Vert \Delta x_{S^{k-1}}^k\Vert _1\), so the right-hand side of the last inequality satisfies
From the last two equations, a suitable rearrangement yields that
which along with \(\lambda <\frac{2\mu {\widetilde{\tau }}\Vert \varpi \Vert _\infty +\tau \kappa -4{\widetilde{\tau }}\Vert A\Vert _{\infty } (2{\widetilde{\tau }}n^{-1}|\!\Vert A_{{\mathcal {I}}\cdot }|\!\Vert _1+\Vert \xi ^k\Vert _\infty )\vert S^{k-1}\vert }{4{\widetilde{\tau }}\Vert A\Vert _{\infty }\Vert v_{S^*}^{k-1}\Vert _{\infty }\vert S^{k-1}\vert }\) implies the desired result. The proof is then completed. \(\square \)
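In the form actually invoked in the displays above, the RE condition of parameter \(\kappa \) on \({\mathcal {C}}(S^*)\) (cf. [52]) amounts to the following; this is our reading of the proof, with the formal definition of \({\mathcal {C}}(S^*)\) as given in Sect. 4.2:
\[
\frac{1}{n}\Vert A\delta \Vert ^2\ \ge \ 2\kappa \Vert \delta \Vert ^2\qquad \text{for all } \delta \in {\mathcal {C}}(S^*),
\]
with Lemma 8 supplying the cone membership \(\Delta x^{k}\in {\mathcal {C}}(S^*)\) through \(\Vert \Delta x^{k}_{(S^{k-1})^c}\Vert _1\le 3\Vert \Delta x^{k}_{S^{k-1}}\Vert _1\).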
B.1 Proof of Proposition 6
Let \(\Delta x^{0}{:}{=}x^0-x^*\). From \(x^*\in \textrm{dom}f\) and the strong convexity of (20),
From \(\vartheta (z)=\frac{1}{n}\sum _{i=1}^n\theta (z_i)\) and Assumption 2, \(f(x^*)-f(x^0)\le \frac{{\widetilde{\tau }}}{n}\Vert A(x^*\!-\!x^0)\Vert _1\). Notice that \( \Vert x^0\Vert ^2-\Vert x^*\Vert ^2 =\Vert x^0\!-\!x^*\Vert ^2+2\langle x^0-x^*,x^*\rangle \). Together with the last inequality and \(\Vert {\widetilde{\delta }}^0\Vert _\infty \le {\widetilde{\epsilon }}_0\), it follows that
By the assumption on \({\widetilde{\lambda }}\) and the nonnegativity of \(\Vert \Delta x^0\Vert _{{\widetilde{\gamma }}_{1,0}I+{\widetilde{\gamma }}_{2,0}A^{{\mathbb {T}}}A}^2\), we get \(\Vert \Delta x_{(S^{*})^{c}}^0\Vert _1\le 3\Vert \Delta x_{S^{*}}^0\Vert _1\). Substituting this into the last inequality yields
which implies that the desired conclusion holds. The proof is completed. \(\square \)
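The elementary identity from the proof above expands as
\[
\Vert x^0\Vert ^2=\Vert (x^0-x^*)+x^*\Vert ^2=\Vert x^0-x^*\Vert ^2+2\langle x^0-x^*,x^*\rangle +\Vert x^*\Vert ^2,
\]
so that \(\Vert x^0\Vert ^2-\Vert x^*\Vert ^2=\Vert x^0-x^*\Vert ^2+2\langle x^0-x^*,x^*\rangle \), as used in the proof of Proposition 6.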