Abstract
Owing to their asymptotic oracle property, non-convex penalties such as the minimax concave penalty (MCP) and the smoothly clipped absolute deviation (SCAD) have attracted much attention in high-dimensional data analysis and have been widely used in signal processing, image restoration, matrix estimation, and other applications. However, their non-convexity and non-smoothness make them computationally challenging. Almost all existing algorithms converge only locally, so the choice of initial value is crucial; in practice they are usually combined with a warm-starting technique to meet the rigid requirement that the initial value be sufficiently close to the optimal solution of the corresponding problem. In this paper, exploiting the DC (difference of convex functions) structure of the MCP and SCAD penalties, we design a global two-stage algorithm for high-dimensional penalized least squares linear regression problems. A key ingredient that makes the proposed algorithm efficient is the use of the primal dual active set with continuation (PDASC) method to solve the corresponding subproblems. Theoretically, we prove the global convergence of the proposed algorithm and show that the generated iterative sequence converges to a d-stationary point. In terms of computational performance, extensive simulation studies and real-data experiments show that the proposed algorithm outperforms the recent semi-smooth Newton (SSN) method and the classical coordinate descent (CD) algorithm for solving non-convex penalized high-dimensional linear regression problems.
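To make the DC structure concrete, the following MATLAB sketch illustrates the generic difference-of-convex scheme for MCP-penalized least squares: the MCP is split as p(t) = lambda*|t| - h(t) with h convex, the concave correction is linearized at the current iterate, and the resulting convex l1-penalized subproblem is solved approximately. Here plain ISTA stands in for the PDASC subproblem solver used in the paper, and the function names, step sizes, iteration counts, and stopping rule are illustrative assumptions, not the authors' implementation.

% Minimal sketch of a DC scheme for MCP-penalized least squares:
%   min_beta 0.5*||y - X*beta||^2 + sum_j p(beta_j),
% where the MCP is written as p(t) = lambda*|t| - h(t), with the convex part
%   h(t) = t^2/(2*gamma)            for |t| <= gamma*lambda,
%   h(t) = lambda*|t| - gamma*lambda^2/2  otherwise.
% The convex subproblem is solved by ISTA as a stand-in for PDASC.
function beta = dc_mcp_sketch(X, y, lambda, gamma)
    [~, p] = size(X);
    beta = zeros(p, 1);                       % zero initial point
    L = norm(X)^2;                            % Lipschitz constant of the smooth part
    for outer = 1:30                          % outer DC iterations
        g = mcp_hgrad(beta, lambda, gamma);   % gradient of the convex correction h
        beta_old = beta;
        % inner problem: 0.5*||y - X*b||^2 - g'*b + lambda*||b||_1, solved by ISTA
        for inner = 1:500
            grad = X' * (X * beta - y) - g;
            beta = soft_threshold(beta - grad / L, lambda / L);
        end
        if norm(beta - beta_old) <= 1e-6 * max(1, norm(beta_old))
            break;                            % DC iterates have stabilized
        end
    end
end

function g = mcp_hgrad(t, lambda, gamma)
    % derivative of h(t): t/gamma on |t| <= gamma*lambda, lambda*sign(t) beyond
    g = sign(t) .* min(abs(t) / gamma, lambda);
end

function s = soft_threshold(z, tau)
    s = sign(z) .* max(abs(z) - tau, 0);
end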




Code availability
The numerical calculations in this paper have been done on the supercomputing system in the Supercomputing Center of Wuhan University. Our PMM method is implemented in MATLAB, and code reproducing the examples in Sect. 5 is available at https://github.com/HAPP1114/pmm.git.
Acknowledgements
The authors would like to thank the anonymous referees for their useful suggestions, which have led to considerable improvements in the paper. The work of Zhou Yu is supported by the National Key R&D Program of China (Grant Nos. 2021YFA1000100 and 2021YFA1000101), the National Natural Science Foundation of China (Grant No. 11971170), and the Fundamental Research Funds for the Central Universities.
Cite this article
Li, P., Liu, M. & Yu, Z. A global two-stage algorithm for non-convex penalized high-dimensional linear regression problems. Comput Stat 38, 871–898 (2023). https://doi.org/10.1007/s00180-022-01249-w