Abstract
Owing to their asymptotic oracle property, non-convex penalties such as the minimax concave penalty (MCP) and the smoothly clipped absolute deviation (SCAD) have attracted much attention in high-dimensional data analysis and have been widely used in signal processing, image restoration, matrix estimation, and other applications. However, their non-convexity and non-smoothness make them computationally challenging. Almost all existing algorithms converge only locally, so the choice of initial value is crucial; in practice they are usually combined with a warm-starting technique to meet the rigid requirement that the initial value be sufficiently close to the optimal solution of the corresponding problem. In this paper, exploiting the DC (difference of convex functions) structure of the MCP and SCAD penalties, we design a global two-stage algorithm for high-dimensional penalized least squares linear regression problems. A key ingredient that makes the proposed algorithm efficient is the use of the primal dual active set with continuation (PDASC) method to solve the corresponding subproblems. Theoretically, we prove the global convergence of the proposed algorithm and show that the generated iterative sequence converges to a d-stationary point. In terms of computational performance, extensive simulation studies and real-data experiments show that the proposed algorithm outperforms the recent semi-smooth Newton (SSN) method and the classical coordinate descent (CD) algorithm for solving non-convex penalized high-dimensional linear regression problems.
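To make the DC structure concrete, the following MATLAB sketch illustrates the generic difference-of-convex scheme for MCP-penalized least squares: the MCP is split as p(t) = lambda*|t| - h(t) with h convex, the concave correction is linearized at the current iterate, and the resulting convex l1-penalized subproblem is solved approximately. Here plain ISTA stands in for the PDASC subproblem solver used in the paper, and the function names, step sizes, iteration counts, and stopping rule are illustrative assumptions, not the authors' implementation.

% Minimal sketch of a DC scheme for MCP-penalized least squares:
%   min_beta 0.5*||y - X*beta||^2 + sum_j p(beta_j),
% where the MCP is written as p(t) = lambda*|t| - h(t), with the convex part
%   h(t) = t^2/(2*gamma)            for |t| <= gamma*lambda,
%   h(t) = lambda*|t| - gamma*lambda^2/2  otherwise.
% The convex subproblem is solved by ISTA as a stand-in for PDASC.
function beta = dc_mcp_sketch(X, y, lambda, gamma)
    [~, p] = size(X);
    beta = zeros(p, 1);                       % zero initial point
    L = norm(X)^2;                            % Lipschitz constant of the smooth part
    for outer = 1:30                          % outer DC iterations
        g = mcp_hgrad(beta, lambda, gamma);   % gradient of the convex correction h
        beta_old = beta;
        % inner problem: 0.5*||y - X*b||^2 - g'*b + lambda*||b||_1, solved by ISTA
        for inner = 1:500
            grad = X' * (X * beta - y) - g;
            beta = soft_threshold(beta - grad / L, lambda / L);
        end
        if norm(beta - beta_old) <= 1e-6 * max(1, norm(beta_old))
            break;                            % DC iterates have stabilized
        end
    end
end

function g = mcp_hgrad(t, lambda, gamma)
    % derivative of h(t): t/gamma on |t| <= gamma*lambda, lambda*sign(t) beyond
    g = sign(t) .* min(abs(t) / gamma, lambda);
end

function s = soft_threshold(z, tau)
    s = sign(z) .* max(abs(z) - tau, 0);
end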




Code availability
The numerical calculations in this paper have been done on the supercomputing system in the Supercomputing Center of Wuhan University. Our PMM method is implemented in MATLAB, and code reproducing the examples in Sect. 5 is available at https://github.com/HAPP1114/pmm.git.
Acknowledgements
The authors would like to thank the anonymous referees for their useful suggestions, which have led to considerable improvements in the paper. The work of Zhou Yu is supported by the National Key R&D Program of China (Grant Nos. 2021YFA1000100 and 2021YFA1000101), the National Natural Science Foundation of China (Grant No. 11971170), and the Fundamental Research Funds for the Central Universities.
Cite this article
Li, P., Liu, M. & Yu, Z. A global two-stage algorithm for non-convex penalized high-dimensional linear regression problems. Comput Stat 38, 871–898 (2023). https://doi.org/10.1007/s00180-022-01249-w