
A global two-stage algorithm for non-convex penalized high-dimensional linear regression problems

Original paper · Computational Statistics

Abstract

Owing to their asymptotic oracle property, non-convex penalties, represented by the minimax concave penalty (MCP) and the smoothly clipped absolute deviation (SCAD), have attracted much attention in high-dimensional data analysis and have been widely used in signal processing, image restoration, matrix estimation, etc. However, because these penalties are non-convex and non-smooth, they are computationally challenging. Almost all existing algorithms converge only locally, so the proper selection of initial values is crucial; in practice they are therefore combined with a warm-starting technique to satisfy the requirement that the initial value be sufficiently close to the optimal solution of the corresponding problem. In this paper, exploiting the DC (difference of convex functions) structure of the MCP and SCAD penalties, we design a global two-stage algorithm for high-dimensional least squares linear regression problems. A key to the efficiency of the proposed algorithm is the use of the primal dual active set with continuation (PDASC) method to solve the corresponding subproblems. Theoretically, we not only prove the global convergence of the proposed algorithm but also show that the generated iterative sequence converges to a d-stationary point. In terms of computational performance, extensive simulation studies and real-data experiments show that the proposed algorithm outperforms the state-of-the-art SSN method and the classic coordinate descent (CD) algorithm for solving non-convex penalized high-dimensional linear regression problems.
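For readers unfamiliar with the DC structure the algorithm exploits, the standard decomposition of the MCP penalty (with regularization parameter $\lambda>0$ and concavity parameter $\gamma>1$) illustrates the idea; the precise decomposition and subproblem used in the paper may differ in details not shown on this page.

$$
p_{\lambda,\gamma}(t)=
\begin{cases}
\lambda|t|-\dfrac{t^{2}}{2\gamma}, & |t|\le\gamma\lambda,\\[4pt]
\dfrac{\gamma\lambda^{2}}{2}, & |t|>\gamma\lambda,
\end{cases}
\qquad
p_{\lambda,\gamma}(t)=\lambda|t|-h_{\lambda,\gamma}(t),
$$

where

$$
h_{\lambda,\gamma}(t)=
\begin{cases}
\dfrac{t^{2}}{2\gamma}, & |t|\le\gamma\lambda,\\[4pt]
\lambda|t|-\dfrac{\gamma\lambda^{2}}{2}, & |t|>\gamma\lambda,
\end{cases}
$$

is convex and continuously differentiable. Writing the penalized objective as a convex Lasso-type function minus the convex function $\sum_{j} h_{\lambda,\gamma}(\beta_{j})$, a generic DC-type iteration linearizes the subtracted part at the current iterate $\beta^{k}$ and solves

$$
\beta^{k+1}\in\arg\min_{\beta}\;\frac{1}{2n}\|y-X\beta\|_{2}^{2}+\lambda\|\beta\|_{1}-\sum_{j} h'_{\lambda,\gamma}(\beta^{k}_{j})\,\beta_{j},
$$

which is a Lasso problem with an extra linear term and is therefore amenable to solvers such as PDASC. (The $\tfrac{1}{2n}$ scaling of the loss is an assumption for illustration; an analogous DC decomposition exists for SCAD.)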



Code availability

The numerical calculations in this paper were carried out on the supercomputing system at the Supercomputing Center of Wuhan University. Our PMM method is implemented in MATLAB, and code reproducing the examples in Sect. 5 is available at https://github.com/HAPP1114/pmm.git.


Acknowledgements

The authors would like to thank the anonymous referees for their useful suggestions, which have led to considerable improvements in the paper. The work of Zhou Yu is supported by the National Key R&D Program of China (Grant Nos. 2021YFA1000100 and 2021YFA1000101), the National Natural Science Foundation of China (Grant No. 11971170) and the Fundamental Research Funds for the Central Universities.

Author information


Corresponding author

Correspondence to Zhou Yu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Li, P., Liu, M. & Yu, Z. A global two-stage algorithm for non-convex penalized high-dimensional linear regression problems. Comput Stat 38, 871–898 (2023). https://doi.org/10.1007/s00180-022-01249-w

