\(\ell _1\) Regularized Robust and Sparse Linear Modeling Using Discrete Optimization

  • Conference paper
Machine Learning, Optimization, and Data Science (LOD 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12566)

Abstract

In regression, feature selection is an effective strategy for handling contaminated data and coping with high dimensionality while improving prediction. In addition to the presence of spurious variables, estimators suffer from corrupted, incorrectly measured, or misrecorded observations known as outliers. A natural way to select relevant variables and detect outliers is to use the \(\ell _0\) norm for both tasks and to recast the resulting optimization problem as a mixed integer optimization (MIO) problem. The \(\ell _0\) norm estimators perform well when the signal-to-noise ratio (SNR) is high, but their performance degrades when the SNR is low because the \(\ell _{0}\) norm tends to overfit when the noise is relatively high. To address this problem, we propose to regularize the \(\ell _{0}\) norm problem for variable selection and outlier detection by adding an \(\ell _1\) penalty term. We also propose an efficient and scalable non-convex proximal alternating algorithm that produces high-quality solutions in a short time and serves as a warm start for the MIO solver. An empirical comparison between the \(\ell _0\) norm approach and its \(\ell _1\) regularized extension is also presented. The results show that the regularized MIO approach and its discrete first-order warm start provide high-quality solutions and perform better than the \(\ell _{0}\) approach, especially at low SNR values.
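To make the abstract concrete, one plausible way to write the regularized problem it describes is the mean-shift formulation of robust regression, in which a sparse shift vector \(\gamma\) absorbs outlying responses. The problem below is a sketch consistent with the abstract, not a formula quoted from the paper; the constants \(k_\beta\), \(k_\gamma\), \(\lambda_\beta\) and \(\lambda_\gamma\) are illustrative:

\[\min_{\beta \in \mathbb{R}^{p},\, \gamma \in \mathbb{R}^{n}} \; \frac{1}{2}\,\lVert y - X\beta - \gamma \rVert_2^2 + \lambda_\beta \lVert \beta \rVert_1 + \lambda_\gamma \lVert \gamma \rVert_1 \quad \text{subject to} \quad \lVert \beta \rVert_0 \le k_\beta, \;\; \lVert \gamma \rVert_0 \le k_\gamma,\]

where a nonzero \(\gamma_i\) flags observation \(i\) as an outlier. The \(\ell_0\) constraints give exact sparsity in both the coefficients and the outlier indicators and make the problem representable as an MIO, while the added \(\ell_1\) terms supply the shrinkage that, per the abstract, prevents overfitting at low SNR.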
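The discrete first-order warm start mentioned in the abstract can likewise be sketched as a proximal alternating scheme for the formulation above. The Python sketch below is an illustration under those assumptions, not the authors' implementation: it alternates a proximal gradient step in \(\beta\) with an exact proximal step in \(\gamma\), where each proximal operator first soft-thresholds (the \(\ell_1\) term) and then keeps the largest-magnitude entries (the \(\ell_0\) constraint). All function names are hypothetical.

import numpy as np

def prox_l1_l0(v, lam, k):
    # Proximal operator of lam * ||.||_1 restricted to at most k nonzeros:
    # soft-threshold each entry, then keep the k largest in magnitude.
    shrunk = np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
    out = np.zeros_like(shrunk)
    if k > 0:
        keep = np.argsort(-np.abs(shrunk))[:k]
        out[keep] = shrunk[keep]
    return out

def alternating_warm_start(X, y, k_beta, k_gamma, lam_beta, lam_gamma,
                           max_iter=500, tol=1e-7):
    # Alternate proximal steps on beta (coefficients) and gamma (outlier
    # shifts) for the objective
    #   0.5*||y - X beta - gamma||^2 + lam_beta*||beta||_1 + lam_gamma*||gamma||_1
    # subject to ||beta||_0 <= k_beta and ||gamma||_0 <= k_gamma.
    n, p = X.shape
    beta, gamma = np.zeros(p), np.zeros(n)
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the beta-gradient
    for _ in range(max_iter):
        beta_prev, gamma_prev = beta, gamma
        # Proximal gradient step in beta with gamma held fixed.
        grad = X.T @ (X @ beta + gamma - y)
        beta = prox_l1_l0(beta - grad / L, lam_beta / L, k_beta)
        # Exact proximal step in gamma with beta held fixed: its subproblem
        # is a separable prox around the current residual y - X beta.
        gamma = prox_l1_l0(y - X @ beta, lam_gamma, k_gamma)
        if (np.max(np.abs(beta - beta_prev)) < tol
                and np.max(np.abs(gamma - gamma_prev)) < tol):
            break
    return beta, gamma

The returned pair (beta, gamma) would then seed the MIO solver; in practice such schemes are run from several random initializations, keeping the solution with the best objective value.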



Author information

Correspondence to Mahdi Jammal.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Jammal, M., Canu, S., Abdallah, M. (2020). \(\ell _1\) Regularized Robust and Sparse Linear Modeling Using Discrete Optimization. In: Nicosia, G., et al. (eds.) Machine Learning, Optimization, and Data Science. LOD 2020. Lecture Notes in Computer Science, vol. 12566. Springer, Cham. https://doi.org/10.1007/978-3-030-64580-9_53

  • DOI: https://doi.org/10.1007/978-3-030-64580-9_53

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-64579-3

  • Online ISBN: 978-3-030-64580-9

  • eBook Packages: Computer Science, Computer Science (R0)
