Abstract
In regression, feature selection is an effective strategy for handling contaminated data and high dimensionality while improving prediction. In addition to the presence of spurious variables, estimators suffer from corrupted, incorrectly measured, or misrecorded observations known as outliers. A natural way to select relevant variables and detect outliers is to use the \(\ell _0\) norm for both tasks and to recast the resulting optimization problem as a mixed integer optimization (MIO) problem. The \(\ell _0\) norm estimators perform well when the signal-to-noise ratio (SNR) is high, but their performance degrades when the SNR is low because the \(\ell _{0}\) norm overfits when the noise is relatively high. To fix this problem, we propose to regularize the \(\ell _{0}\) norm problem for variable selection and outlier detection by adding an \(\ell _1\) penalty term. We also propose an efficient and scalable non-convex proximal alternating algorithm that produces high-quality solutions in a short time and serves as a warm start for the MIO solver. An empirical comparison between the \(\ell _0\) norm approach and its \(\ell _1\)-regularized extension is presented as well. The results show that the regularized MIO approach and its discrete first-order warm start provide high-quality solutions and perform better than the \(\ell _{0}\) approach, especially for low SNR values.
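To make the setting concrete, the following display is a minimal sketch of the kind of problem the abstract describes, assuming the usual mean-shift model \(y = X\beta + \gamma + \varepsilon\) in which a sparse shift vector \(\gamma\) flags outliers; the symbols \(k_\beta\), \(k_\gamma\), \(\lambda_\beta\), \(\lambda_\gamma\) are our own notation, not taken from the paper.

\[
\min_{\beta,\,\gamma}\; \tfrac{1}{2}\lVert y - X\beta - \gamma\rVert_2^2 + \lambda_\beta \lVert \beta\rVert_1 + \lambda_\gamma \lVert \gamma\rVert_1 \quad \text{subject to} \quad \lVert \beta\rVert_0 \le k_\beta,\; \lVert \gamma\rVert_0 \le k_\gamma .
\]

The proximal alternating warm start mentioned in the abstract could then be sketched as below; this is a hypothetical illustration of such a scheme under the assumptions above, not the authors' implementation. It uses the fact that the proximal operator of an \(\ell_1\) penalty under an \(\ell_0\) constraint is soft-thresholding followed by keeping the largest entries in magnitude.

import numpy as np

def prox_l1_under_l0(v, k, lam):
    # Prox of lam * ||.||_1 subject to ||.||_0 <= k (assumed form):
    # soft-threshold, then keep the k largest entries in magnitude.
    w = np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
    out = np.zeros_like(w)
    keep = np.argsort(np.abs(w))[-k:]
    out[keep] = w[keep]
    return out

def warm_start(X, y, k_beta, k_gamma, lam_b, lam_g, n_iter=500):
    # Hypothetical proximal alternating sketch: a proximal gradient
    # step on beta alternates with an exact proximal update of gamma.
    n, p = X.shape
    beta, gamma = np.zeros(p), np.zeros(n)
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the beta gradient
    for _ in range(n_iter):
        grad = -X.T @ (y - X @ beta - gamma)
        beta = prox_l1_under_l0(beta - grad / L, k_beta, lam_b / L)
        gamma = prox_l1_under_l0(y - X @ beta, k_gamma, lam_g)  # separable, exact
    return beta, gamma

The resulting pair (beta, gamma) would then be handed to the MIO solver as an incumbent, which is the role the abstract assigns to the discrete first-order warm start.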
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Jammal, M., Canu, S., Abdallah, M. (2020). \(\ell _1\) Regularized Robust and Sparse Linear Modeling Using Discrete Optimization. In: Nicosia, G., et al. (eds.) Machine Learning, Optimization, and Data Science. LOD 2020. Lecture Notes in Computer Science, vol. 12566. Springer, Cham. https://doi.org/10.1007/978-3-030-64580-9_53
DOI: https://doi.org/10.1007/978-3-030-64580-9_53
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64579-3
Online ISBN: 978-3-030-64580-9
eBook Packages: Computer Science, Computer Science (R0)