Sparse optimization via vector k-norm and DC programming with an application to feature selection for support vector machines

Published: 12 July 2023
  • Abstract
  • Abstract

    Sparse optimization seeks minimizers of functions that have as few nonzero components as possible, a paradigm of great practical relevance in Machine Learning, particularly in classification approaches based on support vector machines. By exploiting some properties of the k-norm of a vector, namely, the sum of its k largest absolute-value components, we formulate a sparse optimization problem as a mixed-integer nonlinear program whose continuous relaxation is equivalent to the unconstrained minimization of a difference-of-convex (DC) function. The approach is applied to Feature Selection in the support vector machine framework and tested on a set of benchmark instances. Numerical comparisons against both the standard ℓ1-based support vector machine and a simple version of the SLOPE method demonstrate the effectiveness of our approach in achieving a high sparsity level of the solutions without impairing test-correctness.
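    The abstract rests on a standard property of the vector k-norm: since the k-norm is the sum of the k largest absolute-value components, the gap ‖x‖₁ − |||x|||ₖ is always nonnegative and vanishes exactly when x has at most k nonzero components, which is what makes the DC reformulation of the cardinality constraint possible. A minimal sketch of this property (function names are illustrative, not the authors' code):

    ```python
    def k_norm(x, k):
        """Vector k-norm: sum of the k largest absolute-value components of x."""
        return sum(sorted((abs(v) for v in x), reverse=True)[:k])

    def dc_gap(x, k):
        """DC gap ||x||_1 - |||x|||_k: nonnegative, and zero iff x has
        at most k nonzero components."""
        return sum(abs(v) for v in x) - k_norm(x, k)
    ```

    For instance, x = (3, 0, −1, 0) has two nonzeros, so its DC gap with k = 2 is zero, while x = (3, 2, −1) with k = 2 leaves a positive gap; penalizing this gap therefore drives solutions toward the desired sparsity level.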



    Published In

    Computational Optimization and Applications  Volume 86, Issue 2
    Nov 2023
    368 pages

    Publisher

    Kluwer Academic Publishers

    United States

    Publication History

    Published: 12 July 2023
    Accepted: 24 June 2023
    Received: 27 February 2023

    Author Tags

    1. Global optimization
    2. Sparse optimization
    3. Cardinality constraint
    4. k-norm
    5. Support vector machine

    Qualifiers

    • Research-article

    Funding Sources

    • Università della Calabria
