Sparse optimization via vector k-norm and DC programming with an application to feature selection for support vector machines

Published: 12 July 2023
  • Abstract
  • Abstract

    Sparse optimization seeks minimizers of functions that have as few nonzero components as possible, a paradigm of great practical relevance in Machine Learning, particularly in classification approaches based on support vector machines. By exploiting some properties of the k-norm of a vector, namely, the sum of its k largest absolute-value components, we formulate a sparse optimization problem as a mixed-integer nonlinear program whose continuous relaxation is equivalent to the unconstrained minimization of a difference-of-convex (DC) function. The approach is applied to Feature Selection in the support vector machine framework and tested on a set of benchmark instances. Numerical comparisons against both the standard ℓ1-based support vector machine and a simple version of the SLOPE method demonstrate the effectiveness of our approach in achieving a high sparsity level of the solutions without impairing test-correctness.
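    The abstract rests on a standard property of the vector k-norm: since the k-norm is the sum of the k largest absolute-value components, the gap ‖x‖₁ − |||x|||ₖ is always nonnegative and vanishes exactly when x has at most k nonzero components, which is what makes the DC reformulation of the cardinality constraint possible. A minimal sketch of this property (function names are illustrative, not the authors' code):

    ```python
    def k_norm(x, k):
        """Vector k-norm: sum of the k largest absolute-value components of x."""
        return sum(sorted((abs(v) for v in x), reverse=True)[:k])

    def dc_gap(x, k):
        """DC gap ||x||_1 - |||x|||_k: nonnegative, and zero iff x has
        at most k nonzero components."""
        return sum(abs(v) for v in x) - k_norm(x, k)
    ```

    For instance, x = (3, 0, −1, 0) has two nonzeros, so its DC gap with k = 2 is zero, while x = (3, 2, −1) with k = 2 leaves a positive gap; penalizing this gap therefore drives solutions toward the desired sparsity level.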



    Published In

    Computational Optimization and Applications  Volume 86, Issue 2
    Nov 2023
    368 pages

    Publisher

    Kluwer Academic Publishers

    United States

    Publication History

    Published: 12 July 2023
    Accepted: 24 June 2023
    Received: 27 February 2023

    Author Tags

    1. Global optimization
    2. Sparse optimization
    3. Cardinality constraint
    4. k-norm
    5. Support vector machine

    Qualifiers

    • Research-article

    Funding Sources

    • Università della Calabria
