DOI: 10.5555/3060832.3060889
Article

Direct sparsity optimization based feature selection for multi-class classification

Published: 09 July 2016

Abstract

A novel sparsity optimization method is proposed to select features for multi-class classification problems by directly optimizing an ℓ2,p-norm (0 < p ≤ 1) based sparsity function subject to data-fitting inequality constraints that enforce large between-class margins. This direct sparse optimization circumvents the empirical tuning of regularization parameters required by existing feature selection methods that adopt the sparsity model as a regularization term. To solve the direct sparsity optimization problem, which is nonsmooth and non-convex when 0 < p < 1, we propose an efficient iterative algorithm with proven convergence that converts it to a convex and smooth optimization problem at every iteration step. The proposed algorithm has been evaluated on publicly available datasets, and the experiments demonstrate that it achieves feature selection performance competitive with state-of-the-art algorithms.
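To make the objective concrete: for a weight matrix W with one row w^i per feature, the ℓ2,p-norm is ||W||_{2,p} = (sum_i ||w^i||_2^p)^(1/p), and shrinking it toward zero drives entire rows to zero, so the corresponding features are discarded for all classes at once. The sketch below only illustrates a common reweighting scheme for ℓp-type objectives (solving a convex, smooth surrogate at every iteration), shown for a simplified binary-class, hard-margin case; it is not the authors' multi-class algorithm, the function name and parameters are hypothetical, and it assumes numpy and cvxpy are available.

    import numpy as np
    import cvxpy as cp

    def reweighted_lp_feature_selection(X, y, p=0.5, n_iter=20, eps=1e-6):
        """Toy sketch: X is an (n, d) data matrix, y holds labels in {-1, +1}.
        Returns a sparse weight vector whose near-zero entries mark features
        to discard (a binary-class stand-in for the l_{2,p} multi-class case)."""
        n, d = X.shape
        reweights = np.ones(d)          # per-feature weights, refreshed each round
        w_val = np.zeros(d)
        for _ in range(n_iter):
            w, b = cp.Variable(d), cp.Variable()
            # Convex, smooth surrogate of the nonsmooth l_p sparsity term:
            # a weighted sum of squared coefficients.
            objective = cp.Minimize(cp.sum(cp.multiply(reweights, cp.square(w))))
            # Data-fitting inequality constraints enforcing a unit margin
            # (assumes the toy data is linearly separable).
            constraints = [cp.multiply(y, X @ w + b) >= 1]
            cp.Problem(objective, constraints).solve()
            w_val = w.value
            # Standard reweighting step for l_p-type objectives: small
            # coefficients receive large weights and are pushed toward zero.
            reweights = (np.abs(w_val) + eps) ** (p - 2)
        return w_val

In the multi-class setting of the paper, the analogous step would reweight each row of the weight matrix by its ℓ2 norm, so that whole rows, and hence whole features, are kept or discarded jointly across classes.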


Cited By

  • Feature Selection. ACM Computing Surveys 50(6): 1-45, 2017. https://doi.org/10.1145/3136625
  • Feature selection by optimizing a lower bound of conditional mutual information. Information Sciences 418(C): 652-667, 2017. https://doi.org/10.1016/j.ins.2017.08.036



    Published In

    IJCAI'16: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence
    July 2016
    4277 pages
    ISBN:9781577357704

    Sponsors

    • Sony Corporation
    • Arizona State University
    • Microsoft
    • Facebook
    • AI Journal

    Publisher

    AAAI Press

    Publication History

    Published: 09 July 2016

    Qualifiers

    • Article
