
An introduction to variable and feature selection

Published: 01 March 2003

    Abstract

    Variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available. These areas include text processing of internet documents, gene expression array analysis, and combinatorial chemistry. The objective of variable selection is three-fold: improving the prediction performance of the predictors, providing faster and more cost-effective predictors, and providing a better understanding of the underlying process that generated the data. The contributions of this special issue cover a wide range of aspects of such problems: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.
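    As a concrete illustration of the simplest idea mentioned above, the sketch below ranks variables individually by their absolute Pearson correlation with the target and then trains a classifier on the top-ranked subset. It is not code from the paper: the synthetic dataset, the choice of the ten top-ranked variables, and the logistic-regression classifier are illustrative assumptions, with scikit-learn used only for convenience.

```python
# Minimal sketch (illustrative, not from the paper): univariate variable
# ranking by |Pearson correlation| with the target, then evaluation of a
# classifier on the top-k variables versus on all variables.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data: many variables, only a few informative (assumed sizes).
X, y = make_classification(n_samples=200, n_features=500,
                           n_informative=10, random_state=0)

# Filter criterion: rank each variable by its absolute correlation with y.
scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
top_k = np.argsort(scores)[::-1][:10]   # indices of the 10 best-ranked variables

# Compare prediction performance with and without variable selection.
clf = LogisticRegression(max_iter=1000)
print("all variables :", cross_val_score(clf, X, y, cv=5).mean())
print("top-10 ranked :", cross_val_score(clf, X[:, top_k], y, cv=5).mean())
```

    Note that ranking on the full dataset and then cross-validating only the final classifier, as done here for brevity, lets information from the test folds leak into the selection step; a careful assessment would redo the ranking inside each cross-validation fold.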

    References

    [1]
    E. Amaldi and V. Kann. On the approximation of minimizing non zero variables or unsatisfied relations in linear systems. Theoretical Computer Science, 209: 237-260, 1998.
    [2]
    R. Bekkerman, R. El-Yaniv, N. Tishby, and Y. Winter. Distributional word clusters vs. words for text categorization. JMLR, 3: 1183-1208 (this issue), 2003.
    [3]
    A. Ben-Hur and I. Guyon. Detecting stable clusters using principal component analysis. In M.J. Brownstein and A. Khodursky, editors, Methods In Molecular Biology, pages 159-182. Humana Press, 2003.
    [4]
    Y. Bengio and N. Chapados. Extensions to metric-based model selection. JMLR, 3: 1209- 1227 (this issue), 2003.
    [5]
    J. Bi, K. Bennett, M. Embrechts, C. Breneman, and M. Song. Dimensionality reduction via sparse support vector machines. JMLR, 3: 1229-1243 (this issue), 2003.
    [6]
    A. Blum and P. Langley. Selection of relevant features and examples in machine learning. Artificial Intelligence, 97(1-2): 245-271, December 1997.
    [7]
    B. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In Fifth Annual Workshop on Computational Learning Theory, pages 144-152, Pittsburgh, 1992. ACM.
    [8]
    L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth and Brooks, 1984.
    [9]
    R. Caruana and V. de Sa. Benefitting from the variables that variable selection discards. JMLR, 3: 1245-1264 (this issue), 2003.
    [10]
    I. Dhillon, S. Mallela, and R. Kumar. A divisive information-theoretic feature clustering algorithm for text classification. JMLR, 3: 1265-1287 (this issue), 2003.
    [11]
    T. G. Dietterich. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10(7): 1895-1924, 1998.
    [12]
    R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley & Sons, USA, 2nd edition, 2001.
    [13]
    T. R. Golub et al. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286: 531-537, 1999.
    [14]
    G. Forman. An extensive empirical study of feature selection metrics for text classification. JMLR, 3: 1289-1306 (this issue), 2003.
    [15]
    T. Furey, N. Cristianini, N. Duffy, D. Bednarski, M. Schummer, and D. Haussler. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16: 906-914, 2000.
    [16]
    A. Globerson and N. Tishby. Sufficient dimensionality reduction. JMLR, 3: 1307-1331 (this issue), 2003.
    [17]
    Y. Grandvalet and S. Canu. Adaptive scaling for feature selection in SVMs. In NIPS 15, 2002.
    [18]
    I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1-3): 389-422, 2002.
    [19]
    T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer series in statistics. Springer, New York, 2001.
    [20]
    T. Jebara and T. Jaakkola. Feature selection and dualities in maximum entropy discrimination. In 16th Annual Conference on Uncertainty in Artificial Intelligence, 2000.
    [21]
    K. Kira and L. Rendell. A practical approach to feature selection. In D. Sleeman and P. Edwards, editors, International Conference on Machine Learning, pages 368-377, Aberdeen, July 1992. Morgan Kaufmann.
    [22]
    R. Kohavi and G. John. Wrappers for feature selection. Artificial Intelligence, 97(1-2): 273-324, December 1997.
    [23]
    D. Koller and M. Sahami. Toward optimal feature selection. In 13th International Conference on Machine Learning, pages 284-292, July 1996.
    [24]
    Y. LeCun, J. Denker, S. Solla, R. E. Howard, and L. D. Jackel. Optimal brain damage. In D. S. Touretzky, editor, Advances in Neural Information Processing Systems II, San Mateo, CA, 1990. Morgan Kaufmann.
    [25]
    G. Monari and G. Dreyfus. Withdrawing an example from the training set: an analytic estimation of its effect on a nonlinear parameterized model. Neurocomputing Letters, 35: 195-201, 2000.
    [26]
    C. Nadeau and Y. Bengio. Inference for the generalization error. Machine Learning (to appear), 2001.
    [27]
    A. Y. Ng. On feature selection: learning with exponentially many irrelevant features as training examples. In 15th International Conference on Machine Learning, pages 404- 412. Morgan Kaufmann, San Francisco, CA, 1998.
    [28]
    A. Y. Ng and M. Jordan. Convergence rates of the voting Gibbs classifier, with application to Bayesian feature selection. In 18th International Conference on Machine Learning, 2001.
    [29]
    J. Pearl. Causality. Cambridge University Press, 2000.
    [30]
    F. Pereira, N. Tishby, and L. Lee. Distributional clustering of English words. In Proc. Meeting of the Association for Computational Linguistics, pages 183-190, 1993.
    [31]
    S. Perkins, K. Lacker, and J. Theiler. Grafting: Fast incremental feature selection by gradient descent in function space. JMLR, 3: 1333-1356 (this issue), 2003.
    [32]
    A. Rakotomamonjy. Variable selection using SVM-based criteria. JMLR, 3: 1357-1370 (this issue), 2003.
    [33]
    J. Reunanen. Overfitting in making comparisons between variable selection methods. JMLR, 3: 1371-1382 (this issue), 2003.
    [34]
    I. Rivals and L. Personnaz. MLPs (mono-layer polynomials and multi-layer perceptrons) for non-linear modeling. JMLR, 3: 1383-1398 (this issue), 2003.
    [35]
    B. Schoelkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge MA, 2002.
    [36]
    D. Schuurmans. A new metric-based approach to model selection. In 9th Innovative Applications of Artificial Intelligence Conference, pages 552-558, 1997.
    [37]
    H. Stoppiglia, G. Dreyfus, R. Dubois, and Y. Oussar. Ranking a random feature for variable and feature selection. JMLR, 3: 1399-1414 (this issue), 2003.
    [38]
    R. Tibshirani. Regression shrinkage and selection via the lasso. Technical report, Stanford University, Palo Alto, CA, June 1994.
    [39]
    N. Tishby, F. C. Pereira, and W. Bialek. The information bottleneck method. In Proc. of the 37th Annual Allerton Conference on Communication, Control and Computing, pages 368-377, 1999.
    [40]
    K. Torkkola. Feature extraction by non-parametric mutual information maximization. JMLR, 3: 1415-1438 (this issue), 2003.
    [41]
    V. G. Tusher, R. Tibshirani, and G. Chu. Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98: 5116-5121, April 2001.
    [42]
    V. Vapnik. Estimation of dependencies based on empirical data. Springer series in statistics. Springer, 1982.
    [43]
    V. Vapnik. Statistical Learning Theory. John Wiley & Sons, N.Y., 1998.
    [44]
    A. Vehtari and J. Lampinen. Bayesian input variable selection using posterior probabilities and expected utilities. Report B31, 2002.
    [45]
    J. Weston, A. Elisseeff, B. Schoelkopf, and M. Tipping. Use of the zero norm with linear models and kernel methods. JMLR, 3: 1439-1461 (this issue), 2003.
    [46]
    J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, and V. Vapnik. Feature selection for SVMs. In NIPS 13, 2000.
    [47]
    E.P. Xing and R.M. Karp. Cliff: Clustering of high-dimensional microarray data via iterative feature filtering using normalized cuts. In 9th International Conference on Intelligence Systems for Molecular Biology, 2001.

    Published In

    The Journal of Machine Learning Research, Volume 3 (March 2003), 1437 pages
    ISSN: 1532-4435, EISSN: 1533-7928
    Publisher: JMLR.org
