DOI: 10.5555/3041838.3041913
Article

Online feature selection using grafting

Published: 21 August 2003

Abstract

    In the standard feature selection problem, we are given a fixed set of candidate features for use in a learning problem, and must select a subset that will be used to train a model that is "as good as possible" according to some criterion. In this paper, we present an interesting and useful variant, the online feature selection problem, in which, instead of all features being available from the start, features arrive one at a time. The learner's task is to select a subset of features and return a corresponding model at each time step which is as good as possible given the features seen so far. We argue that existing feature selection methods do not perform well in this scenario, and describe a promising alternative method, based on a stagewise gradient descent technique which we call grafting.
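
    The abstract names the setting (features arriving one at a time) and the method (a stagewise gradient descent technique) without giving algorithmic detail. The following is a minimal sketch of that setting only, not the authors' implementation. It assumes an L1-penalized logistic regression model and a gradient-magnitude test for admitting each new feature; those choices, the penalty `lam`, and the function names `fit_selected` and `online_select` are illustrative assumptions that do not appear in the text above.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_selected(X, y, w, lam, lr=0.5, steps=300):
    """Proximal gradient descent on mean logistic loss + lam * ||w||_1.

    Assumption: the abstract does not specify the loss or the optimizer.
    """
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w = w - lr * grad
        # Soft-thresholding step that accounts for the L1 penalty.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w


def online_select(feature_stream, y, lam=0.1):
    """Consume feature columns one at a time; return kept indices and weights."""
    selected, cols = [], []
    w = np.zeros(0)
    for j, x in enumerate(feature_stream):
        # Predictions of the current model (0.5 everywhere when it is empty).
        p = sigmoid(np.column_stack(cols) @ w) if cols else np.full(len(y), 0.5)
        # Gradient of the mean loss w.r.t. the new feature's weight, at zero.
        g = x @ (p - y) / len(y)
        # Assumed admission test: keep the feature only if the loss gradient
        # outweighs the L1 penalty; otherwise discard it and move on.
        if abs(g) > lam:
            cols.append(x)
            selected.append(j)
            w = fit_selected(np.column_stack(cols), y, np.append(w, 0.0), lam)
    return selected, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    informative = rng.normal(size=1000)
    y = (informative > 0).astype(float)
    stream = [rng.normal(size=1000), informative, rng.normal(size=1000)]
    print(online_select(stream, y))
```

    In this sketch a model is available after every arrival, which mirrors the requirement that the learner return a model at each time step given the features seen so far. Rejected features are never revisited, a simplification made here for brevity.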

    Published In

    ICML'03: Proceedings of the Twentieth International Conference on Machine Learning
    August 2003
    935 pages
    ISBN:1577351894

    Sponsors

    • Kluwer Academic Publishers
    • NSF: National Science Foundation
    • Kaidara Software
    • AAAI: American Association for Artificial Intelligence
    • Microsoft Research
    • HP

    Publisher

    AAAI Press
