DOI: 10.5555/1620853.1620882

Learning combination features with L1 regularization

Published: 31 May 2009

    Abstract

    When linear classifiers cannot successfully classify data, we often add combination features, which are products of several original features. Searching for effective combination features, a form of feature engineering, requires domain-specific knowledge and considerable manual effort. We present an efficient algorithm for learning an L1-regularized logistic regression model with combination features. We propose using the grafting algorithm with efficient gradient computation, which lets us find the optimal weights without enumerating all combination features. Because of the L1 regularization, the resulting model is very compact and inference is very efficient. In experiments on NLP tasks, we show that the proposed method extracts effective combination features and achieves high performance with very few features.
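
    As a rough illustration of the idea in the abstract, the sketch below runs a grafting-style loop for L1-regularized logistic regression over products of binary features. It is not the authors' implementation: it enumerates every combination up to order 2 explicitly, which is exactly what the paper's efficient gradient computation is meant to avoid, and it uses plain proximal gradient descent for the inner optimization. The function names, the unpenalized bias term, the toy truth-table data, and the value lam = 0.05 are all assumptions made for this example.

```python
import itertools
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def enumerate_combinations(X, max_order=2):
    """Naively list every product of up to `max_order` original features.

    Assumption: explicit enumeration is used only to keep the sketch short.
    """
    combos, cols = [], []
    for k in range(1, max_order + 1):
        for idx in itertools.combinations(range(X.shape[1]), k):
            combos.append(idx)
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return combos, np.column_stack(cols)


def fit_active(A, y, penalty, steps=500):
    """Proximal gradient descent for the L1-penalized logistic loss on A."""
    n = A.shape[0]
    # Step size 1/L, with L = ||A||_2^2 / (4n) bounding the gradient's Lipschitz constant.
    step = 4.0 * n / np.linalg.norm(A, 2) ** 2
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        w = w - step * (A.T @ (sigmoid(A @ w) - y) / n)
        # Soft-thresholding applies the L1 penalty (zero penalty leaves the bias alone).
        w = np.sign(w) * np.maximum(np.abs(w) - step * penalty, 0.0)
    return w


def graft(F, y, lam, max_features=20):
    """Grafting-style loop: add the candidate with the largest gradient, refit, repeat."""
    n, m = F.shape
    A = np.ones((n, 1))          # active design matrix, bias column only at first
    penalty = np.zeros(1)        # assumption: the bias is not penalized
    inactive = np.ones(m, dtype=bool)
    active = []
    w = fit_active(A, y, penalty)
    while len(active) < max_features:
        grad = F.T @ (sigmoid(A @ w) - y) / n
        score = np.where(inactive, np.abs(grad), 0.0)
        j = int(np.argmax(score))
        if score[j] <= lam:      # no inactive feature can leave zero: stop
            break
        active.append(j)
        inactive[j] = False
        A = np.hstack([A, F[:, [j]]])
        penalty = np.append(penalty, lam)
        w = fit_active(A, y, penalty)
    return active, w


# Toy data (assumption): the full truth table of 4 binary features,
# labeled by the conjunction x0 AND x1, which no single feature captures.
X = np.array(list(itertools.product([0, 1], repeat=4)), dtype=float)
y = X[:, 0] * X[:, 1]

combos, F = enumerate_combinations(X, max_order=2)
active, w = graft(F, y, lam=0.05)
selected = [combos[j] for j in active]
print("selected combinations:", selected)
print("weights (bias first):", w)
```

    On data like this, where the label depends on a product of two features, the loop should favor the combination feature over either original feature alone, which is the kind of compact model the abstract describes.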

    Cited By

    • (2011) Classifying dialogue in high-dimensional space. ACM Transactions on Speech and Language Processing, 7:3 (1-15). DOI: 10.1145/1966407.1966413. Online publication date: 6-Jun-2011
    • (2010) Kernel slicing. Proceedings of the 23rd International Conference on Computational Linguistics (1245-1253). DOI: 10.5555/1873781.1873921. Online publication date: 23-Aug-2010
    • (2009) Polynomial to linear. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 (1542-1551). DOI: 10.5555/1699648.1699701. Online publication date: 6-Aug-2009

    Published In

    NAACL-Short '09: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
    May 2009
    317 pages

    Publisher

    Association for Computational Linguistics

    United States

    Qualifiers

    • Research-article

    Acceptance Rates

    Overall Acceptance Rate 21 of 29 submissions, 72%
