Analytic Feature Selection for Support Vector Machines

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7988)

Abstract

Support vector machines (SVMs) rely on the inherent geometry of the training data to build a classifier. For this reason, we believe SVMs are an excellent candidate to guide the development of an analytic feature selection algorithm, as opposed to the more commonly used heuristic methods. We propose a filter-based feature selection algorithm based on the inherent geometry of a feature set. Through observation, we identified six geometric properties that differ between optimal and suboptimal feature sets and have statistically significant correlations with classifier performance. Our algorithm uses logistic and linear regression models with these six geometric properties as predictor variables. The proposed algorithm achieves excellent results on high-dimensional text data sets whose features can be organized into a handful of feature types, for example unigrams, bigrams, or semantic structural features. We believe this algorithm is a novel and effective approach to the feature selection problem for linear SVMs.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Stambaugh, C., Yang, H., Breuer, F. (2013). Analytic Feature Selection for Support Vector Machines. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2013. Lecture Notes in Computer Science, vol 7988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39712-7_17

  • DOI: https://doi.org/10.1007/978-3-642-39712-7_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39711-0

  • Online ISBN: 978-3-642-39712-7

  • eBook Packages: Computer Science, Computer Science (R0)
