DOI: 10.1007/11941439_21
Article

An efficient alternative to SVM based recursive feature elimination with applications in natural language processing and bioinformatics

Published: 04 December 2006

Abstract

The SVM based Recursive Feature Elimination (RFE-SVM) algorithm is a popular technique for feature selection, used in natural language processing and bioinformatics. Recently it was demonstrated that a small regularisation constant $C$ can considerably improve the performance of RFE-SVM on microarray datasets. In this paper we show that further improvements are possible if the explicitly computable limit $C \to 0$ is used. We prove that in this limit most forms of SVM and ridge regression classifiers, scaled by the factor $\frac{1}{C}$, converge to a centroid classifier. As this classifier can be used directly for feature ranking, in the limit we can avoid the computationally demanding recursion and convex optimisation of RFE-SVM. Comparisons on two text-based author verification tasks and on three genomic microarray classification tasks indicate that this straightforward method surprisingly obtains comparable (at times superior) performance while being about an order of magnitude faster.
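
The core idea, ranking features once with a centroid classifier instead of recursively retraining an SVM, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes binary labels in {+1, -1} and takes the centroid classifier's weight vector to be the difference of class means, $w = \mu_{+} - \mu_{-}$. The exact limiting form derived in the paper (for example, how class sizes or the bias term enter) may differ. All function names are hypothetical.

```python
import numpy as np


def centroid_weights(X, y):
    """Weight vector of a difference-of-class-means (centroid) classifier.

    Assumption: labels are +1/-1 and w = mu_pos - mu_neg. The paper's exact
    C -> 0 limit may weight the classes differently.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu_pos = X[y == 1].mean(axis=0)
    mu_neg = X[y == -1].mean(axis=0)
    return mu_pos - mu_neg


def centroid_predict(X_train, y_train, X_test):
    """Classify by which side of the midpoint between class means a sample lies."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    mu_pos = X_train[y_train == 1].mean(axis=0)
    mu_neg = X_train[y_train == -1].mean(axis=0)
    w = mu_pos - mu_neg
    # Bias places the decision boundary halfway between the two centroids.
    b = -w @ (mu_pos + mu_neg) / 2.0
    return np.where(np.asarray(X_test, dtype=float) @ w + b >= 0, 1, -1)


def rank_features(X, y):
    """Feature indices ordered from most to least important by |w_j|.

    One pass, no recursion and no convex optimisation. Ordering by |w_j|
    gives the same order as the w_j**2 criterion used in RFE-SVM.
    """
    w = centroid_weights(X, y)
    # Sorting -|w| ascending yields descending importance; "stable" keeps ties in index order.
    return np.argsort(-np.abs(w), kind="stable")


if __name__ == "__main__":
    X = np.array([
        [5.0, 0.1, 1.0],
        [4.0, 0.2, 1.1],
        [1.0, 0.1, 0.9],
        [0.0, 0.3, 1.0],
    ])
    y = np.array([1, 1, -1, -1])
    print(rank_features(X, y))          # feature 0 separates the classes best
    print(centroid_predict(X, y, X))    # training-set predictions
```

Because the ranking needs only the two class means and one sort of the feature scores, it costs a single pass over the data, whereas RFE-SVM retrains an SVM after every elimination round. To select a subset, keep the first k indices returned by `rank_features`.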

Cited By

  • (2012) Support vector machines for anti-pattern detection. Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, pp. 278-281. DOI: 10.1145/2351676.2351723. Online publication date: 3-Sep-2012
  • (2011) A General Framework for Analyzing Data from Two Short Time-Series Microarray Experiments. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 8:1, pp. 14-26. DOI: 10.1109/TCBB.2009.51. Online publication date: 1-Jan-2011

Published In

AI'06: Proceedings of the 19th Australian joint conference on Artificial Intelligence: advances in Artificial Intelligence
December 2006
1297 pages
ISBN: 3540497870
Editors: Abdul Sattar, Byeong-ho Kang

Publisher

Springer-Verlag, Berlin, Heidelberg
