Comparative study on classification performance between support vector machine and logistic regression

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Support vector machine (SVM) is a comparatively new machine learning algorithm for classification, while logistic regression (LR) is a long-established statistical classification method. Although many comprehensive studies have compared SVM and LR, both methods have since been improved by techniques such as bagging and ensemble learning. These techniques have recently become active research topics and are widely used to improve the generalization performance of a single learning algorithm. Comparing the classification performance of SVM and LR under bagging and ensemble learning is therefore of interest. In this paper, classifiers are combined by averaging their estimated probabilities. Different evaluation metrics assess different characteristics of a machine learning algorithm, and a learning method may perform well on one metric yet be suboptimal on others. This study therefore evaluates classification performance with a variety of criteria: accuracy, sensitivity, specificity, precision, F-score and the area under the receiver operating characteristic curve. The last of these was not included in previous studies of SVM, because SVM did not support estimated probabilities at that time. Other metrics used in medical diagnosis, such as Youden's index (γ), the positive and negative likelihoods (ρ+, ρ−) and the diagnostic odds ratio, were also evaluated to convey and compare the qualities of the two algorithms. This study is distinguished by its comprehensive statistical analysis of the results of the SVM and LR algorithms on various data sets.
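
The evaluation criteria listed in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names are hypothetical, and the formulas are the common textbook definitions (for example γ = sensitivity + specificity − 1, ρ+ = sensitivity / (1 − specificity), ρ− = (1 − sensitivity) / specificity), which are assumed here and may differ in detail from those used in the full text. The sketch also shows the averaging of estimated probabilities across ensemble members and a rank-based estimate of the area under the ROC curve.

```python
from typing import Dict, List, Sequence


def average_probabilities(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Combine classifiers by averaging their estimated positive-class
    probabilities; member_probs[i][j] is member i's estimate for sample j."""
    n_members = len(member_probs)
    return [sum(column) / n_members for column in zip(*member_probs)]


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Metrics from a 2x2 confusion matrix (textbook definitions, assumed).

    Assumes every denominator is non-zero; no guards are included.
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": 2 * precision * sensitivity / (precision + sensitivity),
        "youden_index": sensitivity + specificity - 1,
        "positive_likelihood": sensitivity / (1 - specificity),
        "negative_likelihood": (1 - sensitivity) / specificity,
        "diagnostic_odds_ratio": (tp * tn) / (fp * fn),
    }


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Two hypothetical ensemble members scoring the same four samples.
    combined = average_probabilities([[0.9, 0.2, 0.6, 0.4],
                                      [0.7, 0.4, 0.8, 0.2]])
    print(auc(combined, [1, 0, 1, 0]))
    print(diagnostic_metrics(tp=40, fp=10, tn=45, fn=5))
```

The AUC function needs continuous scores rather than hard class labels, which is consistent with the abstract's remark that the metric depends on estimated probabilities being available.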



Acknowledgments

This work was supported by a grant from Hebei University, Baoding, Hebei, P.R. China. I wish to thank the PhD students of the Departments of Computer Science and Mathematics for their encouragement, useful discussions, and interest.

Author information

Corresponding author

Correspondence to Abdallah Bashir Musa.

About this article

Cite this article

Musa, A.B. Comparative study on classification performance between support vector machine and logistic regression. Int. J. Mach. Learn. & Cyber. 4, 13–24 (2013). https://doi.org/10.1007/s13042-012-0068-x

  • DOI: https://doi.org/10.1007/s13042-012-0068-x
