Comparative study on classification performance between support vector machine and logistic regression

Musa, Abdallah Bashir

doi:10.1007/s13042-012-0068-x


Original Article
Published: 24 January 2012

Volume 4, pages 13–24 (2013)

International Journal of Machine Learning and Cybernetics


Abstract

The support vector machine (SVM) is a comparatively new machine learning algorithm for classification, while logistic regression (LR) is a long-established standard statistical classification method. Many comprehensive studies have compared SVM and LR, but since they were carried out, new improvements such as bagging and ensemble learning have been applied to both methods. Bagging and ensemble learning have recently become hot topics and are widely used to improve the generalization performance of a single learning algorithm. Comparing the classification performance of SVM and LR under bagging and ensemble learning is therefore an interesting issue. In this paper, classifiers were combined by averaging their estimated probabilities. Different evaluation metrics assess different characteristics of a machine learning algorithm, and a learning method can perform well on one metric yet be suboptimal on others. This study therefore evaluates the classification performance of the learning methods with a variety of criteria: accuracy, sensitivity, specificity, precision, F-score and the area under the receiver operating characteristic (ROC) curve. The area under the ROC curve was not included in previous studies of SVM, because SVM did not support estimated probabilities at that time. Other metrics used in medical diagnosis, such as Youden's index (γ), positive and negative likelihoods (ρ+, ρ−) and the diagnostic odds ratio, were also evaluated to convey and compare the qualities of the two algorithms. This study is distinguished by its inclusion of a comprehensive statistical analysis of the results of the SVM and LR algorithms on various data sets.
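The combination strategy and the evaluation criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the 0.5 decision threshold and the toy probabilities are assumptions made for illustration, and ρ+ and ρ− are taken here to be the usual positive and negative likelihood ratios. The sketch averages the positive-class probabilities of several classifiers, derives the threshold-based metrics from the resulting confusion matrix, and computes the area under the ROC curve as the fraction of positive/negative pairs that are ranked correctly.

```python
from statistics import mean


def average_probabilities(per_classifier_probs):
    """Average P(positive) across classifiers, giving one value per sample."""
    return [mean(sample) for sample in zip(*per_classifier_probs)]


def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) for binary 0/1 labels after thresholding."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def diagnostic_metrics(tp, fp, tn, fn):
    """Threshold-based metrics; assumes no denominator below is zero."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": 2 * precision * sensitivity / (precision + sensitivity),
        "youden_index": sensitivity + specificity - 1,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        "diagnostic_odds_ratio": (tp * tn) / (fp * fn),
    }


def auc(y_true, y_prob):
    """Area under the ROC curve via pairwise ranking; ties count as half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    score = sum(1.0 if p > n else 0.5 if p == n else 0.0
                for p in pos for n in neg)
    return score / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical): three classifiers scoring six samples.
    y_true = [1, 1, 1, 0, 0, 0]
    member_probs = [
        [0.9, 0.6, 0.3, 0.7, 0.2, 0.1],
        [0.8, 0.7, 0.4, 0.4, 0.3, 0.2],
        [0.7, 0.8, 0.2, 0.6, 0.1, 0.3],
    ]
    combined = average_probabilities(member_probs)
    print(diagnostic_metrics(*confusion_counts(y_true, combined)))
    print("AUC:", auc(y_true, combined))
```

The area under the ROC curve is the only criterion here that uses the probabilities themselves rather than thresholded labels, which is why it requires a classifier that outputs estimated probabilities. The pairwise computation is quadratic in the number of samples and is meant for clarity, not for large data sets.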



Acknowledgments

This work was supported by a grant from Hebei University, Baoding, Hebei, P.R. China. I wish to thank the PhD students of the departments of computer science and mathematics for their encouragement, useful discussions, and interest.

Author information

Authors and Affiliations

Faculty of Mathematical Sciences and Computer, University of Gezira, Wad Madani, 20, Sudan
Abdallah Bashir Musa


Corresponding author

Correspondence to Abdallah Bashir Musa.


About this article

Cite this article

Musa, A.B. Comparative study on classification performance between support vector machine and logistic regression. Int. J. Mach. Learn. & Cyber. 4, 13–24 (2013). https://doi.org/10.1007/s13042-012-0068-x


Received: 04 September 2011
Accepted: 02 January 2012
Published: 24 January 2012
Issue Date: February 2013
DOI: https://doi.org/10.1007/s13042-012-0068-x

