A Tutorial on Support Vector Machines for Pattern Recognition

Published: 01 June 1998
    Abstract

    The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique used to construct SVM solutions that are nonlinear in the data. We show how Support Vector Machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments supporting the observed high accuracy of SVMs, which we review. Results of some experiments inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light.
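
    The kernel mapping mentioned in the abstract is what lets a linear margin-based classifier produce nonlinear decision boundaries. As a concrete illustration, the short sketch below (plain numpy; all names are illustrative, and the multipliers are made up rather than obtained by solving the SVM dual problem) evaluates the standard SVM decision function f(x) = sum_i alpha_i y_i K(x_i, x) + b with a Gaussian radial basis function kernel, one of the two kernel families whose VC dimension the tutorial computes.

        # Minimal sketch of a kernelized SVM decision function (illustrative
        # only; a trained SVM obtains alphas and b by solving the dual QP).
        import numpy as np

        def rbf_kernel(x, z, gamma=1.0):
            # Gaussian RBF kernel: K(x, z) = exp(-gamma * ||x - z||^2)
            return np.exp(-gamma * np.sum((x - z) ** 2))

        def decision_function(x, support_vectors, alphas, labels, b, gamma=1.0):
            # f(x) = sum_i alpha_i * y_i * K(x_i, x) + b; only the support
            # vectors (points with alpha_i > 0) contribute to the sum.
            k = np.array([rbf_kernel(sv, x, gamma) for sv in support_vectors])
            return np.dot(alphas * labels, k) + b

        # Toy usage with hypothetical support vectors and multipliers:
        sv = np.array([[0.0, 1.0], [1.0, 0.0]])
        alphas = np.array([0.5, 0.5])
        labels = np.array([1.0, -1.0])
        x_new = np.array([0.2, 0.9])
        print(np.sign(decision_function(x_new, sv, alphas, labels, b=0.0)))

    The predicted class is the sign of f(x); the support vectors are the only training points the decision function depends on.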

    Published In

    Data Mining and Knowledge Discovery, Volume 2, Issue 2
    June 1998
    108 pages

    Publisher

    Kluwer Academic Publishers, United States

    Author Tags

    1. VC dimension
    2. pattern recognition
    3. statistical learning theory
    4. support vector machines
