Abstract
Practical experience has shown that in order to obtain the best possible performance, prior knowledge about invariances of a classification problem at hand ought to be incorporated into the training procedure. We describe and review all known methods for doing so in support vector machines, provide experimental results, and discuss their respective merits. One of the significant new results reported in this work is our recent achievement of the lowest reported test error on the well-known MNIST digit recognition benchmark task, with SVM training times that are also significantly faster than previous SVM methods.
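To make the central idea concrete, the following is a minimal sketch of the virtual support vector strategy for incorporating invariances: train an SVM, apply known invariance transformations (here, one-pixel translations) to the support vectors only, and retrain on the enlarged set. The dataset, kernel parameters, and the shift_image helper are illustrative assumptions for this sketch, not the exact setup used in the paper.

```python
# Hedged sketch of the virtual support vector idea: train an SVM, generate
# transformed copies of the support vectors (a known invariance of the task),
# and retrain on support vectors plus these "virtual" examples.
# Data, kernel settings, and shift_image are illustrative assumptions only.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC

def shift_image(img_flat, dx, dy, shape=(8, 8)):
    """Translate a flattened image by (dx, dy) pixels (toy invariance)."""
    img = img_flat.reshape(shape)
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1).ravel()

X, y = load_digits(return_X_y=True)      # 8x8 digit images, a small stand-in for MNIST
X_train, y_train = X[:1000], y[:1000]

# Step 1: ordinary SVM training.
svm = SVC(kernel="rbf", gamma=0.001, C=10.0).fit(X_train, y_train)

# Step 2: generate virtual examples from the support vectors only.
sv_X, sv_y = X_train[svm.support_], y_train[svm.support_]
shifts = [(1, 0), (-1, 0), (0, 1), (0, -1)]
virtual_X = np.vstack([np.array([shift_image(x, dx, dy) for x in sv_X])
                       for dx, dy in shifts])
virtual_y = np.tile(sv_y, len(shifts))

# Step 3: retrain on the support vectors plus their virtual counterparts.
svm_vsv = SVC(kernel="rbf", gamma=0.001, C=10.0).fit(
    np.vstack([sv_X, virtual_X]), np.concatenate([sv_y, virtual_y]))
```

Restricting the virtual examples to support vectors keeps the second training set small, since the support vectors are the only training points that determine the decision boundary.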