
Constructing Boosting Algorithms from SVMs: An Application to One-Class Classification

Published: 01 September 2002

Abstract

We show via an equivalence of mathematical programs that a support vector (SV) algorithm can be translated into an equivalent boosting-like algorithm and vice versa. We exemplify this translation procedure for a new algorithm, one-class leveraging, starting from the one-class support vector machine (1-SVM). This is a first step toward unsupervised learning in a boosting framework. Building on so-called barrier methods known from the theory of constrained optimization, the resulting algorithm returns a function, written as a convex combination of base hypotheses, that characterizes whether a given test point is likely to have been generated from the distribution underlying the training data. Simulations on one-class classification problems demonstrate the usefulness of our approach.
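
To make the translation concrete, the correspondence can be sketched as follows. This is our own schematic rendering based on the standard published formulations, not notation quoted from the paper. The 1-SVM of Schölkopf et al. separates the training sample from the origin in a kernel feature space \Phi:

    \min_{w,\,\xi,\,\rho} \;\; \frac{1}{2}\|w\|^2 + \frac{1}{\nu\ell}\sum_{i=1}^{\ell}\xi_i - \rho
    \quad \text{s.t.} \quad \langle w, \Phi(x_i)\rangle \ge \rho - \xi_i, \quad \xi_i \ge 0.

Writing the solution instead as a convex combination of base hypotheses h_t and trading the quadratic regularizer for a simplex (1-norm) constraint on the coefficients yields the boosting-like analogue:

    \min_{\alpha,\,\xi,\,\rho} \;\; \frac{1}{\nu\ell}\sum_{i=1}^{\ell}\xi_i - \rho
    \quad \text{s.t.} \quad \sum_t \alpha_t h_t(x_i) \ge \rho - \xi_i, \quad \xi_i \ge 0, \quad \alpha_t \ge 0, \quad \sum_t \alpha_t = 1.

The learned function f(x) = \sum_t \alpha_t h_t(x) is the convex combination mentioned in the abstract; a test point is flagged as likely to stem from the training distribution when f(x) \ge \rho.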



      Published In

      IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 24, Issue 9
      September 2002, 144 pages

      Publisher

      IEEE Computer Society, United States

      Author Tags

      1. Boosting
      2. SVMs
      3. novelty detection
      4. one-class classification
      5. unsupervised learning

      Qualifiers

      • Research-article

      Cited By

      • (2022) "AviSense: A Real-time System for Detection, Classification, and Analysis of Aviation Signals," ACM Transactions on Sensor Networks, vol. 19, no. 1, pp. 1-35, Dec. 2022. DOI: 10.1145/3526089
      • (2021) "Variance error of multi-classification based anomaly detection for time series data," Journal of Computational Methods in Sciences and Engineering, vol. 21, no. 4, pp. 875-890, Jan. 2021. DOI: 10.3233/JCM-204699
      • (2019) Data Cleaning, July 2019.
      • (2018) "Local ensemble learning from imbalanced and noisy data for word sense disambiguation," Pattern Recognition, vol. 78, pp. 103-119, June 2018. DOI: 10.1016/j.patcog.2017.10.028
      • (2017) "A-PNR," Proc. 7th Int'l Conf. Computer and Communication Technology, pp. 106-110, Nov. 2017. DOI: 10.1145/3154979.3154999
      • (2017) "Selecting locally specialised classifiers for one-class classification ensembles," Pattern Analysis & Applications, vol. 20, no. 2, pp. 427-439, May 2017. DOI: 10.1007/s10044-015-0505-z
      • (2016) "Dynamic classifier selection for one-class classification," Knowledge-Based Systems, vol. 107, pp. 43-53, Sept. 2016. DOI: 10.1016/j.knosys.2016.05.054
      • (2016) "Unsupervised Parameter Estimation for One-Class Support Vector Machines," Proc. 20th Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining, Part II, pp. 183-195, Apr. 2016. DOI: 10.1007/978-3-319-31750-2_15
      • (2016) "Intrusion detection in mobile ad hoc networks," Security and Communication Networks, vol. 9, no. 14, pp. 2484-2556, Sept. 2016. DOI: 10.1002/sec.1484
      • (2015) "Comprehensive outlier detection in wireless sensor network with fast optimization algorithm of classification model," International Journal of Distributed Sensor Networks, vol. 2015, Jan. 2015. DOI: 10.1155/2015/398761