
Constructing Boosting Algorithms from SVMs: An Application to One-Class Classification

Published: 01 September 2002

Abstract

We show via an equivalence of mathematical programs that a support vector (SV) algorithm can be translated into an equivalent boosting-like algorithm and vice versa. We exemplify this translation procedure for a new algorithm, one-class leveraging, starting from the one-class support vector machine (1-SVM). This is a first step toward unsupervised learning in a boosting framework. Building on so-called barrier methods known from the theory of constrained optimization, the resulting algorithm returns a function, written as a convex combination of base hypotheses, that characterizes whether a given test point is likely to have been generated from the distribution underlying the training data. Simulations on one-class classification problems demonstrate the usefulness of our approach.
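
To make the translation concrete, the correspondence can be sketched as follows. This is our own schematic rendering based on the standard published formulations, not notation quoted from the paper. The 1-SVM of Schölkopf et al. separates the training sample from the origin in a kernel feature space \Phi:

    \min_{w,\,\xi,\,\rho} \;\; \frac{1}{2}\|w\|^2 + \frac{1}{\nu\ell}\sum_{i=1}^{\ell}\xi_i - \rho
    \quad \text{s.t.} \quad \langle w, \Phi(x_i)\rangle \ge \rho - \xi_i, \quad \xi_i \ge 0.

Writing the solution instead as a convex combination of base hypotheses h_t and trading the quadratic regularizer for a simplex (1-norm) constraint on the coefficients yields the boosting-like analogue:

    \min_{\alpha,\,\xi,\,\rho} \;\; \frac{1}{\nu\ell}\sum_{i=1}^{\ell}\xi_i - \rho
    \quad \text{s.t.} \quad \sum_t \alpha_t h_t(x_i) \ge \rho - \xi_i, \quad \xi_i \ge 0, \quad \alpha_t \ge 0, \quad \sum_t \alpha_t = 1.

The learned function f(x) = \sum_t \alpha_t h_t(x) is the convex combination mentioned in the abstract; a test point is flagged as likely to stem from the training distribution when f(x) \ge \rho.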



      Published In

      IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 24, Issue 9
      September 2002, 144 pages

      Publisher

      IEEE Computer Society, United States

      Author Tags

      1. Boosting
      2. SVMs
      3. novelty detection
      4. one-class classification
      5. unsupervised learning

      Qualifiers

      • Research-article

      Cited By

      • (2022) "AviSense: A Real-time System for Detection, Classification, and Analysis of Aviation Signals," ACM Transactions on Sensor Networks, vol. 19, no. 1, pp. 1-35, Dec. 2022. DOI: 10.1145/3526089
      • (2021) "Variance error of multi-classification based anomaly detection for time series data," Journal of Computational Methods in Sciences and Engineering, vol. 21, no. 4, pp. 875-890, Jan. 2021. DOI: 10.3233/JCM-204699
      • (2019) Data Cleaning, July 2019.
      • (2018) "Local ensemble learning from imbalanced and noisy data for word sense disambiguation," Pattern Recognition, vol. 78, pp. 103-119, June 2018. DOI: 10.1016/j.patcog.2017.10.028
      • (2017) "A-PNR," Proc. 7th Int'l Conf. Computer and Communication Technology, pp. 106-110, Nov. 2017. DOI: 10.1145/3154979.3154999
      • (2017) "Selecting locally specialised classifiers for one-class classification ensembles," Pattern Analysis & Applications, vol. 20, no. 2, pp. 427-439, May 2017. DOI: 10.1007/s10044-015-0505-z
      • (2016) "Dynamic classifier selection for one-class classification," Knowledge-Based Systems, vol. 107, pp. 43-53, Sept. 2016. DOI: 10.1016/j.knosys.2016.05.054
      • (2016) "Unsupervised Parameter Estimation for One-Class Support Vector Machines," Proc. 20th Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining, Part II, pp. 183-195, Apr. 2016. DOI: 10.1007/978-3-319-31750-2_15
      • (2016) "Intrusion detection in mobile ad hoc networks," Security and Communication Networks, vol. 9, no. 14, pp. 2484-2556, Sept. 2016. DOI: 10.1002/sec.1484
      • (2015) "Comprehensive outlier detection in wireless sensor network with fast optimization algorithm of classification model," International Journal of Distributed Sensor Networks, vol. 2015, Jan. 2015. DOI: 10.1155/2015/398761