Article

Trading convexity for scalability

Authors:

Ronan Collobert,

Léon Bottou

ICML '06: Proceedings of the 23rd international conference on Machine learning

Pages 201 - 208

https://doi.org/10.1145/1143844.1143870

Published: 25 June 2006

Abstract

Convex learning algorithms, such as Support Vector Machines (SVMs), are often seen as highly desirable because they offer strong practical properties and are amenable to theoretical analysis. However, in this work we show how non-convexity can provide scalability advantages over convexity. We show how concave-convex programming can be applied to produce (i) faster SVMs where training errors are no longer support vectors, and (ii) much faster Transductive SVMs.
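As a rough illustration of the idea in the abstract, the sketch below assumes the non-convex loss is a ramp loss written as the difference of two hinge losses, and runs a concave-convex iteration on a toy scalar objective. The abstract itself gives neither the loss nor the algorithmic details, so the function names (`hinge`, `ramp`, `ramp_slope`, `cccp`, `toy_step`), the parameter `s`, and the toy objective are all assumptions made for illustration, not the authors' implementation.

```python
import math


def hinge(z, a=1.0):
    """Convex hinge loss H_a(z) = max(0, a - z) for a margin z = y * f(x)."""
    return max(0.0, a - z)


def ramp(z, s=-1.0):
    """Assumed ramp loss R_s(z) = H_1(z) - H_s(z): a hinge capped at 1 - s."""
    return hinge(z, 1.0) - hinge(z, s)


def ramp_slope(z, s=-1.0):
    """A subgradient of the ramp loss with respect to the margin z."""
    return -1.0 if s < z < 1.0 else 0.0


def cccp(convex_step, x0, max_iters=100, tol=1e-12):
    """Generic concave-convex iteration on a scalar variable.

    convex_step(x_t) must return the minimizer of the convex part plus the
    concave part linearized at x_t.
    """
    x = x0
    for _ in range(max_iters):
        x_next = convex_step(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x


def toy_step(x_t):
    """One step for the toy objective f(x) = x**4 - 2*x**2.

    Convex part u(x) = x**4, concave part v(x) = -2*x**2 with v'(x_t) = -4*x_t.
    Minimizing x**4 - 4*x_t*x gives x**3 = x_t, so x is the real cube root.
    """
    return math.copysign(abs(x_t) ** (1.0 / 3.0), x_t)


# Starting from a positive point, the iterates approach the local minimum x = 1.
x_star = cccp(toy_step, x0=0.1)
```

In this sketch, examples whose margin falls below `s` sit on the flat part of the ramp, so their slope is zero and they exert no pull on the solution. That is one way to read the claim that training errors need no longer be support vectors.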


Published In

ICML '06: Proceedings of the 23rd international conference on Machine learning

June 2006

1154 pages

ISBN: 1595933832

DOI: 10.1145/1143844

Program Chairs:
William Cohen,
Andrew Moore

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

ICML '06 Paper Acceptance Rate 140 of 548 submissions, 26%;

Overall Acceptance Rate 140 of 548 submissions, 26%
