DOI: 10.5555/3305381.3305448

Second-order kernel online convex optimization with adaptive sketching

Published: 06 August 2017

Abstract

Kernel online convex optimization (KOCO) is a framework combining the expressiveness of non-parametric kernel models with the regret guarantees of online learning. First-order KOCO methods such as functional gradient descent require only O(t) time and space per iteration, and, when the only information on the losses is their convexity, achieve a minimax optimal O(√T) regret. Nonetheless, many common losses in kernel problems, such as the squared loss, logistic loss, and squared hinge loss, possess stronger curvature that can be exploited. In this case, second-order KOCO methods achieve O(log(det(K))) regret, which we show scales as O(d_eff log T), where d_eff is the effective dimension of the problem and is usually much smaller than O(√T). The main drawback of second-order methods is their much higher O(t²) space and time complexity. In this paper, we introduce kernel online Newton step (KONS), a new second-order KOCO method that also achieves O(d_eff log T) regret. To address the computational complexity of second-order methods, we introduce a new matrix sketching algorithm for the kernel matrix K_t, and show that for a chosen parameter γ ≤ 1 our Sketched-KONS reduces the space and time complexity by a factor of γ² to O(t²γ²) space and time per iteration, while incurring only 1/γ times more regret.
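
The first-order KOCO baseline the abstract contrasts against (functional gradient descent, O(t) time and space per round because every past point is kept in the expansion) can be sketched as follows. This is a generic illustration, not the paper's KONS or Sketched-KONS algorithm; the Gaussian kernel, squared loss, learning rate, and the `kernel_ogd` helper are all illustrative choices.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def kernel_ogd(stream, eta=0.1, sigma=1.0):
    """Functional (kernel) online gradient descent for the squared loss.

    The predictor is f_t(x) = sum_i alpha_i k(x_i, x). Each round pays
    O(t) time and space, since the whole history is kept as support
    points (the first-order KOCO regime described above).
    Returns the cumulative squared loss over the stream.
    """
    support, alphas = [], []
    total_loss = 0.0
    for x, y in stream:
        # Predict with the current kernel expansion (empty sum -> 0).
        pred = sum(a * gaussian_kernel(xi, x, sigma)
                   for a, xi in zip(alphas, support))
        total_loss += (pred - y) ** 2
        # d/df(x) of (f(x) - y)^2 is 2 (f(x) - y); the functional
        # gradient step appends one new support point per round.
        support.append(x)
        alphas.append(-eta * 2.0 * (pred - y))
    return total_loss

# Toy usage: track y = sin(x) on a random stream of 200 points.
rng = np.random.default_rng(0)
xs = rng.uniform(-3, 3, size=200)
stream = [(np.array([x]), np.sin(x)) for x in xs]
print(kernel_ogd(stream))
```

The per-round cost growing linearly in t is exactly the bottleneck that budgeted and sketched variants (including the paper's Sketched-KONS, in the second-order setting) aim to control.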



      Published In

      ICML'17: Proceedings of the 34th International Conference on Machine Learning - Volume 70
      August 2017
      4208 pages

      Publisher

      JMLR.org


      Cited By

      • (2023) Nearly optimal algorithms with sublinear computational complexity for online kernel regression. In Proceedings of the 40th International Conference on Machine Learning, pp. 19743-19766. DOI: 10.5555/3618408.3619223. 23 Jul 2023.
      • (2021) Adversarial Kernel Sampling on Class-imbalanced Data Streams. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 2352-2362. DOI: 10.1145/3459637.3482227. 26 Oct 2021.
      • (2020) Near-linear time Gaussian process optimization with adaptive batching and resparsification. In Proceedings of the 37th International Conference on Machine Learning, pp. 1295-1305. DOI: 10.5555/3524938.3525059. 13 Jul 2020.
      • (2019) Efficient online learning with kernels for adversarial large scale problems. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 9432-9441. DOI: 10.5555/3454287.3455133. 8 Dec 2019.
      • (2019) New Online Kernel Ridge Regression via Incremental Predictive Sampling. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 791-800. DOI: 10.1145/3357384.3358004. 3 Nov 2019.
      • (2018) Statistical and computational trade-offs in kernel k-means. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 9379-9389. DOI: 10.5555/3327546.3327608. 3 Dec 2018.
      • (2017) Efficient second-order online kernel learning with adaptive embedding. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 6142-6151. DOI: 10.5555/3295222.3295362. 4 Dec 2017.
