DOI: 10.5555/3305381.3305448

Second-order kernel online convex optimization with adaptive sketching

Published: 06 August 2017

Abstract

Kernel online convex optimization (KOCO) is a framework combining the expressiveness of non-parametric kernel models with the regret guarantees of online learning. First-order KOCO methods such as functional gradient descent require only O(t) time and space per iteration, and, when the only information on the losses is their convexity, achieve a minimax optimal O(√T) regret. Nonetheless, many common losses in kernel problems, such as the squared loss, logistic loss, and squared hinge loss, possess stronger curvature that can be exploited. In this case, second-order KOCO methods achieve O(log(det(K))) regret, which we show scales as O(d_eff log T), where d_eff is the effective dimension of the problem and is usually much smaller than O(√T). The main drawback of second-order methods is their much higher O(t²) space and time complexity. In this paper, we introduce kernel online Newton step (KONS), a new second-order KOCO method that also achieves O(d_eff log T) regret. To address the computational complexity of second-order methods, we introduce a new matrix sketching algorithm for the kernel matrix K_t, and show that for a chosen parameter γ ≤ 1 our Sketched-KONS reduces the space and time complexity by a factor of γ² to O(t²γ²) space and time per iteration, while incurring only 1/γ times more regret.
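
The first-order KOCO baseline the abstract contrasts against (functional gradient descent, O(t) time and space per round because every past point is kept in the expansion) can be sketched as follows. This is a generic illustration, not the paper's KONS or Sketched-KONS algorithm; the Gaussian kernel, squared loss, learning rate, and the `kernel_ogd` helper are all illustrative choices.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def kernel_ogd(stream, eta=0.1, sigma=1.0):
    """Functional (kernel) online gradient descent for the squared loss.

    The predictor is f_t(x) = sum_i alpha_i k(x_i, x). Each round pays
    O(t) time and space, since the whole history is kept as support
    points (the first-order KOCO regime described above).
    Returns the cumulative squared loss over the stream.
    """
    support, alphas = [], []
    total_loss = 0.0
    for x, y in stream:
        # Predict with the current kernel expansion (empty sum -> 0).
        pred = sum(a * gaussian_kernel(xi, x, sigma)
                   for a, xi in zip(alphas, support))
        total_loss += (pred - y) ** 2
        # d/df(x) of (f(x) - y)^2 is 2 (f(x) - y); the functional
        # gradient step appends one new support point per round.
        support.append(x)
        alphas.append(-eta * 2.0 * (pred - y))
    return total_loss

# Toy usage: track y = sin(x) on a random stream of 200 points.
rng = np.random.default_rng(0)
xs = rng.uniform(-3, 3, size=200)
stream = [(np.array([x]), np.sin(x)) for x in xs]
print(kernel_ogd(stream))
```

The per-round cost growing linearly in t is exactly the bottleneck that budgeted and sketched variants (including the paper's Sketched-KONS, in the second-order setting) aim to control.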



      Published In

      ICML'17: Proceedings of the 34th International Conference on Machine Learning - Volume 70
      August 2017
      4208 pages

      Publisher

      JMLR.org


      Cited By

      • (2023) Nearly optimal algorithms with sublinear computational complexity for online kernel regression. In Proceedings of the 40th International Conference on Machine Learning, pp. 19743-19766. DOI: 10.5555/3618408.3619223. 23 Jul 2023.
      • (2021) Adversarial Kernel Sampling on Class-imbalanced Data Streams. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 2352-2362. DOI: 10.1145/3459637.3482227. 26 Oct 2021.
      • (2020) Near-linear time Gaussian process optimization with adaptive batching and resparsification. In Proceedings of the 37th International Conference on Machine Learning, pp. 1295-1305. DOI: 10.5555/3524938.3525059. 13 Jul 2020.
      • (2019) Efficient online learning with kernels for adversarial large scale problems. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 9432-9441. DOI: 10.5555/3454287.3455133. 8 Dec 2019.
      • (2019) New Online Kernel Ridge Regression via Incremental Predictive Sampling. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 791-800. DOI: 10.1145/3357384.3358004. 3 Nov 2019.
      • (2018) Statistical and computational trade-offs in kernel k-means. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 9379-9389. DOI: 10.5555/3327546.3327608. 3 Dec 2018.
      • (2017) Efficient second-order online kernel learning with adaptive embedding. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 6142-6151. DOI: 10.5555/3295222.3295362. 4 Dec 2017.
