DOI: 10.1145/2897518.2897536
Research article (Public Access)

The computational power of optimization in online learning

Published: 19 June 2016

Abstract

We consider the fundamental problem of prediction with expert advice where the experts are “optimizable”: there is a black-box optimization oracle that can be used to compute, in constant time, the leading expert in retrospect at any point in time. In this setting, we give a novel online algorithm that attains vanishing regret with respect to N experts in total Õ(√N) computation time. We also give a lower bound showing that this running time cannot be improved (up to log factors) in the oracle model, thereby exhibiting a quadratic speedup as compared to the standard, oracle-free setting where the required time for vanishing regret is Θ(N). These results demonstrate an exponential gap between the power of optimization in online learning and its power in statistical learning: in the latter, an optimization oracle (i.e., an efficient empirical risk minimizer) allows one to learn a finite hypothesis class of size N in time O(log N).
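To make the baseline concrete, the sketch below implements the standard multiplicative-weights (Hedge) algorithm over N experts, which spends Θ(N) work per round, together with a brute-force simulation of the hindsight-optimization oracle. This is a minimal illustration only: the paper assumes the oracle answers in constant time, and the paper's Õ(√N)-time algorithm is not reproduced here.

```python
import math

def hedge(loss_matrix, eta):
    """Standard multiplicative-weights baseline: Theta(N) work per round.
    loss_matrix[t][i] is the loss of expert i at round t, assumed in [0, 1].
    Returns the forecaster's cumulative expected loss."""
    N = len(loss_matrix[0])
    w = [1.0] * N
    total_loss = 0.0
    for losses in loss_matrix:
        Z = sum(w)
        p = [wi / Z for wi in w]                       # current mixed prediction
        total_loss += sum(pi * li for pi, li in zip(p, losses))
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, losses)]
    return total_loss

def best_in_hindsight(loss_matrix):
    """The optimization oracle of the abstract: the leading expert in
    retrospect. Simulated here by brute force; the paper's model charges
    O(1) time per oracle call."""
    N = len(loss_matrix[0])
    cum = [sum(row[i] for row in loss_matrix) for i in range(N)]
    return min(range(N), key=lambda i: cum[i])
```

With the tuning eta = sqrt(8 ln N / T), Hedge's regret against the best expert is at most sqrt((T ln N)/2) for any loss sequence in [0, 1].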
We also study the implications of our results for learning in repeated zero-sum games, in a setting where the players have access to oracles that compute, in constant time, a best response to any mixed strategy of their opponent. We show that the runtime required for approximating the minimax value of the game in this setting is Θ̃(√N), yielding again a quadratic improvement upon the oracle-free setting, where Θ(N) is known to be tight.
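The best-response oracle model can be illustrated with classical fictitious play, in which each player repeatedly best-responds to the opponent's empirical mixed strategy. In this sketch the oracles are simulated by enumeration; the paper's model assumes constant-time oracle access, and this is not the paper's algorithm, only a picture of the oracle interface.

```python
def best_response_row(A, y):
    """Row player's best-response oracle: argmax_i sum_j A[i][j] * y[j].
    Simulated by enumeration; the paper charges O(1) per call."""
    return max(range(len(A)),
               key=lambda i: sum(A[i][j] * y[j] for j in range(len(y))))

def best_response_col(A, x):
    """Column player's best-response oracle: argmin_j sum_i A[i][j] * x[i]."""
    n = len(A[0])
    return min(range(n),
               key=lambda j: sum(A[i][j] * x[i] for i in range(len(A))))

def fictitious_play(A, T):
    """Each player best-responds to the opponent's empirical play for T
    rounds; returns the payoff of the resulting empirical strategy pair,
    which approximates the minimax value of the game A."""
    m, n = len(A), len(A[0])
    row_counts, col_counts = [1] + [0] * (m - 1), [1] + [0] * (n - 1)
    for _ in range(T):
        x = [c / sum(row_counts) for c in row_counts]
        y = [c / sum(col_counts) for c in col_counts]
        i, j = best_response_row(A, y), best_response_col(A, x)
        row_counts[i] += 1
        col_counts[j] += 1
    x = [c / sum(row_counts) for c in row_counts]
    y = [c / sum(col_counts) for c in col_counts]
    return sum(A[i][j] * x[i] * y[j] for i in range(m) for j in range(n))
```

For a game with a pure saddle point, e.g. A = [[3, 1], [2, 2]] with value 2 at (row 1, col 1), the empirical strategies converge quickly to the equilibrium.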



      Published In

      STOC '16: Proceedings of the forty-eighth annual ACM symposium on Theory of Computing
      June 2016
      1141 pages
      ISBN:9781450341325
      DOI:10.1145/2897518

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. Best-response dynamics
      2. Learning in games
      3. Local search
      4. Online learning
      5. Optimization oracles
      6. Zero-sum games

      Conference

STOC '16: Symposium on Theory of Computing
June 19-21, 2016
Cambridge, MA, USA

      Acceptance Rates

      Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

