DOI: 10.1145/2897518.2897536
Research article (Public Access)

The computational power of optimization in online learning

Published: 19 June 2016

Abstract

We consider the fundamental problem of prediction with expert advice where the experts are “optimizable”: there is a black-box optimization oracle that can be used to compute, in constant time, the leading expert in retrospect at any point in time. In this setting, we give a novel online algorithm that attains vanishing regret with respect to N experts in total Õ(√N) computation time. We also give a lower bound showing that this running time cannot be improved (up to log factors) in the oracle model, thereby exhibiting a quadratic speedup as compared to the standard, oracle-free setting where the required time for vanishing regret is Θ(N). These results demonstrate an exponential gap between the power of optimization in online learning and its power in statistical learning: in the latter, an optimization oracle (i.e., an efficient empirical risk minimizer) allows one to learn a finite hypothesis class of size N in time O(log N).
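To make the baseline concrete, the sketch below implements the standard multiplicative-weights (Hedge) algorithm over N experts, which spends Θ(N) work per round, together with a brute-force simulation of the hindsight-optimization oracle. This is a minimal illustration only: the paper assumes the oracle answers in constant time, and the paper's Õ(√N)-time algorithm is not reproduced here.

```python
import math

def hedge(loss_matrix, eta):
    """Standard multiplicative-weights baseline: Theta(N) work per round.
    loss_matrix[t][i] is the loss of expert i at round t, assumed in [0, 1].
    Returns the forecaster's cumulative expected loss."""
    N = len(loss_matrix[0])
    w = [1.0] * N
    total_loss = 0.0
    for losses in loss_matrix:
        Z = sum(w)
        p = [wi / Z for wi in w]                       # current mixed prediction
        total_loss += sum(pi * li for pi, li in zip(p, losses))
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, losses)]
    return total_loss

def best_in_hindsight(loss_matrix):
    """The optimization oracle of the abstract: the leading expert in
    retrospect. Simulated here by brute force; the paper's model charges
    O(1) time per oracle call."""
    N = len(loss_matrix[0])
    cum = [sum(row[i] for row in loss_matrix) for i in range(N)]
    return min(range(N), key=lambda i: cum[i])
```

With the tuning eta = sqrt(8 ln N / T), Hedge's regret against the best expert is at most sqrt((T ln N)/2) for any loss sequence in [0, 1].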
We also study the implications of our results for learning in repeated zero-sum games, in a setting where the players have access to oracles that compute, in constant time, a best response to any mixed strategy of their opponent. We show that the runtime required for approximating the minimax value of the game in this setting is Θ̃(√N), yielding again a quadratic improvement upon the oracle-free setting, where Θ(N) is known to be tight.
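The best-response oracle model can be illustrated with classical fictitious play, in which each player repeatedly best-responds to the opponent's empirical mixed strategy. In this sketch the oracles are simulated by enumeration; the paper's model assumes constant-time oracle access, and this is not the paper's algorithm, only a picture of the oracle interface.

```python
def best_response_row(A, y):
    """Row player's best-response oracle: argmax_i sum_j A[i][j] * y[j].
    Simulated by enumeration; the paper charges O(1) per call."""
    return max(range(len(A)),
               key=lambda i: sum(A[i][j] * y[j] for j in range(len(y))))

def best_response_col(A, x):
    """Column player's best-response oracle: argmin_j sum_i A[i][j] * x[i]."""
    n = len(A[0])
    return min(range(n),
               key=lambda j: sum(A[i][j] * x[i] for i in range(len(A))))

def fictitious_play(A, T):
    """Each player best-responds to the opponent's empirical play for T
    rounds; returns the payoff of the resulting empirical strategy pair,
    which approximates the minimax value of the game A."""
    m, n = len(A), len(A[0])
    row_counts, col_counts = [1] + [0] * (m - 1), [1] + [0] * (n - 1)
    for _ in range(T):
        x = [c / sum(row_counts) for c in row_counts]
        y = [c / sum(col_counts) for c in col_counts]
        i, j = best_response_row(A, y), best_response_col(A, x)
        row_counts[i] += 1
        col_counts[j] += 1
    x = [c / sum(row_counts) for c in row_counts]
    y = [c / sum(col_counts) for c in col_counts]
    return sum(A[i][j] * x[i] * y[j] for i in range(m) for j in range(n))
```

For a game with a pure saddle point, e.g. A = [[3, 1], [2, 2]] with value 2 at (row 1, col 1), the empirical strategies converge quickly to the equilibrium.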



      Published In

      STOC '16: Proceedings of the forty-eighth annual ACM symposium on Theory of Computing
      June 2016
      1141 pages
      ISBN:9781450341325
      DOI:10.1145/2897518

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. Best-response dynamics
      2. Learning in games
      3. Local search
      4. Online learning
      5. Optimization oracles
      6. Zero-sum games

      Conference

STOC '16: Symposium on Theory of Computing
June 19-21, 2016
Cambridge, MA, USA

      Acceptance Rates

      Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

