Abstract
Prediction from expert advice is a fundamental problem in machine learning. A major pillar of the field is the existence of learning algorithms whose average loss approaches that of the best expert in hindsight (in other words, whose average regret approaches zero). Traditionally, the regret of online algorithms was bounded in terms of the number of prediction rounds.
Cesa-Bianchi, Mansour and Stoltz (Mach. Learn. 66(2–3):321–352, 2007) posed the question of whether it is possible to bound the regret of an online algorithm by the variation of the observed costs. In this paper we resolve this question and prove such bounds in the fully adversarial setting, in two important online learning scenarios: prediction from expert advice, and online linear optimization.
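For concreteness, below is a minimal sketch (not taken from the paper) of the classical multiplicative-weights (Hedge) algorithm of Littlestone and Warmuth (1994) and Freund and Schapire (1997) for prediction from expert advice, whose regret against the best expert is O(√(T log n)) when losses lie in [0, 1]. The results of this paper replace the dependence on the number of rounds T with the variation of the observed cost vectors; the fixed learning rate used here is the standard horizon-based tuning, not the variation-adaptive tuning developed in the paper.

```python
import numpy as np

def hedge(loss_matrix, eta=None):
    """Classical multiplicative-weights (Hedge) algorithm for prediction
    from expert advice. Losses are assumed to lie in [0, 1].

    loss_matrix: (T, n) array; loss_matrix[t, i] is expert i's loss in round t.
    Returns the algorithm's cumulative (expected) loss and the cumulative
    loss of the best expert in hindsight.
    """
    T, n = loss_matrix.shape
    if eta is None:
        # Standard tuning giving O(sqrt(T log n)) regret; the paper's
        # algorithms instead adapt to the variation of the cost sequence.
        eta = np.sqrt(np.log(n) / T)
    weights = np.ones(n)
    alg_loss = 0.0
    for t in range(T):
        probs = weights / weights.sum()           # play the normalized weights
        alg_loss += probs @ loss_matrix[t]        # expected loss this round
        weights *= np.exp(-eta * loss_matrix[t])  # multiplicative update
    best_expert_loss = loss_matrix.sum(axis=0).min()
    return alg_loss, best_expert_loss

# Illustrative usage on synthetic losses:
rng = np.random.default_rng(0)
losses = rng.random((1000, 10))  # 1000 rounds, 10 experts, losses in [0, 1]
alg, best = hedge(losses)
print(f"regret: {alg - best:.2f}")
```

The quantity `alg - best` is the regret; dividing by T gives the average regret, which vanishes as T grows. The paper's bounds improve on this whenever the cost vectors vary little across rounds.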
References
Allenberg-Neeman, C., & Neeman, B. (2004). Full information game with gains and losses. In 15th international conference on algorithmic learning theory.
Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. E. (2003). The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1), 48–77.
Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games. Cambridge: Cambridge University Press.
Cesa-Bianchi, N., Mansour, Y., & Stoltz, G. (2007). Improved second-order bounds for prediction with expert advice. Machine Learning, 66(2–3), 321–352.
Cover, T. (1991). Universal portfolios. Mathematical Finance, 1, 1–19.
Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139.
Hannan, J. (1957). Approximation to Bayes risk in repeated play. In M. Dresher, A. W. Tucker, & P. Wolfe (Eds.), Contributions to the theory of games (Vol. III, pp. 97–139). Princeton: Princeton University Press.
Hazan, E., & Kale, S. (2009a). On stochastic and worst-case models for investing. In Advances in neural information processing systems (NIPS) (Vol. 22).
Hazan, E., & Kale, S. (2009b). Better algorithms for benign bandits. In ACM-SIAM symposium on discrete algorithms (SODA09).
Helmbold, D. P., Kivinen, J., & Warmuth, M. K. (1999). Relative loss bounds for single neurons. IEEE Transactions on Neural Networks, 10(6), 1291–1304.
Herbster, M., & Warmuth, M. K. (2001). Tracking the best linear predictor. Journal of Machine Learning Research, 1, 281–309.
Kalai, A., & Vempala, S. (2005). Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3), 291–307.
Kivinen, J., & Warmuth, M. K. (1997). Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1), 1–63.
Littlestone, N., & Warmuth, M. K. (1994). The weighted majority algorithm. Information and Computation, 108(2), 212–261.
Vovk, V. (1998). A game of prediction with expert advice. Journal of Computer and System Sciences, 56(2), 153–173.
Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In 20th international conference on machine learning (ICML) (pp. 928–936).
Additional information
Editors: Sham Kakade and Ping Li.
Work done while S. Kale was at Microsoft Research.
Cite this article
Hazan, E., Kale, S. Extracting certainty from uncertainty: regret bounded by variation in costs. Mach Learn 80, 165–188 (2010). https://doi.org/10.1007/s10994-010-5175-x