DOI: 10.5555/3045390.3045500
Article

Estimating the maximum expected value through Gaussian approximation

Published: 19 June 2016

Abstract

This paper addresses the estimation of the maximum expected value of a set of independent random variables. The performance of several learning algorithms (e.g., Q-learning) depends on the accuracy of this estimate. Unfortunately, no unbiased estimator exists. The usual approach of taking the maximum of the sample means leads to large overestimates that may significantly harm the performance of the learning algorithm. Recent works have shown that the cross-validation estimator, which is negatively biased, outperforms the maximum estimator in many sequential decision-making scenarios. However, the relative performance of the two estimators is highly problem-dependent. In this paper, we propose a new estimator for the maximum expected value, based on a weighted average of the sample means, where the weights are computed using Gaussian approximations of the distributions of the sample means. We compare the proposed estimator with other state-of-the-art methods both theoretically, by deriving upper bounds on its bias and variance, and empirically, by testing its performance on different sequential learning problems.
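The three estimators discussed in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that the weight of each variable is the probability, under a Gaussian approximation of each sample mean, that this sample mean is the largest one, and it evaluates that probability by simple numerical integration on a grid. The function names, the half-and-half split used for the cross-validation estimator, and the grid size are all choices made for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def max_estimator(samples):
    """Maximum of the sample means (tends to overestimate)."""
    return max(mean(s) for s in samples)


def double_estimator(samples):
    """Two-fold cross-validation estimator (tends to underestimate).

    Assumption: each sample set is split into a first and a second half;
    the first half selects the variable, the second half evaluates it.
    """
    first = [s[: len(s) // 2] for s in samples]
    second = [s[len(s) // 2:] for s in samples]
    best = max(range(len(samples)), key=lambda i: mean(first[i]))
    return mean(second[best])


def weighted_estimator(samples, grid_points=2001):
    """Weighted average of the sample means.

    Assumption: weight i is the probability that sample mean i is the
    largest, with each sample mean modelled as N(mean_i, var_i / n_i).
    Requires at least two non-identical observations per variable.
    """
    mus = [mean(s) for s in samples]
    sigmas = [stdev(s) / math.sqrt(len(s)) for s in samples]
    dists = [NormalDist(m, s) for m, s in zip(mus, sigmas)]

    # Integration range wide enough to cover every approximating Gaussian.
    lo = min(m - 8 * s for m, s in zip(mus, sigmas))
    hi = max(m + 8 * s for m, s in zip(mus, sigmas))
    step = (hi - lo) / (grid_points - 1)

    weights = [0.0] * len(samples)
    for k in range(grid_points):
        x = lo + k * step
        cdfs = [d.cdf(x) for d in dists]
        for i, d in enumerate(dists):
            # Density of mean i at x times P(all other means <= x).
            others = 1.0
            for j, c in enumerate(cdfs):
                if j != i:
                    others *= c
            weights[i] += d.pdf(x) * others * step

    total = sum(weights)  # normalise away the discretisation error
    weights = [w / total for w in weights]
    estimate = sum(w * m for w, m in zip(weights, mus))
    return estimate, weights
```

Because the weights are non-negative and sum to one, the weighted estimate in this sketch always lies between the smallest and the largest sample mean, so it can never exceed the maximum estimator.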


Published In

ICML'16: Proceedings of the 33rd International Conference on Machine Learning - Volume 48
June 2016
3077 pages

Publisher

JMLR.org


Cited By

• (2019) Interleaved Q-Learning with Partially Coupled Training Process. Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pp. 449-457. DOI: 10.5555/3306127.3331726. Online publication date: 8-May-2019.
• (2018) Sequential test for the lowest mean. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 6335-6345. DOI: 10.5555/3327345.3327530. Online publication date: 3-Dec-2018.
• (2018) Efficient Convention Emergence through Decoupled Reinforcement Social Learning with Teacher-Student Mechanism. Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pp. 795-803. DOI: 10.5555/3237383.3237501. Online publication date: 9-Jul-2018.
• (2017) Weighted double Q-learning. Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 3455-3461. DOI: 10.5555/3172077.3172372. Online publication date: 19-Aug-2017.
