Article

Estimating the maximum expected value through Gaussian approximation

Authors:

Carlo D'Eramo,

Alessandro Nuara,

Marcello Restelli

ICML'16: Proceedings of the 33rd International Conference on Machine Learning - Volume 48

Pages 1032 - 1040

Published: 19 June 2016

Abstract

This paper addresses the estimation of the maximum expected value of a set of independent random variables. The performance of several learning algorithms (e.g., Q-learning) is affected by the accuracy of such estimation. Unfortunately, no unbiased estimator exists. The usual approach of taking the maximum of the sample means leads to large overestimates that may significantly harm the performance of the learning algorithm. Recent work has shown that the cross-validation estimator, which is negatively biased, outperforms the maximum estimator in many sequential decision-making scenarios. On the other hand, the relative performance of the two estimators is highly problem-dependent. In this paper, we propose a new estimator for the maximum expected value, based on a weighted average of the sample means, where the weights are computed using Gaussian approximations for the distributions of the sample means. We compare the proposed estimator with the other state-of-the-art methods both theoretically, by deriving upper bounds on the bias and the variance of the estimator, and empirically, by testing the performance on different sequential learning problems.
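
The abstract does not state the weighting formula, so the following is only a minimal sketch of one way a weighted estimator of this kind could be built, not the authors' implementation. It assumes that each sample mean is approximated by a Gaussian with the sample mean as its center and the standard error as its spread, and that the weight of a variable is the probability, under those approximations, that it is the one with the largest mean. The function names (`max_estimator`, `weighted_estimator`), the grid-based integration, and the demo data are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import mean, variance


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def max_estimator(samples):
    """Usual approach from the abstract: maximum of the sample means."""
    return max(mean(s) for s in samples)


def weighted_estimator(samples, grid_points=2001):
    """Weighted average of the sample means (illustrative sketch).

    Assumption: weight i is the probability that variable i has the largest
    mean when each sample mean is modelled as N(mean_i, s_i^2 / n_i).
    Each list in `samples` needs at least two observations.
    """
    mus = [mean(s) for s in samples]
    # Standard error of each sample mean; the floor avoids division by zero.
    ses = [max(math.sqrt(variance(s) / len(s)), 1e-12) for s in samples]

    # Integrate pdf_i(x) * prod_{j != i} cdf_j(x) on a common grid.
    lo = min(m - 8.0 * s for m, s in zip(mus, ses))
    hi = max(m + 8.0 * s for m, s in zip(mus, ses))
    step = (hi - lo) / (grid_points - 1)

    weights = []
    for i in range(len(samples)):
        total = 0.0
        for k in range(grid_points):
            x = lo + k * step
            term = normal_pdf(x, mus[i], ses[i])
            for j in range(len(samples)):
                if j != i:
                    term *= normal_cdf(x, mus[j], ses[j])
            total += term * step
        weights.append(total)

    norm = sum(weights)
    if norm <= 0.0:
        # Grid too coarse for near-zero variances: fall back to the maximum.
        best = mus.index(max(mus))
        weights = [1.0 if i == best else 0.0 for i in range(len(mus))]
    else:
        weights = [w / norm for w in weights]
    return sum(w * m for w, m in zip(weights, mus)), weights


# Demo on synthetic data: three variables with the same true mean (0.0).
random.seed(0)
demo_samples = [[random.gauss(0.0, 1.0) for _ in range(20)] for _ in range(3)]
estimate, weights = weighted_estimator(demo_samples)
print("maximum estimator: ", max_estimator(demo_samples))
print("weighted estimator:", estimate)
```

Because the weights are non-negative and sum to one, the result of this sketch always lies between the smallest and the largest sample mean, so it can never exceed the maximum estimator on the same data.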

Information

Published In

ICML'16: Proceedings of the 33rd International Conference on Machine Learning - Volume 48

June 2016

3077 pages

Publisher

JMLR.org

Bibliometrics & Citations

Article Metrics

Total Citations: 4
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 11 Jan 2025

Cited By

He M, Guo H, Elkind E, Veloso M, Agmon N, Taylor M (2019). Interleaved Q-Learning with Partially Coupled Training Process. Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pp. 449-457. doi:10.5555/3306127.3331726. Online publication date: 8-May-2019.
https://dl.acm.org/doi/10.5555/3306127.3331726

Kaufmann E, Koolen W, Garivier A (2018). Sequential test for the lowest mean. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 6335-6345. doi:10.5555/3327345.3327530. Online publication date: 3-Dec-2018.
https://dl.acm.org/doi/10.5555/3327345.3327530

Wang Y, Lu W, Hao J, Wei J, Leung H, Andre E, Koenig S, Dastani M, Sukthankar G (2018). Efficient Convention Emergence through Decoupled Reinforcement Social Learning with Teacher-Student Mechanism. Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pp. 795-803. doi:10.5555/3237383.3237501. Online publication date: 9-Jul-2018.
https://dl.acm.org/doi/10.5555/3237383.3237501

Zhang Z, Pan Z, Kochenderfer M (2017). Weighted double Q-learning. Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 3455-3461. doi:10.5555/3172077.3172372. Online publication date: 19-Aug-2017.
https://dl.acm.org/doi/10.5555/3172077.3172372
