
Parametric return density estimation for reinforcement learning

Published: 08 July 2010
Abstract

Most conventional Reinforcement Learning (RL) algorithms aim to optimize decision-making rules in terms of the expected return. However, especially for risk management purposes, other risk-sensitive criteria such as the value-at-risk or the expected shortfall are sometimes preferred in real applications. Here, we describe a parametric method for estimating the density of returns, which allows us to handle various criteria in a unified manner. We first extend the Bellman equation for the conditional expected return to cover a conditional probability density of returns. We then derive an extension of the TD-learning algorithm for estimating the return densities in an unknown environment. As test instances, several parametric density estimation algorithms are presented for the Gaussian, Laplace, and skewed Laplace distributions. Through numerical experiments, we show that these algorithms lead to risk-sensitive as well as robust RL paradigms.
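The method can be summarized by a Bellman equation over return densities rather than expected returns. As a rough illustration in standard distributional-RL notation (this notation is assumed here, not taken from the paper), suppose the immediate reward r(s, a) is deterministic and the discount factor satisfies 0 < \gamma < 1; the conditional return density p^\pi(\eta \mid s, a) then satisfies

    p^\pi(\eta \mid s, a) = \mathbb{E}_{s' \sim P(\cdot \mid s, a),\; a' \sim \pi(\cdot \mid s')} \left[ \frac{1}{\gamma}\, p^\pi\!\left( \frac{\eta - r(s, a)}{\gamma} \,\Big|\, s', a' \right) \right],

which is simply the density form of the recursion "return = reward + \gamma × next return". A TD-style learner fits the parameters of a chosen density family to bootstrapped targets built from this recursion.

The sketch below is a minimal toy version of the Gaussian case, assuming a tabular state space and a simple moment-matching TD update; it is illustrative only, not the authors' algorithm (which handles the Gaussian, Laplace, and skewed Laplace families through more careful parametric updates), and every name in it is hypothetical.

    # Toy sketch (assumed, not the paper's algorithm): tabular TD-style tracking of
    # the mean and second moment of the return under a per-state Gaussian model.
    import numpy as np

    class GaussianReturnTD:
        def __init__(self, n_states, gamma=0.95, alpha=0.1):
            self.mu = np.zeros(n_states)   # estimated mean return per state
            self.m2 = np.ones(n_states)    # estimated second moment of the return
            self.gamma = gamma
            self.alpha = alpha

        def update(self, s, r, s_next):
            # Bootstrapped Gaussian target: if the next-state return has mean mu'
            # and variance var', then r + gamma * G(s') has mean r + gamma * mu'
            # and second moment (r + gamma * mu')^2 + gamma^2 * var'.
            var_next = self.m2[s_next] - self.mu[s_next] ** 2
            target_mu = r + self.gamma * self.mu[s_next]
            target_m2 = target_mu ** 2 + self.gamma ** 2 * var_next
            # TD updates on the first two moments.
            self.mu[s] += self.alpha * (target_mu - self.mu[s])
            self.m2[s] += self.alpha * (target_m2 - self.m2[s])

        def variance(self, s):
            # Variance recovered from the tracked moments, clipped to stay positive.
            return max(self.m2[s] - self.mu[s] ** 2, 1e-8)

Given such an estimate, risk-sensitive criteria follow directly from the fitted density: under the Gaussian model, for instance, the alpha-quantile of the return is mu(s) + Phi^{-1}(alpha) * sigma(s), which can serve as a value-at-risk-style criterion.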






      Published In

      UAI'10: Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence
      July 2010
      751 pages
      ISBN: 9780974903965
      Editors: Peter Grünwald, Peter Spirtes

      Publisher

      AUAI Press

      Arlington, Virginia, United States


      Qualifiers

      • Article


      Cited By

      • (2023) Distributional model equivalence for risk-sensitive reinforcement learning. Proceedings of the 37th International Conference on Neural Information Processing Systems, 10.5555/3666122.3668589, 56531-56552. Online publication date: 10-Dec-2023.
      • (2023) Invariance in policy optimisation and partial identifiability in reward learning. Proceedings of the 40th International Conference on Machine Learning, 10.5555/3618408.3619736, 32033-32058. Online publication date: 23-Jul-2023.
      • (2023) The statistical benefits of quantile temporal-difference learning for value estimation. Proceedings of the 40th International Conference on Machine Learning, 10.5555/3618408.3619622, 29210-29231. Online publication date: 23-Jul-2023.
      • (2023) Policy Fairness and Unknown Bias Dynamics in Sequential Allocations. Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, 10.1145/3617694.3623262, 1-10. Online publication date: 30-Oct-2023.
      • (2022) Distributional reinforcement learning for risk-sensitive policies. Proceedings of the 36th International Conference on Neural Information Processing Systems, 10.5555/3600270.3602516, 30977-30989. Online publication date: 28-Nov-2022.
      • (2022) The nature of temporal difference errors in multi-step distributional reinforcement learning. Proceedings of the 36th International Conference on Neural Information Processing Systems, 10.5555/3600270.3602464, 30265-30276. Online publication date: 28-Nov-2022.
      • (2019) Value function in frequency domain and the characteristic value iteration algorithm. Proceedings of the 33rd International Conference on Neural Information Processing Systems, 10.5555/3454287.3455613, 14808-14819. Online publication date: 8-Dec-2019.
      • (2018) Exploration by distributional reinforcement learning. Proceedings of the 27th International Joint Conference on Artificial Intelligence, 10.5555/3304889.3305037, 2710-2716. Online publication date: 13-Jul-2018.
      • (2017) A Distributional Perspective on Reinforcement Learning. Proceedings of the 34th International Conference on Machine Learning - Volume 70, 10.5555/3305381.3305428, 449-458. Online publication date: 6-Aug-2017.
