Abstract
We present an implementation of model-based online reinforcement learning (RL) for continuous domains with deterministic transitions, designed specifically for low sample complexity. Since the environment is unknown, achieving low sample complexity requires an agent to balance exploration and exploitation intelligently and to generalize rapidly from its observations. A number of related sample-efficient RL algorithms have been proposed in the past, but to permit theoretical analysis they mostly relied on model learners with weak generalization capabilities. Here, we separate function approximation in the model learner (which does require samples) from interpolation in the planner (which does not). For model learning we use Gaussian process regression (GP), which automatically adjusts to the complexity of the problem (via Bayesian hyperparameter selection) and in practice can often learn a highly accurate model from very little data. In addition, a GP naturally quantifies the uncertainty of its predictions, which allows us to implement the "optimism in the face of uncertainty" principle to control exploration efficiently. We evaluate our method on four common benchmark domains.
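To make the two ingredients of the abstract concrete, the following is a minimal sketch (not the authors' code) of GP regression with predictive uncertainty, plus an RMAX-like rule that treats queries the model is still uncertain about optimistically. The RBF kernel, fixed hyperparameters, `V_MAX`, and the variance threshold are illustrative assumptions; the paper's Bayesian hyperparameter selection and its use of the GP for deterministic transition models are omitted here for brevity.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel k(a,b) = s^2 * exp(-||a-b||^2 / (2*l^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * sq / lengthscale**2)

class GPModel:
    """GP regression fit to one scalar output (e.g., one dimension of the model)."""
    def __init__(self, X, y, noise_var=1e-4):
        self.X = X
        K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
        self.L = np.linalg.cholesky(K)                  # K = L L^T
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))

    def predict(self, Xstar):
        """Predictive mean and variance at the query points."""
        Ks = rbf_kernel(self.X, Xstar)                  # shape (n, m)
        mean = Ks.T @ self.alpha
        v = np.linalg.solve(self.L, Ks)
        var = rbf_kernel(Xstar, Xstar).diagonal() - np.sum(v**2, axis=0)
        return mean, np.maximum(var, 0.0)

def optimistic_value(gp, s, V_MAX=1.0, var_threshold=0.01):
    """RMAX-like optimism: where the model is still uncertain, pretend the
    outcome is maximally rewarding so the planner is drawn to explore there."""
    mean, var = gp.predict(s[None, :])
    if var[0] > var_threshold:      # "unknown" input: be optimistic
        return V_MAX
    return float(mean[0])           # "known" input: trust the learned model
```

A toy usage, learning a 1-D function from a handful of samples: queries near the data return the learned estimate, while queries far from the data trigger the optimistic value.

```python
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(15, 1))
y = np.sin(X[:, 0])
gp = GPModel(X, y)
print(optimistic_value(gp, np.array([0.5])))   # near data: GP mean, ~sin(0.5)
print(optimistic_value(gp, np.array([9.0])))   # far from data: returns V_MAX
```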
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Jung, T., Stone, P. (2010). Gaussian Processes for Sample Efficient Reinforcement Learning with RMAX-Like Exploration. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science, vol. 6321. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15880-3_44