Abstract
We present an implementation of model-based online reinforcement learning (RL) for continuous domains with deterministic transitions, designed specifically for low sample complexity. Since the environment is unknown, achieving low sample complexity requires an agent to balance exploration and exploitation intelligently and to generalize rapidly from its observations. A number of related sample-efficient RL algorithms have been proposed in the past, but to permit theoretical analysis they mostly relied on model learners with weak generalization capabilities. Here, we separate function approximation in the model learner (which does require samples) from interpolation in the planner (which does not). For model learning we use Gaussian process regression (GP), which automatically adjusts to the complexity of the problem (via Bayesian hyperparameter selection) and in practice can often learn a highly accurate model from very little data. In addition, a GP naturally quantifies the uncertainty of its predictions, which allows us to implement the "optimism in the face of uncertainty" principle to control exploration efficiently. We evaluate our method on four common benchmark domains.
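To make the two ingredients of the abstract concrete, the following is a minimal sketch (not the authors' code) of GP regression with predictive uncertainty, plus an RMAX-like rule that treats queries the model is still uncertain about optimistically. The RBF kernel, fixed hyperparameters, `V_MAX`, and the variance threshold are illustrative assumptions; the paper's Bayesian hyperparameter selection and its use of the GP for deterministic transition models are omitted here for brevity.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel k(a,b) = s^2 * exp(-||a-b||^2 / (2*l^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * sq / lengthscale**2)

class GPModel:
    """GP regression fit to one scalar output (e.g., one dimension of the model)."""
    def __init__(self, X, y, noise_var=1e-4):
        self.X = X
        K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
        self.L = np.linalg.cholesky(K)                  # K = L L^T
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))

    def predict(self, Xstar):
        """Predictive mean and variance at the query points."""
        Ks = rbf_kernel(self.X, Xstar)                  # shape (n, m)
        mean = Ks.T @ self.alpha
        v = np.linalg.solve(self.L, Ks)
        var = rbf_kernel(Xstar, Xstar).diagonal() - np.sum(v**2, axis=0)
        return mean, np.maximum(var, 0.0)

def optimistic_value(gp, s, V_MAX=1.0, var_threshold=0.01):
    """RMAX-like optimism: where the model is still uncertain, pretend the
    outcome is maximally rewarding so the planner is drawn to explore there."""
    mean, var = gp.predict(s[None, :])
    if var[0] > var_threshold:      # "unknown" input: be optimistic
        return V_MAX
    return float(mean[0])           # "known" input: trust the learned model
```

A toy usage, learning a 1-D function from a handful of samples: queries near the data return the learned estimate, while queries far from the data trigger the optimistic value.

```python
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(15, 1))
y = np.sin(X[:, 0])
gp = GPModel(X, y)
print(optimistic_value(gp, np.array([0.5])))   # near data: GP mean, ~sin(0.5)
print(optimistic_value(gp, np.array([9.0])))   # far from data: returns V_MAX
```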
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Jung, T., Stone, P. (2010). Gaussian Processes for Sample Efficient Reinforcement Learning with RMAX-Like Exploration. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science, vol. 6321. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15880-3_44