Abstract
In many Reinforcement Learning (RL) domains, generating the experience needed to evaluate an agent's performance is expensive. An appealing approach to reducing the number of expensive evaluations is Bayesian Optimization (BO), a framework for the global optimization of noisy, costly-to-evaluate functions. Prior work in a number of RL domains has demonstrated the effectiveness of BO for optimizing parametric policies. However, those approaches ignore the state-transition sequences observed during policy executions and consider only the total reward achieved. In this paper, we study how to incorporate all of the information observed during policy executions into the BO framework more effectively. In particular, our approach uses the observed data to learn approximate transition models that allow for Monte-Carlo predictions of policy returns. These models are then incorporated into the BO framework as a type of prior on policy returns, which can better inform the BO process. The resulting algorithm provides a new approach for leveraging learned models in RL even when no planner is available to exploit those models. We demonstrate the effectiveness of our algorithm in four benchmark domains whose dynamics vary in complexity. The results indicate that our algorithm effectively combines model-based predictions to improve the data efficiency of model-free BO methods, and that it is robust to modeling errors when parts of the domain cannot be modeled successfully.
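To make the idea concrete, the sketch below shows one simple way such a model-based prior can enter a BO loop. It is not the authors' implementation: the one-dimensional policy parameter, the toy objective true_return, and the hand-coded simulate_return (standing in for Monte-Carlo rollouts through a learned transition model) are all illustrative assumptions. The model's return estimate acts as the Gaussian-process prior mean by fitting the GP to the residuals between observed and predicted returns, and expected improvement selects the next policy to execute.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from scipy.stats import norm

def true_return(theta):
    # Hypothetical expensive objective: the real (unknown) policy return.
    return np.exp(-(theta - 2.0) ** 2) + 0.1 * np.sin(5 * theta)

def simulate_return(theta, n_rollouts=20):
    # Crude learned model: a biased, noisy approximation of the objective,
    # standing in for Monte-Carlo rollouts through learned transition models.
    rng = np.random.default_rng(0)
    samples = np.exp(-(theta - 1.8) ** 2) + rng.normal(0.0, 0.05, n_rollouts)
    return samples.mean()

def expected_improvement(mu, sigma, best):
    # Standard EI acquisition for maximization.
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(1)
thetas = list(rng.uniform(0.0, 4.0, 3))      # initial policy parameters
returns = [true_return(t) for t in thetas]   # expensive real executions

candidates = np.linspace(0.0, 4.0, 200)
prior_mean = np.array([simulate_return(t) for t in candidates])

for _ in range(10):
    X = np.array(thetas).reshape(-1, 1)
    # Fit the GP to residuals so the model's Monte-Carlo estimate
    # serves as a non-zero prior mean on policy returns.
    resid = np.array(returns) - np.array([simulate_return(t) for t in thetas])
    gp = GaussianProcessRegressor(kernel=RBF(0.5), alpha=1e-3).fit(X, resid)
    mu_r, sd = gp.predict(candidates.reshape(-1, 1), return_std=True)
    mu = prior_mean + mu_r                   # model-informed posterior mean
    ei = expected_improvement(mu, sd, max(returns))
    theta_next = candidates[int(np.argmax(ei))]
    thetas.append(theta_next)
    returns.append(true_return(theta_next))  # one more expensive evaluation

print("best policy parameter:", thetas[int(np.argmax(returns))])
```

Fitting the GP to residuals is one common way to impose a non-constant prior mean with off-the-shelf GP libraries: where the learned model is accurate the residuals are small and near-zero-mean, so the BO loop needs fewer real policy executions; where the model fails, the GP absorbs the discrepancy from data.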
Keywords
- Reinforcement Learning
- Reward Function
- Policy Parameter
- Bayesian Optimization
- Bayesian Optimization Algorithm
References
Lizotte, D., Wang, T., Bowling, M., Schuurmans, D.: Automatic gait optimization with gaussian process regression. In: IJCAI’07: Proceedings of the 20th International Joint Conference on Artifical Intelligence, pp. 944–949. Morgan Kaufmann, San Francisco (2007)
Lizotte, D.: Practical Bayesian Optimization. PhD thesis, University of Alberta (2008)
Brochu, E., Cora, V., de Freitas, N.: A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. Technical Report TR-2009-023 (2009)
Engel, Y., Mannor, S., Meir, R.: Reinforcement learning with Gaussian processes. In: International Conference on Machine Learning, pp. 201–208 (2005)
Dearden, R., Friedman, N., Andre, D.: Model based Bayesian exploration. In: UAI (1999)
Strens, M.J.A.: A Bayesian framework for reinforcement learning. In: International Conference on Machine Learning, pp. 943–950 (2000)
Duff, M.: Design for an optimal probe. In: International Conference on Machine Learning (2003)
Lagoudakis, M.G., Parr, R., Bartlett, L.: Least-squares policy iteration. Journal of Machine Learning Research 4 (2003)
Sutton, R., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Baxter, J., Bartlett, P.L., Weaver, L.: Experiments with infinite-horizon, policy-gradient estimation. Journal of Artificial Intelligence Research 15(1), 351–381 (2001)
Mockus, J.: Application of bayesian approach to numerical methods of global and stochastic optimization. Global Optimization 4(4), 347–365 (1994)
Vazquez, E., Bect, J.: Convergence properties of the expected improvement algorithm with fixed mean and covariance functions. Journal of Statistical Planning and Inference 140(11), 3088–3095 (2010)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, Cambridge (2005)
Jones, D.R., Perttunen, C.D., Stuckman, B.E.: Lipschitzian optimization without the lipschitz constant. J. Optim. Theory Appl. 79(1), 157–181 (1993)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Wilson, A., Fern, A., Tadepalli, P. (2010). Incorporating Domain Models into Bayesian Optimization for RL. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science, vol. 6323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15939-8_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15938-1
Online ISBN: 978-3-642-15939-8