Abstract
In many Reinforcement Learning (RL) domains, generating the experience needed to evaluate an agent's performance is expensive. An appealing approach to reducing the number of expensive evaluations is Bayesian Optimization (BO), a framework for the global optimization of noisy, costly-to-evaluate functions. Prior work in a number of RL domains has demonstrated the effectiveness of BO for optimizing parametric policies. However, those approaches ignore the state-transition sequences observed during policy executions and consider only the total reward achieved. In this paper, we study how to incorporate all of the information observed during policy executions into the BO framework more effectively. In particular, our approach uses the observed data to learn approximate transition models that allow for Monte-Carlo predictions of policy returns. These models are then incorporated into the BO framework as a type of prior on policy returns, which can better inform the BO process. The resulting algorithm provides a new approach for leveraging learned models in RL even when no planner is available to exploit those models. We demonstrate the effectiveness of our algorithm in four benchmark domains whose dynamics vary in complexity. The results indicate that our algorithm effectively combines model-based predictions to improve the data efficiency of model-free BO methods, and that it is robust to modeling errors when parts of the domain cannot be modeled successfully.
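To make the idea concrete, the sketch below shows one simple way such a model-based prior can enter a BO loop. It is not the authors' implementation: the one-dimensional policy parameter, the toy objective true_return, and the hand-coded simulate_return (standing in for Monte-Carlo rollouts through a learned transition model) are all illustrative assumptions. The model's return estimate acts as the Gaussian-process prior mean by fitting the GP to the residuals between observed and predicted returns, and expected improvement selects the next policy to execute.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from scipy.stats import norm

def true_return(theta):
    # Hypothetical expensive objective: the real (unknown) policy return.
    return np.exp(-(theta - 2.0) ** 2) + 0.1 * np.sin(5 * theta)

def simulate_return(theta, n_rollouts=20):
    # Crude learned model: a biased, noisy approximation of the objective,
    # standing in for Monte-Carlo rollouts through learned transition models.
    rng = np.random.default_rng(0)
    samples = np.exp(-(theta - 1.8) ** 2) + rng.normal(0.0, 0.05, n_rollouts)
    return samples.mean()

def expected_improvement(mu, sigma, best):
    # Standard EI acquisition for maximization.
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(1)
thetas = list(rng.uniform(0.0, 4.0, 3))      # initial policy parameters
returns = [true_return(t) for t in thetas]   # expensive real executions

candidates = np.linspace(0.0, 4.0, 200)
prior_mean = np.array([simulate_return(t) for t in candidates])

for _ in range(10):
    X = np.array(thetas).reshape(-1, 1)
    # Fit the GP to residuals so the model's Monte-Carlo estimate
    # serves as a non-zero prior mean on policy returns.
    resid = np.array(returns) - np.array([simulate_return(t) for t in thetas])
    gp = GaussianProcessRegressor(kernel=RBF(0.5), alpha=1e-3).fit(X, resid)
    mu_r, sd = gp.predict(candidates.reshape(-1, 1), return_std=True)
    mu = prior_mean + mu_r                   # model-informed posterior mean
    ei = expected_improvement(mu, sd, max(returns))
    theta_next = candidates[int(np.argmax(ei))]
    thetas.append(theta_next)
    returns.append(true_return(theta_next))  # one more expensive evaluation

print("best policy parameter:", thetas[int(np.argmax(returns))])
```

Fitting the GP to residuals is one common way to impose a non-constant prior mean with off-the-shelf GP libraries: where the learned model is accurate the residuals are small and near-zero-mean, so the BO loop needs fewer real policy executions; where the model fails, the GP absorbs the discrepancy from data.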
Keywords
- Reinforcement Learning
- Reward Function
- Policy Parameter
- Bayesian Optimization
- Bayesian Optimization Algorithm
References
Lizotte, D., Wang, T., Bowling, M., Schuurmans, D.: Automatic gait optimization with gaussian process regression. In: IJCAI’07: Proceedings of the 20th International Joint Conference on Artifical Intelligence, pp. 944–949. Morgan Kaufmann, San Francisco (2007)
Lizotte, D.: Practical Bayesian Optimization. PhD thesis, University of Alberta (2008)
Brochu, E., Cora, V., de Freitas, N.: A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. Technical Report TR-2009-023 (2009)
Engel, Y., Mannor, S., Meir, R.: Reinforcement learning with Gaussian processes. In: International Conference on Machine Learning, pp. 201–208 (2005)
Dearden, R., Friedman, N., Andre, D.: Model based Bayesian exploration. In: UAI (1999)
Strens, M.J.A.: A Bayesian framework for reinforcement learning. In: International Conference on Machine Learning, pp. 943–950 (2000)
Duff, M.: Design for an optimal probe. In: International Conference on Machine Learning (2003)
Lagoudakis, M.G., Parr, R., Bartlett, L.: Least-squares policy iteration. Journal of Machine Learning Research 4 (2003)
Sutton, R., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Baxter, J., Bartlett, P.L., Weaver, L.: Experiments with infinite-horizon, policy-gradient estimation. Journal of Artificial Intelligence Research 15(1), 351–381 (2001)
Mockus, J.: Application of bayesian approach to numerical methods of global and stochastic optimization. Global Optimization 4(4), 347–365 (1994)
Vazquez, E., Bect, J.: Convergence properties of the expected improvement algorithm with fixed mean and covariance functions. Journal of Statistical Planning and Inference 140(11), 3088–3095 (2010)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, Cambridge (2005)
Jones, D.R., Perttunen, C.D., Stuckman, B.E.: Lipschitzian optimization without the lipschitz constant. J. Optim. Theory Appl. 79(1), 157–181 (1993)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Wilson, A., Fern, A., Tadepalli, P. (2010). Incorporating Domain Models into Bayesian Optimization for RL. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science, vol. 6323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15939-8_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15938-1
Online ISBN: 978-3-642-15939-8