
Probabilistic inference for solving discrete and continuous state Markov Decision Processes

Published: 25 June 2006 | DOI: 10.1145/1143844.1143963

Abstract

Inference in Markov Decision Processes has recently received interest as a means to infer the goals of an observed actor, for policy recognition, and also as a tool to compute policies. A particularly interesting aspect of the approach is that any existing inference technique in DBNs now becomes available for answering behavioral questions, including those on continuous, factorial, or hierarchical state representations. Here we present an Expectation Maximization algorithm for computing optimal policies. Unlike previous approaches, we can show that this actually optimizes the discounted expected future return for arbitrary reward functions and without assuming an ad hoc finite total time. The algorithm is generic in that any inference technique can be utilized in the E-step. We demonstrate this for exact inference on a discrete maze and Gaussian belief state propagation in continuous stochastic optimal control problems.
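As a rough, non-authoritative illustration of the idea summarized above, the short Python sketch below runs an EM-style loop on a small tabular MDP: the E-step evaluates the current stochastic policy exactly (summing backward messages over a geometric, i.e. discounted, horizon prior), and the M-step greedily re-concentrates the policy on the best action. The transition tensor P[s, a, s'], the reward matrix R[s, a], the toy two-state MDP, and all function and variable names are assumptions made for illustration only; in this exact discrete setting the loop reduces to a policy-iteration-like scheme and omits the paper's mixture-of-DBNs construction and the continuous-state (Gaussian belief propagation) case.

import numpy as np

# EM-style policy optimization sketch for a small tabular MDP (illustrative
# assumptions only: P[s, a, s'] transition tensor, R[s, a] reward matrix).
def em_policy_optimization(P, R, gamma=0.95, n_iters=100):
    n_states, n_actions, _ = P.shape
    pi = np.full((n_states, n_actions), 1.0 / n_actions)  # uniform initial policy
    V = np.zeros(n_states)
    for _ in range(n_iters):
        # E-step: evaluate the current policy exactly. Summing backward messages
        # over a geometric horizon prior P(T) ~ gamma^T amounts to solving
        # V = R_pi + gamma * P_pi V, i.e. the discounted value of the policy.
        R_pi = np.einsum('sa,sa->s', pi, R)
        P_pi = np.einsum('sa,sap->sp', pi, P)
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        Q = R + gamma * np.einsum('sap,p->sa', P, V)  # state-action messages
        # M-step: greedily concentrate the policy on the maximizing action.
        new_pi = np.zeros_like(pi)
        new_pi[np.arange(n_states), Q.argmax(axis=1)] = 1.0
        if np.array_equal(new_pi, pi):
            break
        pi = new_pi
    return pi, V

if __name__ == "__main__":
    # Tiny two-state, two-action MDP, purely for illustration.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.8, 0.2], [0.1, 0.9]]])
    R = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
    pi, V = em_policy_optimization(P, R, gamma=0.9)
    print("greedy policy:\n", pi)
    print("policy values:", V)

In a sketch like this, replacing the exact linear-system solve in the E-step with an approximate inference routine is where the generic-E-step flexibility described in the abstract would enter.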



Published In

ICML '06: Proceedings of the 23rd international conference on Machine learning
June 2006
1154 pages
ISBN:1595933832
DOI:10.1145/1143844
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 June 2006


Qualifiers

  • Article

Acceptance Rates

ICML '06 paper acceptance rate: 140 of 548 submissions, 26%
Overall acceptance rate: 140 of 548 submissions, 26%


