
Probabilistic inference for solving discrete and continuous state Markov Decision Processes

Published: 25 June 2006 | DOI: 10.1145/1143844.1143963

Abstract

Inference in Markov Decision Processes has recently received interest as a means to infer the goals of an observed actor, for policy recognition, and also as a tool to compute policies. A particularly interesting aspect of the approach is that any existing inference technique in DBNs now becomes available for answering behavioral questions, including those on continuous, factorial, or hierarchical state representations. Here we present an Expectation Maximization algorithm for computing optimal policies. Unlike previous approaches, we can show that this actually optimizes the discounted expected future return for arbitrary reward functions and without assuming an ad hoc finite total time. The algorithm is generic in that any inference technique can be utilized in the E-step. We demonstrate this for exact inference on a discrete maze and Gaussian belief state propagation in continuous stochastic optimal control problems.
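As a rough, non-authoritative illustration of the idea summarized above, the short Python sketch below runs an EM-style loop on a small tabular MDP: the E-step evaluates the current stochastic policy exactly (summing backward messages over a geometric, i.e. discounted, horizon prior), and the M-step greedily re-concentrates the policy on the best action. The transition tensor P[s, a, s'], the reward matrix R[s, a], the toy two-state MDP, and all function and variable names are assumptions made for illustration only; in this exact discrete setting the loop reduces to a policy-iteration-like scheme and omits the paper's mixture-of-DBNs construction and the continuous-state (Gaussian belief propagation) case.

import numpy as np

# EM-style policy optimization sketch for a small tabular MDP (illustrative
# assumptions only: P[s, a, s'] transition tensor, R[s, a] reward matrix).
def em_policy_optimization(P, R, gamma=0.95, n_iters=100):
    n_states, n_actions, _ = P.shape
    pi = np.full((n_states, n_actions), 1.0 / n_actions)  # uniform initial policy
    V = np.zeros(n_states)
    for _ in range(n_iters):
        # E-step: evaluate the current policy exactly. Summing backward messages
        # over a geometric horizon prior P(T) ~ gamma^T amounts to solving
        # V = R_pi + gamma * P_pi V, i.e. the discounted value of the policy.
        R_pi = np.einsum('sa,sa->s', pi, R)
        P_pi = np.einsum('sa,sap->sp', pi, P)
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        Q = R + gamma * np.einsum('sap,p->sa', P, V)  # state-action messages
        # M-step: greedily concentrate the policy on the maximizing action.
        new_pi = np.zeros_like(pi)
        new_pi[np.arange(n_states), Q.argmax(axis=1)] = 1.0
        if np.array_equal(new_pi, pi):
            break
        pi = new_pi
    return pi, V

if __name__ == "__main__":
    # Tiny two-state, two-action MDP, purely for illustration.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.8, 0.2], [0.1, 0.9]]])
    R = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
    pi, V = em_policy_optimization(P, R, gamma=0.9)
    print("greedy policy:\n", pi)
    print("policy values:", V)

In a sketch like this, replacing the exact linear-system solve in the E-step with an approximate inference routine is where the generic-E-step flexibility described in the abstract would enter.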



Published In

ICML '06: Proceedings of the 23rd international conference on Machine learning
June 2006
1154 pages
ISBN:1595933832
DOI:10.1145/1143844
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 June 2006


Qualifiers

  • Article

Acceptance Rates

ICML '06 paper acceptance rate: 140 of 548 submissions, 26%
Overall acceptance rate: 140 of 548 submissions, 26%


