Article

Mean Field Approximation of the Policy Iteration Algorithm for Graph-based Markov Decision Processes

Authors:

Proceedings of the 2006 conference on ECAI 2006: 17th European Conference on Artificial Intelligence August 29 -- September 1, 2006, Riva del Garda, Italy

Pages 595 - 599

Published: 22 May 2006

Abstract

In this article, we consider a compact graph-based representation of multidimensional Markov Decision Processes (GMDP). The states and actions of a GMDP are multidimensional and attached to the vertices of a graph, which allows local dynamics and rewards to be represented. This approach is in line with approaches based on Dynamic Bayesian Networks. For policy optimisation, a direct application of the Policy Iteration algorithm has exponential complexity in the number of nodes of the graph and is therefore not feasible for such high-dimensional problems, so we propose an approximate version of this algorithm derived from the GMDP representation. Rather than approximating the value function directly, as is usually done, we propose an approximation of the occupation measure of the model, based on the mean field principle. We then use it to compute the value function and to derive approximate policy evaluation and policy improvement methods. Their combination yields an approximate Policy Iteration algorithm whose complexity is linear in the number of nodes of the graph. Comparisons with the optimal solution, when available, and with a naive short-term policy demonstrate the quality of the proposed procedure.

Cited By

Higuera-Chan C, Jasso-Fuentes H, Minjárez-Sosa J (2016). Discrete-Time Control for Systems of Interacting Objects with Unknown Random Disturbance Distributions. Applied Mathematics and Optimization 74:1 (197-227). DOI: 10.1007/s00245-015-9312-6. Online publication date: 1-Aug-2016.
https://dl.acm.org/doi/10.1007/s00245-015-9312-6

Forsell N, Sabbadin R (2006). Approximate linear-programming algorithms for graph-based Markov decision processes. Proceedings of the 2006 conference on ECAI 2006: 17th European Conference on Artificial Intelligence August 29 -- September 1, 2006, Riva del Garda, Italy (590-594). DOI: 10.5555/1567016.1567144. Online publication date: 22-May-2006.
https://dl.acm.org/doi/10.5555/1567016.1567144

Index Terms
1. Mathematics of computing
  1. Probability and statistics
    1. Probabilistic reasoning algorithms
    2. Probabilistic representations

Published In

Proceedings of the 2006 conference on ECAI 2006: 17th European Conference on Artificial Intelligence August 29 -- September 1, 2006, Riva del Garda, Italy

May 2006

865 pages

ISBN:1586036424

Editors:
Gerhard Brewka (Leipzig University, Germany)
Silvia Coradeschi (Örebro University, Sweden)
Anna Perini (SRA, ITC-irst, Trento, Italy)
Paolo Traverso (SRA, ITC-irst, Trento, Italy)

Publisher

IOS Press

Netherlands
