Article, DOI: 10.5555/1567016.1567145

Mean Field Approximation of the Policy Iteration Algorithm for Graph-based Markov Decision Processes

Published: 22 May 2006

Abstract

In this article, we consider a compact, graph-based representation of multidimensional Markov Decision Processes (GMDP). The states and actions of a GMDP are multidimensional and attached to the vertices of a graph, which allows local dynamics and rewards to be represented. This approach is in line with approaches based on Dynamic Bayesian Networks. For policy optimisation, a direct application of the Policy Iteration algorithm has exponential complexity in the number of nodes of the graph and is therefore not feasible for such high-dimensional problems; we propose an approximate version of this algorithm derived from the GMDP representation. Rather than approximating the value function directly, as is usually done, we propose an approximation of the occupation measure of the model, based on the mean field principle. We then use it to compute the value function and derive approximate policy evaluation and policy improvement methods. Their combination yields an approximate Policy Iteration algorithm whose complexity is linear in the number of nodes of the graph. Comparisons with the optimal solution, when available, and with a naive short-term policy demonstrate the quality of the proposed procedure.
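
The abstract gives no formulas, so the following is only an illustrative sketch of how a mean-field-style policy evaluation on a graph could look. It is not the authors' algorithm. It assumes binary local states, a fixed local policy, and a product-form approximation in which the joint state distribution at each step is replaced by the product of per-node marginals. All names (`mean_field_value`, `neighbors`, `trans`, `reward`, `policy`, `q0`) are hypothetical. Each sweep visits every node once and enumerates only that node's neighbourhood, so for bounded neighbourhood size the cost per sweep grows linearly with the number of nodes.

```python
from itertools import product


def mean_field_value(neighbors, trans, reward, policy, q0, gamma=0.9, horizon=200):
    """Approximate discounted value of a fixed local policy (illustrative only).

    neighbors[i]       : tuple of node indices node i depends on (node i first)
    trans(i, s_nb, a)  : probability that node i is in state 1 at the next step
    reward(i, s_nb, a) : local reward of node i
    policy(i, s_nb)    : local action chosen at node i
    q0[i]              : initial probability that node i is in state 1
    """
    q = list(q0)
    value = 0.0
    discount = 1.0
    for _ in range(horizon):
        next_q = []
        for i, nb in enumerate(neighbors):
            expected_reward = 0.0
            prob_one = 0.0
            # Enumerate neighbourhood states only, never the full joint state.
            for s_nb in product((0, 1), repeat=len(nb)):
                # Product-form weight: neighbours treated as independent.
                weight = 1.0
                for j, s in zip(nb, s_nb):
                    weight *= q[j] if s == 1 else 1.0 - q[j]
                a = policy(i, s_nb)
                expected_reward += weight * reward(i, s_nb, a)
                prob_one += weight * trans(i, s_nb, a)
            value += discount * expected_reward
            next_q.append(prob_one)
        q = next_q  # synchronous update of all marginals
        discount *= gamma
    return value


# Hypothetical example: a ring of 4 nodes, state 1 = "infected".
n = 4
ring = [(i, (i - 1) % n, (i + 1) % n) for i in range(n)]


def ring_policy(i, s_nb):
    return 1 if s_nb[0] == 1 else 0  # treat a node only when it is infected


def ring_trans(i, s_nb, a):
    if s_nb[0] == 1:
        return 0.1 if a == 1 else 0.9  # treatment usually cures the node
    return 0.2 * (s_nb[1] + s_nb[2])   # infection spreads from neighbours


def ring_reward(i, s_nb, a):
    return (1.0 if s_nb[0] == 0 else 0.0) - 0.2 * a  # healthy bonus minus treatment cost


approx_value = mean_field_value(ring, ring_trans, ring_reward, ring_policy, [0.5] * n)
```

When nodes do not interact (each neighbourhood contains only the node itself), the product form is exact; with interacting neighbours it is an approximation whose quality depends on the model.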


Cited By

  • (2016) Discrete-Time Control for Systems of Interacting Objects with Unknown Random Disturbance Distributions. Applied Mathematics and Optimization, 74:1 (197-227). DOI: 10.1007/s00245-015-9312-6. Online publication date: 1-Aug-2016
  • (2006) Approximate linear-programming algorithms for graph-based Markov decision processes. Proceedings of the 2006 conference on ECAI 2006: 17th European Conference on Artificial Intelligence August 29 -- September 1, 2006, Riva del Garda, Italy (590-594). DOI: 10.5555/1567016.1567144. Online publication date: 22-May-2006
Published In

Proceedings of the 2006 conference on ECAI 2006: 17th European Conference on Artificial Intelligence August 29 -- September 1, 2006, Riva del Garda, Italy
May 2006
865 pages
ISBN: 1586036424

Publisher

IOS Press, Netherlands
