Article

Decomposition techniques for planning in stochastic domains

Authors:

Shieu-Hong Lin

IJCAI'95: Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2

Pages 1121 - 1127

Published: 20 August 1995

Abstract

This paper is concerned with modeling planning problems involving uncertainty as discrete-time, finite-state stochastic automata. Solving planning problems is reduced to computing policies for Markov decision processes. Classical methods for solving Markov decision processes cannot cope with the size of the state spaces for typical problems encountered in practice. As an alternative, we investigate methods that decompose global planning problems into a number of local problems, solve the local problems separately, and then combine the local solutions to generate a global solution. We present algorithms that decompose planning problems into smaller problems given an arbitrary partition of the state space. The local problems are interpreted as Markov decision processes, and solutions to the local problems are interpreted as policies restricted to the subsets of the state space defined by the partition. One algorithm relies on constructing and solving an abstract version of the original decision problem. A second algorithm iteratively approximates parameters of the local problems to converge to an optimal solution. We show how properties of a specified partition affect the time and storage required for these algorithms.
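To make the idea of solving local problems over a partition concrete, the following is a minimal sketch in Python. It is not the paper's algorithm: it is plain block-wise (asynchronous) value iteration, in which each block of the partition is solved as a small Markov decision process while the values of states outside the block are held fixed, and the blocks are swept repeatedly until the global value function stops changing. The toy corridor problem, the discount factor, the tolerance, and all function names (`q_value`, `solve_region`, `solve_by_partition`, `chain_mdp`) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# P[s][a] is a list of (next_state, probability); R[s][a] is the immediate reward.
Transitions = Dict[int, Dict[str, List[Tuple[int, float]]]]
Rewards = Dict[int, Dict[str, float]]


def q_value(s, a, V, P, R, gamma):
    """Expected discounted return of taking action a in state s under estimate V."""
    return R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])


def solve_region(region, V, P, R, gamma, tol):
    """Value iteration on one block; values of states outside the block stay fixed."""
    while True:
        delta = 0.0
        for s in region:
            new_v = max(q_value(s, a, V, P, R, gamma) for a in P[s])
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return


def solve_by_partition(partition, P, R, gamma=0.9, tol=1e-8):
    """Sweep over the blocks until a full sweep changes every state value by less than tol."""
    V = {s: 0.0 for s in P}
    while True:
        before = dict(V)
        for region in partition:
            solve_region(region, V, P, R, gamma, tol)
        if max(abs(V[s] - before[s]) for s in V) < tol:
            break
    # Combine the local solutions into one global greedy policy.
    policy = {s: max(P[s], key=lambda a: q_value(s, a, V, P, R, gamma)) for s in P}
    return V, policy


def chain_mdp(n=6, slip=0.1):
    """Hypothetical toy problem: a noisy corridor with reward only in the last state."""
    P, R = {}, {}
    for s in range(n):
        P[s] = {
            "left": [(max(s - 1, 0), 1.0 - slip), (s, slip)],
            "right": [(min(s + 1, n - 1), 1.0 - slip), (s, slip)],
        }
        reward = 1.0 if s == n - 1 else 0.0
        R[s] = {"left": reward, "right": reward}
    return P, R


P, R = chain_mdp()
# An assumed two-block partition of the six states.
V, policy = solve_by_partition([[0, 1, 2], [3, 4, 5]], P, R)
```

In this sketch the only "parameters" exchanged between local problems are the current values of states just outside each block. A partition with few transitions crossing block boundaries needs less of this exchange, which is one informal way to see why the choice of partition would affect running time.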

Decomposition techniques for planning in stochastic domains
1. Mathematics of computing
  1. Probability and statistics
    1. Probabilistic representations
    2. Stochastic processes

Information & Contributors

Information

Published In

IJCAI'95: Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2

August 1995

2077 pages

ISBN:1558603638

Sponsors

Société canadienne pour l'étude de l'intelligence par ordinateur
Canadian Society for Computational Studies of Intelligence
AAAI: American Association for Artificial Intelligence
The International Joint Conferences on Artificial Intelligence, Inc.

Publisher

Morgan Kaufmann Publishers Inc.

San Francisco, CA, United States

Qualifiers

Article

Cited By

Chafik S, Larach A, Daoui C (2018). Parallel Hierarchical Pre-Gauss-Seidel Value Iteration Algorithm. International Journal of Decision Support System Technology 10:2, 1-22. Online publication date: 1-Apr-2018
https://dl.acm.org/doi/10.4018/IJDSST.2018040101
Kamar E, Horvitz E, Weiss G, Yolum P, Bordini R, Elkind E (2015). Planning for Crowdsourcing Hierarchical Tasks. Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, 1191-1199. Online publication date: 4-May-2015
https://dl.acm.org/doi/10.5555/2772879.2773301
Mousavi S, Ghazanfari B, Mozayani N, Jahed-Motlagh M (2014). Automatic abstraction controller in reinforcement learning agent via automata. Applied Soft Computing 25:C, 118-128. Online publication date: 1-Dec-2014
https://dl.acm.org/doi/10.1016/j.asoc.2014.08.071
Schultink E, Cavallo R, Parkes D (2008). Economic hierarchical Q-learning. Proceedings of the 23rd national conference on Artificial intelligence - Volume 2, 689-695. Online publication date: 13-Jul-2008
https://dl.acm.org/doi/10.5555/1620163.1620179
Meuleau N, Brafman R, Benazera E (2006). Stochastic over-subscription planning using hierarchies of MDPs. Proceedings of the Sixteenth International Conference on International Conference on Automated Planning and Scheduling, 121-130. Online publication date: 6-Jun-2006
https://dl.acm.org/doi/10.5555/3037104.3037121
Marthi B, Russell S, Andre D (2006). A compact, hierarchically optimal Q-function decomposition. Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, 332-340. Online publication date: 13-Jul-2006
https://dl.acm.org/doi/10.5555/3020419.3020460
Jonsson A, Barto A (2006). Causal Graph Based Decomposition of Factored MDPs. The Journal of Machine Learning Research 7, 2259-2301. Online publication date: 1-Dec-2006
https://dl.acm.org/doi/10.5555/1248547.1248628
McMahan H, Gordon G (2005). Fast exact planning in Markov decision processes. Proceedings of the Fifteenth International Conference on International Conference on Automated Planning and Scheduling, 151-160. Online publication date: 5-Jun-2005
https://dl.acm.org/doi/10.5555/3037062.3037082
Nair R, Tambe M (2005). Hybrid BDI-POMDP framework for multiagent teaming. Journal of Artificial Intelligence Research 23:1, 367-420. Online publication date: 1-Apr-2005
https://dl.acm.org/doi/10.5555/1622503.1622512
Zhang W, Zhang N (2005). Restricted value iteration. Journal of Artificial Intelligence Research 23:1, 123-165. Online publication date: 1-Feb-2005
https://dl.acm.org/doi/10.5555/1622503.1622507