Hierarchical solution of Markov decision processes using macro-actions

Published: 24 July 1998
Abstract

    We investigate the use of temporally abstract actions, or macro-actions, in the solution of Markov decision processes. Unlike current models that combine primitive actions and macro-actions and leave the state space unchanged, we propose a hierarchical model (using an abstract MDP) that works with macro-actions only, and that significantly reduces the size of the state space. This is achieved by treating macro-actions as local policies that act in certain regions of state space, and by restricting states in the abstract MDP to those at the boundaries of regions. The abstract MDP approximates the original and can be solved more efficiently. We discuss several ways in which macro-actions can be generated to ensure good solution quality. Finally, we consider ways in which macro-actions can be reused to solve multiple, related MDPs, and we show that this can justify the computational overhead of macro-action generation.
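    The following is a minimal, illustrative Python sketch of the kind of computation the abstract describes: value iteration over an abstract MDP whose states are the region-boundary states and whose actions are macro-actions (local policies). The function name `solve_abstract_mdp` and the inputs `P_macro` and `R_macro` (discounted exit distributions and expected discounted rewards of each macro-action, assumed to have been obtained by solving each region's local MDP) are hypothetical conveniences for this sketch, not names taken from the paper.

```python
# Minimal illustrative sketch (not the authors' code) of solving an abstract MDP
# whose states are region-boundary states and whose actions are macro-actions.
# Assumed inputs, as if each region's local MDP had already been solved:
#   macros[s]            -> macro-actions (local policies) applicable at boundary state s
#   P_macro[(s, m)][s']  -> discounted probability that macro m, started at s, exits at s'
#   R_macro[(s, m)]      -> expected discounted reward accumulated while executing m from s

def solve_abstract_mdp(boundary_states, macros, P_macro, R_macro,
                       tol=1e-6, max_iters=10_000):
    """Value iteration over the abstract MDP; returns boundary-state values and a macro policy."""
    V = {s: 0.0 for s in boundary_states}

    def q(s, m, V):
        # Value of executing macro m from boundary state s, then continuing optimally.
        return R_macro[(s, m)] + sum(p * V[s2] for s2, p in P_macro[(s, m)].items())

    for _ in range(max_iters):
        delta = 0.0
        for s in boundary_states:
            best = max(q(s, m, V) for m in macros[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break

    # Greedy choice of macro-action at each boundary state.
    policy = {s: max(macros[s], key=lambda m: q(s, m, V)) for s in boundary_states}
    return V, policy
```

    Because each macro-action compresses an entire trajectory through a region into a single transition between boundary states, the iteration runs over far fewer states than the original MDP, which is the source of the computational savings described in the abstract.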

    Published In

    UAI'98: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence
    July 1998
    538 pages
    ISBN: 155860555X

    Sponsors

    • NEC
    • Hugin Expert A/S
    • Information Extraction and Transportation
    • Microsoft Research
    • AT&T Labs Research

    Publisher

    Morgan Kaufmann Publishers Inc.

    San Francisco, CA, United States

    Cited By

    • (2019) Advantage amplification in slowly evolving latent-state environments. Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 3165-3172. DOI: 10.5555/3367471.3367481
    • (2018) Constructing Temporal Abstractions Autonomously in Reinforcement Learning. AI Magazine, 39(1):39-50. DOI: 10.1609/aimag.v39i1.2780
    • (2018) An Introduction to Deep Reinforcement Learning. Foundations and Trends® in Machine Learning, 11(3-4):219-354. DOI: 10.1561/2200000071
    • (2017) A deep hierarchical approach to lifelong learning in Minecraft. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 1553-1561. DOI: 10.5555/3298239.3298465
    • (2017) Online decision-making for scalable autonomous systems. Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 4768-4774. DOI: 10.5555/3171837.3171954
    • (2017) Deliberation for autonomous robots. Artificial Intelligence, 247:10-44. DOI: 10.1016/j.artint.2014.11.003
    • (2016) Adaptive Skills Adaptive Partitions (ASAP). Proceedings of the 30th International Conference on Neural Information Processing Systems, pp. 1596-1604. DOI: 10.5555/3157096.3157275
    • (2016) MDPs with unawareness in robotics. Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, pp. 627-636. DOI: 10.5555/3020948.3021013
    • (2016) Planning under uncertainty for aggregated electric vehicle charging with renewable energy supply. Proceedings of the Twenty-second European Conference on Artificial Intelligence, pp. 904-912. DOI: 10.3233/978-1-61499-672-9-904
    • (2015) Online Planning for Large Markov Decision Processes with Hierarchical Decomposition. ACM Transactions on Intelligent Systems and Technology, 6(4):1-28. DOI: 10.1145/2717316
