Hierarchical solution of Markov decision processes using macro-actions

Published: 24 July 1998
Abstract

    We investigate the use of temporally abstract actions, or macro-actions, in the solution of Markov decision processes. Unlike current models that combine primitive actions and macro-actions and leave the state space unchanged, we propose a hierarchical model (using an abstract MDP) that works with macro-actions only, and that significantly reduces the size of the state space. This is achieved by treating macro-actions as local policies that act in certain regions of state space, and by restricting states in the abstract MDP to those at the boundaries of regions. The abstract MDP approximates the original and can be solved more efficiently. We discuss several ways in which macro-actions can be generated to ensure good solution quality. Finally, we consider ways in which macro-actions can be reused to solve multiple, related MDPs, and we show that this can justify the computational overhead of macro-action generation.
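    The following is a minimal, illustrative Python sketch of the kind of computation the abstract describes: value iteration over an abstract MDP whose states are the region-boundary states and whose actions are macro-actions (local policies). The function name `solve_abstract_mdp` and the inputs `P_macro` and `R_macro` (discounted exit distributions and expected discounted rewards of each macro-action, assumed to have been obtained by solving each region's local MDP) are hypothetical conveniences for this sketch, not names taken from the paper.

```python
# Minimal illustrative sketch (not the authors' code) of solving an abstract MDP
# whose states are region-boundary states and whose actions are macro-actions.
# Assumed inputs, as if each region's local MDP had already been solved:
#   macros[s]            -> macro-actions (local policies) applicable at boundary state s
#   P_macro[(s, m)][s']  -> discounted probability that macro m, started at s, exits at s'
#   R_macro[(s, m)]      -> expected discounted reward accumulated while executing m from s

def solve_abstract_mdp(boundary_states, macros, P_macro, R_macro,
                       tol=1e-6, max_iters=10_000):
    """Value iteration over the abstract MDP; returns boundary-state values and a macro policy."""
    V = {s: 0.0 for s in boundary_states}

    def q(s, m, V):
        # Value of executing macro m from boundary state s, then continuing optimally.
        return R_macro[(s, m)] + sum(p * V[s2] for s2, p in P_macro[(s, m)].items())

    for _ in range(max_iters):
        delta = 0.0
        for s in boundary_states:
            best = max(q(s, m, V) for m in macros[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break

    # Greedy choice of macro-action at each boundary state.
    policy = {s: max(macros[s], key=lambda m: q(s, m, V)) for s in boundary_states}
    return V, policy
```

    Because each macro-action compresses an entire trajectory through a region into a single transition between boundary states, the iteration runs over far fewer states than the original MDP, which is the source of the computational savings described in the abstract.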

    Published In

    UAI'98: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence
    July 1998
    538 pages
    ISBN: 155860555X

    Sponsors

    • NEC
    • Hugin Expert A/S
    • Information Extraction and Transportation
    • Microsoft Research
    • AT&T Labs Research

    Publisher

    Morgan Kaufmann Publishers Inc.

    San Francisco, CA, United States

    Cited By

    • (2019) Advantage amplification in slowly evolving latent-state environments. Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 3165-3172. DOI: 10.5555/3367471.3367481
    • (2018) Constructing Temporal Abstractions Autonomously in Reinforcement Learning. AI Magazine, 39(1):39-50. DOI: 10.1609/aimag.v39i1.2780
    • (2018) An Introduction to Deep Reinforcement Learning. Foundations and Trends® in Machine Learning, 11(3-4):219-354. DOI: 10.1561/2200000071
    • (2017) A deep hierarchical approach to lifelong learning in Minecraft. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 1553-1561. DOI: 10.5555/3298239.3298465
    • (2017) Online decision-making for scalable autonomous systems. Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 4768-4774. DOI: 10.5555/3171837.3171954
    • (2017) Deliberation for autonomous robots. Artificial Intelligence, 247:10-44. DOI: 10.1016/j.artint.2014.11.003
    • (2016) Adaptive Skills Adaptive Partitions (ASAP). Proceedings of the 30th International Conference on Neural Information Processing Systems, pp. 1596-1604. DOI: 10.5555/3157096.3157275
    • (2016) MDPs with unawareness in robotics. Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, pp. 627-636. DOI: 10.5555/3020948.3021013
    • (2016) Planning under uncertainty for aggregated electric vehicle charging with renewable energy supply. Proceedings of the Twenty-second European Conference on Artificial Intelligence, pp. 904-912. DOI: 10.3233/978-1-61499-672-9-904
    • (2015) Online Planning for Large Markov Decision Processes with Hierarchical Decomposition. ACM Transactions on Intelligent Systems and Technology, 6(4):1-28. DOI: 10.1145/2717316
