DOI: 10.1145/1273496.1273585
Article

Learning state-action basis functions for hierarchical MDPs

Published: 20 June 2007
Abstract

This paper introduces a new approach to action-value function approximation that learns basis functions from a spectral decomposition of the state-action manifold. It extends previous work on Laplacian bases for value function approximation by using the agent's actions as part of the representation when creating basis functions. The approach produces a learned nonlinear representation particularly suited to approximating action-value functions, without the wasteful duplication of state bases incurred by earlier methods. We discuss two techniques for creating state-action graphs: off-policy and on-policy. We show that these graphs have greater expressive power and yield better performance than state-based Laplacian basis functions in domains modeled as Semi-Markov Decision Processes (SMDPs). We also present a simple graph-partitioning method to scale the approach to large discrete MDPs.
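To make the construction concrete, the sketch below illustrates the general idea behind Laplacian basis functions computed over a state-action graph: build a graph whose nodes are (state, action) pairs connected by sampled transitions, form the normalized graph Laplacian, and take its smoothest eigenvectors as features for linear action-value approximation. This is a minimal illustration, not the authors' implementation; the toy chain domain, the symmetrized (undirected) graph, and all function names are assumptions introduced here for illustration.

```python
import numpy as np

def state_action_laplacian(edges, n_nodes):
    """Normalized graph Laplacian of an undirected state-action graph."""
    W = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        W[i, j] = 1.0
        W[j, i] = 1.0              # symmetrize: treat transitions as undirected edges
    d = W.sum(axis=1)
    d[d == 0] = 1.0                # guard against isolated nodes
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    # L = I - D^{-1/2} W D^{-1/2}
    return np.eye(n_nodes) - d_inv_sqrt @ W @ d_inv_sqrt

def laplacian_bases(L, k):
    """Columns are the k eigenvectors with the smallest eigenvalues (smoothest)."""
    _, eigvecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    return eigvecs[:, :k]

# Toy domain (an assumption for illustration): a 4-state chain with two
# actions (0 = left, 1 = right). Each graph node indexes a (state, action) pair.
n_states, n_actions = 4, 2

def node(s, a):
    return s * n_actions + a

edges = []
for s in range(n_states):
    for a in range(n_actions):
        s_next = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
        for a_next in range(n_actions):  # link to the successor's (state, action) pairs
            edges.append((node(s, a), node(s_next, a_next)))

L = state_action_laplacian(edges, n_states * n_actions)
Phi = laplacian_bases(L, k=4)       # one row of features per (state, action) pair

# The action-value function is then approximated linearly,
# Q(s, a) ~= Phi[node(s, a)] @ w, with w fit by any linear RL method.
print(Phi.shape)                    # (8, 4)
```

In this sketch the basis matrix Phi plays the role of the learned representation: because each row corresponds to a (state, action) pair rather than a state, the features can capture structure that differs across actions, which is the property the paper exploits for SMDPs.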


        Published In

        ICML '07: Proceedings of the 24th international conference on Machine learning
        June 2007
        1233 pages
        ISBN:9781595937933
        DOI:10.1145/1273496
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Sponsors

        • Machine Learning Journal

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 20 June 2007



        Qualifiers

        • Article

        Conference

        ICML '07 & ILP '07

        Acceptance Rates

        Overall Acceptance Rate 140 of 548 submissions, 26%



Cited By

• (2018) Manifold Regularized Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems, 29(4), 932-943. DOI: 10.1109/TNNLS.2017.2650943. Online publication date: May 2018.
• (2010) Basis function construction for hierarchical reinforcement learning. Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, 747-754. DOI: 10.5555/1838206.1838305. Online publication date: 10 May 2010.
• (2008) Transfer of task representation in reinforcement learning using policy-based proto-value functions. Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 3, 1329-1332. DOI: 10.5555/1402821.1402864. Online publication date: 12 May 2008.
• (2008) Representation Discovery using Harmonic Analysis. Synthesis Lectures on Artificial Intelligence and Machine Learning, 2(1), 1-147. DOI: 10.2200/S00130ED1V01Y200806AIM004. Online publication date: January 2008.
• (2008) Geodesic Gaussian kernels for value function approximation. Autonomous Robots, 25(3), 287-304. DOI: 10.1007/s10514-008-9095-6. Online publication date: 1 October 2008.
• (2007) Learning to plan using harmonic analysis of diffusion models. Proceedings of the Seventeenth International Conference on Automated Planning and Scheduling, 224-231. DOI: 10.5555/3037176.3037206. Online publication date: 22 September 2007.
