DOI: 10.1145/1273496.1273585
Article

Learning state-action basis functions for hierarchical MDPs

Published: 20 June 2007
Abstract

This paper introduces a new approach to action-value function approximation that learns basis functions from a spectral decomposition of the state-action manifold. It extends previous work on Laplacian bases for value function approximation by using the agent's actions as part of the representation when creating basis functions. The approach produces a learned nonlinear representation particularly suited to approximating action-value functions, without the wasteful duplication of state bases incurred by earlier methods. We discuss two techniques for creating state-action graphs: off-policy and on-policy. We show that these graphs have greater expressive power and yield better performance than state-based Laplacian basis functions in domains modeled as Semi-Markov Decision Processes (SMDPs). We also present a simple graph-partitioning method to scale the approach to large discrete MDPs.
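To make the construction concrete, the sketch below illustrates the general idea behind Laplacian basis functions computed over a state-action graph: build a graph whose nodes are (state, action) pairs connected by sampled transitions, form the normalized graph Laplacian, and take its smoothest eigenvectors as features for linear action-value approximation. This is a minimal illustration, not the authors' implementation; the toy chain domain, the symmetrized (undirected) graph, and all function names are assumptions introduced here for illustration.

```python
import numpy as np

def state_action_laplacian(edges, n_nodes):
    """Normalized graph Laplacian of an undirected state-action graph."""
    W = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        W[i, j] = 1.0
        W[j, i] = 1.0              # symmetrize: treat transitions as undirected edges
    d = W.sum(axis=1)
    d[d == 0] = 1.0                # guard against isolated nodes
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    # L = I - D^{-1/2} W D^{-1/2}
    return np.eye(n_nodes) - d_inv_sqrt @ W @ d_inv_sqrt

def laplacian_bases(L, k):
    """Columns are the k eigenvectors with the smallest eigenvalues (smoothest)."""
    _, eigvecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    return eigvecs[:, :k]

# Toy domain (an assumption for illustration): a 4-state chain with two
# actions (0 = left, 1 = right). Each graph node indexes a (state, action) pair.
n_states, n_actions = 4, 2

def node(s, a):
    return s * n_actions + a

edges = []
for s in range(n_states):
    for a in range(n_actions):
        s_next = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
        for a_next in range(n_actions):  # link to the successor's (state, action) pairs
            edges.append((node(s, a), node(s_next, a_next)))

L = state_action_laplacian(edges, n_states * n_actions)
Phi = laplacian_bases(L, k=4)       # one row of features per (state, action) pair

# The action-value function is then approximated linearly,
# Q(s, a) ~= Phi[node(s, a)] @ w, with w fit by any linear RL method.
print(Phi.shape)                    # (8, 4)
```

In this sketch the basis matrix Phi plays the role of the learned representation: because each row corresponds to a (state, action) pair rather than a state, the features can capture structure that differs across actions, which is the property the paper exploits for SMDPs.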


        Published In

        ICML '07: Proceedings of the 24th international conference on Machine learning
        June 2007
        1233 pages
        ISBN:9781595937933
        DOI:10.1145/1273496
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Sponsors

        • Machine Learning Journal

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 20 June 2007



        Qualifiers

        • Article

        Conference

        ICML '07 & ILP '07

        Acceptance Rates

        Overall Acceptance Rate 140 of 548 submissions, 26%



Cited By

• (2018) Manifold Regularized Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems, 29(4), 932-943. DOI: 10.1109/TNNLS.2017.2650943. Online publication date: May 2018.
• (2010) Basis function construction for hierarchical reinforcement learning. Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, 747-754. DOI: 10.5555/1838206.1838305. Online publication date: 10 May 2010.
• (2008) Transfer of task representation in reinforcement learning using policy-based proto-value functions. Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 3, 1329-1332. DOI: 10.5555/1402821.1402864. Online publication date: 12 May 2008.
• (2008) Representation Discovery using Harmonic Analysis. Synthesis Lectures on Artificial Intelligence and Machine Learning, 2(1), 1-147. DOI: 10.2200/S00130ED1V01Y200806AIM004. Online publication date: January 2008.
• (2008) Geodesic Gaussian kernels for value function approximation. Autonomous Robots, 25(3), 287-304. DOI: 10.1007/s10514-008-9095-6. Online publication date: 1 October 2008.
• (2007) Learning to plan using harmonic analysis of diffusion models. Proceedings of the Seventeenth International Conference on Automated Planning and Scheduling, 224-231. DOI: 10.5555/3037176.3037206. Online publication date: 22 September 2007.
