Learning the structure of Factored Markov Decision Processes in reinforcement learning problems

Published: 25 June 2006 · DOI: 10.1145/1143844.1143877

Abstract

Recent decision-theoretic planning algorithms are able to find optimal solutions in large problems using Factored Markov Decision Processes (FMDPs). However, these algorithms require perfect knowledge of the structure of the problem. In this paper, we propose SDYNA, a general framework for addressing large reinforcement learning problems by trial and error, with no initial knowledge of their structure. SDYNA integrates incremental planning algorithms based on FMDPs with supervised learning techniques that build structured representations of the problem. We describe SPITI, an instantiation of SDYNA that combines incremental decision tree induction for learning the structure of a problem with an incremental version of the Structured Value Iteration algorithm. We show that SPITI can build a factored representation of a reinforcement learning problem and may improve the policy faster than tabular reinforcement learning algorithms by exploiting the generalization property of decision tree induction.
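The SDYNA framework described in the abstract interleaves three steps at each time step: acting, learning a structured model from the observed transition, and incrementally planning on that model. The sketch below illustrates that act/learn/plan loop in Python. It is a minimal, hypothetical illustration, not the paper's implementation: the ChainEnv toy environment, the table-based model and value function (stand-ins for the decision trees SPITI actually learns), and all parameter values are assumptions made for brevity.

import random
from collections import defaultdict

class ChainEnv:
    """Toy 5-state chain: action 1 moves right, action 0 moves left;
    reward 1.0 on reaching the rightmost state (an assumption for this
    sketch, not a domain from the paper)."""
    actions = (0, 1)

    def __init__(self, n=5):
        self.n = n
        self.s = 0

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = max(0, min(self.n - 1, self.s + (1 if a == 1 else -1)))
        return self.s, (1.0 if self.s == self.n - 1 else 0.0)

def sdyna_loop(env, steps=2000, gamma=0.9, epsilon=0.1, backups=5):
    """Act / Learn / Plan loop in the spirit of SDYNA (hypothetical API)."""
    model = {}              # (s, a) -> (reward, next state); SPITI learns trees instead
    Q = defaultdict(float)  # stand-in for the structured value function
    s = env.reset()
    for _ in range(steps):
        # Act: epsilon-greedy with respect to the current planned values.
        if random.random() < epsilon:
            a = random.choice(env.actions)
        else:
            a = max(env.actions, key=lambda act: Q[(s, act)])
        s2, r = env.step(a)
        # Learn: record the observed transition; SPITI would instead feed
        # this example to an incremental decision tree induction algorithm.
        model[(s, a)] = (r, s2)
        # Plan: a few one-step backups over known pairs, standing in for an
        # incremental Structured Value Iteration sweep.
        for (ps, pa) in random.sample(list(model), min(backups, len(model))):
            pr, ps2 = model[(ps, pa)]
            Q[(ps, pa)] = pr + gamma * max(Q[(ps2, b)] for b in env.actions)
        s = s2
    return Q

if __name__ == "__main__":
    Q = sdyna_loop(ChainEnv())
    print({s: round(max(Q[(s, a)] for a in (0, 1)), 2) for s in range(5)})

In SPITI itself, both the model and the value function are decision trees over state variables rather than tables; aggregating many states into one leaf is what gives the generalization advantage over tabular methods that the abstract claims.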




Published In

ICML '06: Proceedings of the 23rd international conference on Machine learning
June 2006
1154 pages
ISBN: 1595933832
DOI: 10.1145/1143844

Publisher

Association for Computing Machinery

New York, NY, United States


Acceptance Rates

ICML '06 paper acceptance rate: 140 of 548 submissions (26%)
