Learning the structure of Factored Markov Decision Processes in reinforcement learning problems

Published: 25 June 2006 · DOI: 10.1145/1143844.1143877

Abstract

Recent decision-theoretic planning algorithms are able to find optimal solutions in large problems using Factored Markov Decision Processes (FMDPs). However, these algorithms require perfect knowledge of the structure of the problem. In this paper, we propose SDYNA, a general framework for addressing large reinforcement learning problems by trial and error, with no initial knowledge of their structure. SDYNA integrates incremental planning algorithms based on FMDPs with supervised learning techniques that build structured representations of the problem. We describe SPITI, an instantiation of SDYNA that combines incremental decision tree induction for learning the structure of a problem with an incremental version of the Structured Value Iteration algorithm. We show that SPITI can build a factored representation of a reinforcement learning problem and may improve the policy faster than tabular reinforcement learning algorithms by exploiting the generalization property of decision tree induction.
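The SDYNA framework described in the abstract interleaves three steps at each time step: acting, learning a structured model from the observed transition, and incrementally planning on that model. The sketch below illustrates that act/learn/plan loop in Python. It is a minimal, hypothetical illustration, not the paper's implementation: the ChainEnv toy environment, the table-based model and value function (stand-ins for the decision trees SPITI actually learns), and all parameter values are assumptions made for brevity.

import random
from collections import defaultdict

class ChainEnv:
    """Toy 5-state chain: action 1 moves right, action 0 moves left;
    reward 1.0 on reaching the rightmost state (an assumption for this
    sketch, not a domain from the paper)."""
    actions = (0, 1)

    def __init__(self, n=5):
        self.n = n
        self.s = 0

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = max(0, min(self.n - 1, self.s + (1 if a == 1 else -1)))
        return self.s, (1.0 if self.s == self.n - 1 else 0.0)

def sdyna_loop(env, steps=2000, gamma=0.9, epsilon=0.1, backups=5):
    """Act / Learn / Plan loop in the spirit of SDYNA (hypothetical API)."""
    model = {}              # (s, a) -> (reward, next state); SPITI learns trees instead
    Q = defaultdict(float)  # stand-in for the structured value function
    s = env.reset()
    for _ in range(steps):
        # Act: epsilon-greedy with respect to the current planned values.
        if random.random() < epsilon:
            a = random.choice(env.actions)
        else:
            a = max(env.actions, key=lambda act: Q[(s, act)])
        s2, r = env.step(a)
        # Learn: record the observed transition; SPITI would instead feed
        # this example to an incremental decision tree induction algorithm.
        model[(s, a)] = (r, s2)
        # Plan: a few one-step backups over known pairs, standing in for an
        # incremental Structured Value Iteration sweep.
        for (ps, pa) in random.sample(list(model), min(backups, len(model))):
            pr, ps2 = model[(ps, pa)]
            Q[(ps, pa)] = pr + gamma * max(Q[(ps2, b)] for b in env.actions)
        s = s2
    return Q

if __name__ == "__main__":
    Q = sdyna_loop(ChainEnv())
    print({s: round(max(Q[(s, a)] for a in (0, 1)), 2) for s in range(5)})

In SPITI itself, both the model and the value function are decision trees over state variables rather than tables; aggregating many states into one leaf is what gives the generalization advantage over tabular methods that the abstract claims.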




Published In

ICML '06: Proceedings of the 23rd international conference on Machine learning
June 2006
1154 pages
ISBN: 1595933832
DOI: 10.1145/1143844

Publisher

Association for Computing Machinery

New York, NY, United States


Acceptance Rates

ICML '06 paper acceptance rate: 140 of 548 submissions (26%)
