Abstract
Solving large-scale sequential decision-making problems without prior knowledge of the state transition model is a key problem in the planning literature. One approach is to learn the state transition model online from limited observed measurements. We present an adaptive function approximator, incremental Feature Dependency Discovery (iFDD), that grows the feature set online to approximately represent the transition model. The approach leverages existing feature dependencies to build a sparse representation of the state transition model. Theoretical analysis and numerical simulations in domains with state-space sizes ranging from thousands to millions illustrate the benefit of using iFDD to incrementally build transition models within a planning framework.
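To make the idea concrete, the following is a minimal, hypothetical sketch of iFDD-style feature expansion: a sparse linear approximator over binary features that, when prediction error keeps accumulating on a pair of co-active features, promotes their conjunction to a new feature. The class name, parameters, and update rule here are illustrative assumptions, not the authors' exact algorithm.

```python
import itertools
from collections import defaultdict


class IncrementalFeatureDiscovery:
    """Illustrative iFDD-style sketch (hypothetical API): grow a sparse
    binary feature set online by conjoining pairs of active features
    whose accumulated prediction error exceeds a threshold."""

    def __init__(self, num_base_features, threshold=1.0, lr=0.1):
        # Start with one singleton feature per base feature.
        self.features = {frozenset([i]): i for i in range(num_base_features)}
        self.weights = defaultdict(float)    # weight per discovered feature
        self.error_acc = defaultdict(float)  # accumulated |error| per candidate
        self.threshold = threshold
        self.lr = lr

    def active(self, base_active):
        """All discovered features fully contained in the active base set."""
        s = set(base_active)
        return [f for f in self.features if f <= s]

    def predict(self, base_active):
        return sum(self.weights[f] for f in self.active(base_active))

    def update(self, base_active, target):
        """One online step: gradient update on active features, then
        promote candidate conjunctions where error keeps accumulating."""
        phi = self.active(base_active)
        err = target - self.predict(base_active)
        for f in phi:
            # Spread the correction across the active features.
            self.weights[f] += self.lr * err / max(len(phi), 1)
        # Accumulate |error| on pairwise conjunctions of active features.
        for f, g in itertools.combinations(phi, 2):
            cand = f | g
            if cand in self.features:
                continue
            self.error_acc[cand] += abs(err)
            if self.error_acc[cand] > self.threshold:
                self.features[cand] = len(self.features)  # promote
        return err
```

For example, a target that is 1 only when two base features are jointly active cannot be fit by the singleton features alone; after the conjunction is discovered, the approximator fits it, which is the sense in which the representation grows only where the current model is insufficient.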
Keywords
- Transition Model
- Markov Decision Process
- State Transition Model
- Adaptive Dynamic Programming
- Adaptive Planning
© 2012 Springer-Verlag Berlin Heidelberg
Ure, N.K., Geramifard, A., Chowdhary, G., How, J.P. (2012). Adaptive Planning for Markov Decision Processes with Uncertain Transition Models via Incremental Feature Dependency Discovery. In: Flach, P.A., De Bie, T., Cristianini, N. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2012. Lecture Notes in Computer Science, vol. 7524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33486-3_7
DOI: https://doi.org/10.1007/978-3-642-33486-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33485-6
Online ISBN: 978-3-642-33486-3