Abstract
Solving large-scale sequential decision-making problems without prior knowledge of the state transition model is a key problem in the planning literature. One approach is to learn the state transition model online from limited observed measurements. We present an adaptive function approximator, incremental Feature Dependency Discovery (iFDD), that grows the feature set online to approximately represent the transition model. The approach leverages existing feature dependencies to build a sparse representation of the state transition model. Theoretical analysis and numerical simulations in domains with state-space sizes ranging from thousands to millions illustrate the benefit of using iFDD to incrementally build transition models within a planning framework.
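To make the idea concrete, the following is a minimal, hypothetical sketch of iFDD-style feature expansion: a sparse linear approximator over binary features that, when prediction error keeps accumulating on a pair of co-active features, promotes their conjunction to a new feature. The class name, parameters, and update rule here are illustrative assumptions, not the authors' exact algorithm.

```python
import itertools
from collections import defaultdict


class IncrementalFeatureDiscovery:
    """Illustrative iFDD-style sketch (hypothetical API): grow a sparse
    binary feature set online by conjoining pairs of active features
    whose accumulated prediction error exceeds a threshold."""

    def __init__(self, num_base_features, threshold=1.0, lr=0.1):
        # Start with one singleton feature per base feature.
        self.features = {frozenset([i]): i for i in range(num_base_features)}
        self.weights = defaultdict(float)    # weight per discovered feature
        self.error_acc = defaultdict(float)  # accumulated |error| per candidate
        self.threshold = threshold
        self.lr = lr

    def active(self, base_active):
        """All discovered features fully contained in the active base set."""
        s = set(base_active)
        return [f for f in self.features if f <= s]

    def predict(self, base_active):
        return sum(self.weights[f] for f in self.active(base_active))

    def update(self, base_active, target):
        """One online step: gradient update on active features, then
        promote candidate conjunctions where error keeps accumulating."""
        phi = self.active(base_active)
        err = target - self.predict(base_active)
        for f in phi:
            # Spread the correction across the active features.
            self.weights[f] += self.lr * err / max(len(phi), 1)
        # Accumulate |error| on pairwise conjunctions of active features.
        for f, g in itertools.combinations(phi, 2):
            cand = f | g
            if cand in self.features:
                continue
            self.error_acc[cand] += abs(err)
            if self.error_acc[cand] > self.threshold:
                self.features[cand] = len(self.features)  # promote
        return err
```

For example, a target that is 1 only when two base features are jointly active cannot be fit by the singleton features alone; after the conjunction is discovered, the approximator fits it, which is the sense in which the representation grows only where the current model is insufficient.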
Keywords
- Transition Model
- Markov Decision Process
- State Transition Model
- Adaptive Dynamic Programming
- Adaptive Planning
© 2012 Springer-Verlag Berlin Heidelberg
Ure, N.K., Geramifard, A., Chowdhary, G., How, J.P. (2012). Adaptive Planning for Markov Decision Processes with Uncertain Transition Models via Incremental Feature Dependency Discovery. In: Flach, P.A., De Bie, T., Cristianini, N. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2012. Lecture Notes in Computer Science, vol. 7524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33486-3_7
DOI: https://doi.org/10.1007/978-3-642-33486-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33485-6
Online ISBN: 978-3-642-33486-3