Abstract
Problem formulation is often an important first step in solving a problem effectively. For sequential decision problems, the Markov decision process (MDP) ([2]; [22]) is a commonly used model, owing to its generality, flexibility, and applicability to a wide range of problems. Despite these advantages, three conditions must be satisfied before the MDP model can be applied (a toy model satisfying all three is sketched after the list):
1. The environment model is given in advance (a completely known environment).
2. The environment states are completely observable (fully observable states, implying a Markovian environment).
3. The environment parameters do not change over time (a stationary environment).
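To make these conditions concrete, the following is a minimal sketch of a finite MDP that satisfies all three: the transition and reward tables are given in advance, the state is directly observed, and the parameters never change. The two-state, two-action numbers are invented purely for illustration (they are not from the chapter), and the solver is standard value iteration in the style of Bellman's dynamic programming [2].

```python
import numpy as np

# A toy finite MDP with 2 states and 2 actions. All numbers are
# invented for illustration; they are not from the chapter.
n_states, n_actions = 2, 2

# P[a, s, s'] = probability of moving from state s to s' under action a
# (condition 1: the model is given in advance; condition 3: it is fixed).
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.0, 1.0]]])

# R[a, s] = expected immediate reward for taking action a in state s.
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

gamma = 0.95  # discount factor

# Value iteration (Bellman's dynamic programming [2]). Because the state
# is directly observed (condition 2), the value function is indexed by
# the state itself rather than by a belief over states.
V = np.zeros(n_states)
for _ in range(10_000):
    Q = R + gamma * (P @ V)   # Q[a, s]; the matmul sums over s'
    V_new = Q.max(axis=0)     # greedy backup over actions
    if np.abs(V_new - V).max() < 1e-8:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=0)     # optimal action for each state
print("V* =", V, "policy =", policy)
```

When any of the three conditions fails, this formulation breaks down: an unknown model calls for reinforcement learning, hidden state calls for a POMDP formulation over beliefs rather than states, and nonstationary parameters are precisely the case the hidden-mode MDP of this chapter addresses.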
References
R. E. Bellman (1957). Dynamic Programming. Princeton University Press, Princeton, NJ.
R. E. Bellman (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 6:679–684.
J. A. Boyan and M. L. Littman (1994). Packet routing in dynamically changing networks: A reinforcement learning approach. In Advances in Neural Information Processing Systems 6, pages 671–678, San Mateo, California. Morgan Kaufmann.
A. R. Cassandra, M. L. Littman, and N. L. Zhang (1997). Incremental pruning: A simple, fast, exact algorithm for partially observable Markov decision processes. In Uncertainty in Artificial Intelligence, Providence, RI.
H.-T. Cheng (1988). Algorithms for Partially Observable Markov Decision Processes. PhD thesis, University of British Columbia, British Columbia, Canada.
S. P. M. Choi (2000). Reinforcement Learning in Nonstationary Environments. PhD thesis, Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China, Jan.
S. P. M. Choi, D. Y. Yeung, and N. L. Zhang (1999). An environment model for nonstationary reinforcement learning. In Advances in Neural Information Processing Systems 12. To appear.
L. Chrisman (1992). Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In AAAI-92.
R. H. Crites and A. G. Barto (1996). Improving elevator performance using reinforcement learning. In D. Touretzky, M. Mozer, and M. Hasselmo, editors, Advances in Neural Information Processing Systems 8.
P. Dayan and T. J. Sejnowski (1996). Exploration bonuses and dual control. Machine Learning, 25(1):5–22, Oct.
T. Jaakkola, S. P. Singh, and M. I. Jordan (1995). Monte-Carlo reinforcement learning in non-Markovian decision problems. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems 7, MA. The MIT Press.
L. P. Kaelbling, M. L. Littman, and A. W. Moore (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, May.
L. J. Lin and T. M. Mitchell (1992). Memory approaches to reinforcement learning in non-Markovian domains. Technical Report CMU-CS-92-138, School of Computer Science, Carnegie Mellon University.
M. L. Littman, A. R. Cassandra, and L. P. Kaelbling (1995a). Learning policies for partially observable environments: Scaling up. In A. Prieditis and S. Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, pages 362–370, San Francisco, CA. Morgan Kaufmann.
M. L. Littman, A. R. Cassandra, and L. P. Kaelbling (1995b). Efficient dynamic-programming updates in partially observable Markov decision processes. Technical Report CS-95-19, Department of Computer Science, Brown University, Providence, Rhode Island 02912, USA.
M. L. Littman and D. H. Ackley (1991). Adaptation in constant utility non-stationary environments. In R. K. Belew and L. Booker, editors, Proceedings of the Fourth International Conference on Genetic Algorithms, pages 136–142, San Mateo, CA, Dec. Morgan Kaufmann.
W. S. Lovejoy (1991). A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research, 28:47–66.
A. McCallum (1993). Overcoming incomplete perception with utile distinction memory. In Tenth International Machine Learning Conference, Amherst, MA.
A. McCallum (1995). Reinforcement Learning with Selective Perception and Hidden State. PhD thesis, University of Rochester, Dec.
G. E. Monahan (1982). A survey of partially observable Markov decision processes: Theory, models and algorithms. Management Science, 28:1–16.
C. H. Papadimitriou and J. N. Tsitsiklis (1987). The complexity of Markov decision processes. Mathematics of Operations Research, 12(3):441–450.
M. L. Puterman (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons.
L. R. Rabiner (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), Feb.
J. H. Schmidhuber (1990). Reinforcement learning in Markovian and non-Markovian environments. In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing Systems, volume 3, pages 500–506, San Mateo, CA. Morgan Kaufmann.
S. Singh and D. P. Bertsekas (1997). Reinforcement learning for dynamic channel allocation in cellular telephone systems. In Advances in Neural Information Processing Systems 9.
E. J. Sondik (1971). The Optimal Control of Partially Observable Markov Processes. PhD thesis, Stanford University, Stanford, California, USA.
R. S. Sutton (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning, pages 216–224. Morgan Kaufmann.
R. S. Sutton and A. G. Barto (1998). Reinforcement Learning: An Introduction. The MIT Press.
C. C. White III (1991). Partially observed Markov decision processes: A survey. Annals of Operations Research, 32.
N. L. Zhang, S. S. Lee, and W. Zhang (1999). A method for speeding up value iteration in partially observable Markov decision processes. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence.
N. L. Zhang and W. Liu (1997). A model approximation scheme for planning in partially observable stochastic domains. Journal of Artificial Intelligence Research, 7:199–230.