Abstract
Learning to act in an unknown partially observable domain is a difficult variant of the reinforcement learning paradigm. Research in the area has focused on model-free methods, which learn a policy without learning a model of the world. When sensor noise increases, model-free methods produce less accurate policies. The model-based approach, which learns a POMDP model of the world and computes an optimal policy for the learned model, may yield superior results in the presence of sensor noise, but learning and solving a model of the environment is a difficult problem. We have previously shown how such a model can be obtained from the learned policy of model-free methods, but that approach imposes an undesirable separation between a learning phase and an acting phase. In this paper we present a novel method for learning a POMDP model online, based on McCallum's Utile Suffix Memory (USM), in conjunction with an approximate policy obtained using an incremental POMDP solver. We show that the incrementally improving policy yields better results than the original USM algorithm, especially in the presence of increasing sensor and action noise.
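The cycle the abstract describes, interleaving acting, online model learning, and incremental solving, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it replaces USM's utile suffix tree with a fixed-depth history suffix, and the incremental POMDP solver with a plain value-iteration sweep over the learned model. Names such as `ACTIONS`, `env.step`, and the helper functions are hypothetical placeholders.

```python
import random
from collections import defaultdict

ACTIONS = ["a0", "a1"]   # hypothetical action set
DEPTH = 2                # length of the history suffix used as a surrogate state
GAMMA = 0.95             # discount factor

# Sufficient statistics learned online for each (suffix, action) pair.
obs_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {observation: count}
mean_reward = defaultdict(float)                    # (s, a) -> running mean reward
q = defaultdict(float)                              # (s, a) -> Q-value estimate

def suffix(history):
    """The last DEPTH (action, observation) pairs stand in for a state."""
    return tuple(history[-DEPTH:])

def update_model(s, a, r, o):
    """Incorporate one experience tuple into the learned model."""
    obs_counts[(s, a)][o] += 1
    n = sum(obs_counts[(s, a)].values())
    mean_reward[(s, a)] += (r - mean_reward[(s, a)]) / n

def improve_policy():
    """One value-iteration sweep over the learned model; a crude
    stand-in for an incremental point-based POMDP solver."""
    for (s, a), counts in list(obs_counts.items()):
        n = sum(counts.values())
        future = 0.0
        for o, c in counts.items():
            s_next = suffix(list(s) + [(a, o)])
            future += (c / n) * max(q[(s_next, b)] for b in ACTIONS)
        q[(s, a)] = mean_reward[(s, a)] + GAMMA * future

def act(history, epsilon=0.1):
    """Epsilon-greedy action selection from the current Q estimates."""
    s = suffix(history)
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(s, a)])

# Acting and learning interleave; `env.step` is a hypothetical interface:
# history = []
# for t in range(10_000):
#     a = act(history)
#     r, o = env.step(a)                  # reward and observation
#     update_model(suffix(history), a, r, o)
#     history.append((a, o))
#     if t % 100 == 0:
#         improve_policy()                # refine the policy as the model grows
```

The key design point mirrored from the abstract is that the agent never stops acting to learn: the model statistics and the policy improve in place while experience accumulates, rather than in a separate offline learning phase.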
References
Bellman, R.E.: Dynamic Programming. Princeton University Press, Princeton (1962)
Bilmes, J.: A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report ICSI-TR-97-021 (1997)
Cassandra, A.R., Kaelbling, L.P., Littman, M.L.: Acting optimally in partially observable stochastic domains. In: AAAI 1994, pp. 1023–1028 (1994)
Chrisman, L.: Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In: AAAI 1992, pp. 183–188 (1992)
Howard, R.A.: Dynamic Programming and Markov Processes. MIT Press, Cambridge (1960)
Littman, M.L., Cassandra, A.R., Kaelbling, L.P.: Learning policies for partially observable environments: Scaling up. In: ICML 1995 (1995)
McCallum, A.K.: Reinforcement Learning with Selective Perception and Hidden State. PhD thesis, University of Rochester (1996)
Meuleau, N., Peshkin, L., Kim, K., Kaelbling, L.P.: Learning finite-state controllers for partially observable environments. In: UAI 1999, pp. 427–436 (1999)
Nikovski, D.: State-Aggregation Algorithms for Learning Probabilistic Models for Robot Control. PhD thesis, Carnegie Mellon University (2002)
Shani, G., Brafman, R.I.: Resolving perceptual aliasing in the presence of noisy sensors. In: Advances in Neural Information Processing Systems 17 (NIPS 2004)
Shani, G., Brafman, R.I., Shimony, S.E.: Partial observability under noisy sensors — from model-free to model-based. In: ICML RRfRL Workshop (2005)
Spaan, M.T.J., Vlassis, N.: Perseus: Randomized point-based value iteration for POMDPs. Technical Report IAS-UVA-04-02, University of Amsterdam (2004)
Wierstra, D., Wiering, M.: Utile distinction hidden Markov models. In: ICML 2004 (2004)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shani, G., Brafman, R.I., Shimony, S.E. (2005). Model-Based Online Learning of POMDPs. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds) Machine Learning: ECML 2005. Lecture Notes in Computer Science, vol. 3720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564096_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29243-2
Online ISBN: 978-3-540-31692-3