Exploiting structure to efficiently solve large scale partially observable Markov decision processes

January 2005

Author:
Pascal Poupart
University of Toronto (Canada)

Publisher:

University of Toronto
Computer Center Toronto, Ont. M5S 1A1
Canada

ISBN: 978-0-494-02727-1

Order Number: AAINR02727

Pages:

144

Abstract

Partially observable Markov decision processes (POMDPs) provide a natural and principled framework for modeling a wide range of sequential decision-making problems under uncertainty. To date, the use of POMDPs in real-world problems has been limited by the poor scalability of existing solution algorithms, which can only solve problems with up to ten thousand states. In fact, finding an optimal policy for a finite-horizon discrete POMDP is PSPACE-complete. In practice, two important sources of intractability plague most solution algorithms: large policy spaces and large state spaces.

On the other hand, for many real-world POMDPs it is possible to define effective policies with simple rules of thumb. This suggests that we may be able to find small policies that are near optimal. This thesis first presents a Bounded Policy Iteration (BPI) algorithm to robustly find a good policy represented by a small finite-state controller. Real-world POMDPs also tend to exhibit structural properties that can be exploited to mitigate the effect of large state spaces. To that end, a value-directed compression (VDC) technique is also presented to reduce POMDP models to lower-dimensional representations.

In practice, it is critical to simultaneously mitigate the impact of complex policy representations and large state spaces. Hence, this thesis describes three approaches that combine techniques capable of dealing with each source of intractability: VDC with BPI, VDC with Perseus (a randomized point-based value iteration algorithm by Spaan and Vlassis [136]), and state abstraction with Perseus. The scalability of these approaches is demonstrated on two problems with more than 33 million states: a synthetic network management problem and a real-world system designed to assist elderly persons with cognitive deficiencies in carrying out simple daily tasks such as hand-washing. This represents an important step towards the deployment of POMDP techniques in ever larger, real-world sequential decision-making problems.
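
To make the idea of a policy represented by a small finite state controller concrete, the following is a minimal sketch of how such a controller can be evaluated. It is not taken from the thesis and it is not the BPI algorithm: it only performs policy evaluation on the classic two-state "tiger" toy problem. The toy model, the discount factor of 0.95, the two hand-written controllers, and all function names are assumptions chosen for illustration.

```python
# Toy illustration only: the classic two-state "tiger" benchmark,
# not a model or algorithm from the thesis.
STATES = ("tiger-left", "tiger-right")
OBS = ("hear-left", "hear-right")
GAMMA = 0.95  # assumed discount factor


def transition(s, a, s2):
    """P(s2 | s, a): listening leaves the tiger in place; opening a door resets it uniformly."""
    if a == "listen":
        return 1.0 if s == s2 else 0.0
    return 0.5


def observation(a, s2, o):
    """P(o | a, s2): listening is 85% accurate; after opening, the observation is uninformative."""
    if a == "listen":
        correct = (s2 == "tiger-left") == (o == "hear-left")
        return 0.85 if correct else 0.15
    return 0.5


def reward(s, a):
    """Listening costs 1; opening the tiger's door costs 100; the other door pays 10."""
    if a == "listen":
        return -1.0
    opened_tiger = (a == "open-left") == (s == "tiger-left")
    return -100.0 if opened_tiger else 10.0


def evaluate(controller, tol=1e-10):
    """Iterate V(n,s) = R(s,a_n) + gamma * sum_{s',o} T * O * V(succ(n,o), s') to a fixed point."""
    values = {(n, s): 0.0 for n in controller for s in STATES}
    while True:
        delta = 0.0
        for n, (action, successor) in controller.items():
            for s in STATES:
                future = sum(
                    transition(s, action, s2)
                    * observation(action, s2, o)
                    * values[(successor[o], s2)]
                    for s2 in STATES
                    for o in OBS
                )
                new = reward(s, action) + GAMMA * future
                delta = max(delta, abs(new - values[(n, s)]))
                values[(n, s)] = new
        if delta < tol:
            return values


# Each node maps to (action, {observation: next node}).
ALWAYS_LISTEN = {0: ("listen", {"hear-left": 0, "hear-right": 0})}
LISTEN_TWICE = {
    0: ("listen", {"hear-left": 1, "hear-right": 2}),
    1: ("listen", {"hear-left": 3, "hear-right": 0}),
    2: ("listen", {"hear-left": 0, "hear-right": 4}),
    3: ("open-right", {"hear-left": 0, "hear-right": 0}),
    4: ("open-left", {"hear-left": 0, "hear-right": 0}),
}


def value_at_uniform_belief(controller, start_node=0):
    """Expected discounted return when the tiger is equally likely behind either door."""
    values = evaluate(controller)
    return sum(0.5 * values[(start_node, s)] for s in STATES)


if __name__ == "__main__":
    print("always listen:", value_at_uniform_belief(ALWAYS_LISTEN))
    print("listen twice :", value_at_uniform_belief(LISTEN_TWICE))
```

The value table holds one entry per (controller node, state) pair, so the cost of evaluating a controller grows with both the controller size and the number of states. This is one way to see why the abstract treats large policy spaces and large state spaces as separate problems that must be addressed together.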

Contributors

Pascal Poupart
David R. Cheriton School of Computer Science

Recommendations

Partially Observable Risk-Sensitive Markov Decision Processes

We consider the problem of minimizing a certainty equivalent of the total or discounted cost over a finite and an infinite time horizon that is generated by a partially observable Markov decision process (POMDP). In contrast to a risk-neutral decision ...

Robust partially observable Markov decision process
ICML'15: Proceedings of the 32nd International Conference on Machine Learning - Volume 37

We seek to find the robust policy that maximizes the expected cumulative reward for the worst case when a partially observable Markov decision process (POMDP) has uncertain parameters whose values are only known to be in a given region. We prove that ...

Approximate solution methods for partially observable Markov and semi-Markov decision processes
