Markov Decision Process
Most cited papers in Markov Decision Process
In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (MDPs) and...
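For orientation, the MDP theory this survey opens with reduces, in the fully observable finite case, to a short dynamic program. Below is a minimal value iteration sketch, not code from the paper; the transition and reward arrays are illustrative placeholders.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Minimal value iteration for a finite, fully observable MDP.

    P[a, s, s'] -- transition probabilities, R[s, a] -- expected rewards;
    both are illustrative placeholders, not data from the paper.
    """
    V = np.zeros(R.shape[0])
    while True:
        # One Bellman backup: Q[s, a] = R[s, a] + gamma * E[V(s') | s, a]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # values and a greedy policy
        V = V_new

# A toy 2-state, 2-action instance:
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
              [[0.5, 0.5], [0.0, 1.0]]])  # transitions under action 1
R = np.array([[1.0, 0.0], [0.0, 2.0]])
V, policy = value_iteration(P, R)
```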
Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of...
We consider decentralized control of Markov decision processes and give complexity bounds on the worst-case running time for algorithms that find optimal solutions. Generalizations of both the fully observable case and the partially...
The curse of dimensionality gives rise to prohibitive computational requirements that render infeasible the exact solution of large-scale stochastic control problems. We study an efficient method based on linear programming for...
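For context on the linear-programming route mentioned above: in its standard textbook form, a discounted MDP can be solved by the linear program below; the approximate method then compresses $V$ with a linear value-function architecture (the state-relevance weights $c(s)$ are a modeling choice):

$$
\begin{aligned}
\min_{V}\quad & \sum_{s} c(s)\, V(s)\\
\text{s.t.}\quad & V(s) \;\ge\; r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s') \qquad \forall\, s,a.
\end{aligned}
$$

Substituting $V(s) \approx \sum_k w_k \phi_k(s)$ shrinks the variable count from $|S|$ to the number of weights $w_k$, which is what makes the LP tractable for large state spaces; the constraints must then be sampled or otherwise reduced.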
In this paper we describe PRISM, a tool being developed at the University of Birmingham for the analysis of probabilistic systems. PRISM supports two probabilistic models: continuous-time Markov chains and Markov decision processes....
This paper addresses the problem of streaming packetized media over a lossy packet network in a rate-distortion optimized way. We show that although the data units in a media presentation generally depend on each other according to a...
The purpose of this paper is twofold: (a) to provide a tutorial introduction to some key concepts from the theory of computational complexity, highlighting their relevance to systems and control theory, and (b) to survey the relatively...
In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter...
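As a rough illustration of the gradient-ascent scheme described above, here is a sketch in the spirit of GPOMDP, not the authors' code; `env`, `sample_action`, and `grad_logp` are hypothetical stand-ins.

```python
import numpy as np

def gpomdp_gradient(env, theta, sample_action, grad_logp,
                    beta=0.95, horizon=100_000):
    """Single-trajectory estimate of the average-reward policy gradient.

    beta < 1 is the trace-discount factor GPOMDP uses to trade bias
    against variance; env, sample_action, and grad_logp are assumed
    stand-ins, not APIs from the paper.
    """
    z = np.zeros_like(theta)      # eligibility trace of score functions
    delta = np.zeros_like(theta)  # running reward-weighted average of traces
    obs = env.reset()
    for t in range(1, horizon + 1):
        action = sample_action(theta, obs)           # a ~ mu(. | theta, obs)
        z = beta * z + grad_logp(theta, obs, action)
        obs, reward = env.step(action)
        delta += (reward * z - delta) / t            # incremental mean
    return delta  # ascend: theta += step_size * delta
```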
Typical recommender systems adopt a static view of the recommendation process and treat it as a prediction problem. We argue that it is more appropriate to view the problem of generating recommendations as a sequential decision...
In this paper, we propose a quantitative model for dialog systems that can be used for learning the dialog strategy. We claim that the problem of dialog design can be formalized as an optimization problem with an objective function...
We propose a new approach to reinforcement learning for control problems which combines value-function approximation with linear architectures and approximate policy iteration. This new approach is motivated by the least-squares...
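The least-squares machinery behind this approach fits in a few lines. Below is a hedged sketch of one LSTD-Q evaluation step, the inner loop of least-squares policy iteration; `phi` and the sample format are assumptions for illustration.

```python
import numpy as np

def lstdq(samples, phi, policy, gamma=0.95, ridge=1e-6):
    """Fit linear Q-function weights for `policy` from sampled transitions.

    samples: list of (s, a, r, s_next); phi(s, a) returns a 1-D feature
    vector. Both conventions are illustrative assumptions.
    """
    k = len(phi(*samples[0][:2]))
    A = ridge * np.eye(k)        # small ridge term keeps the system solvable
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))  # next action under the policy
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)

# LSPI then alternates: w = lstdq(...); policy(s) = argmax_a w . phi(s, a).
```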
Because of the slow progress in proving lower bounds on the circuit complexity of Boolean functions, one is interested in restricted models of Boolean circuits like depth-restricted circuits, decision trees, branching programs, width-k...
Structured methods for solving factored Markov decision processes (MDPs) with large state spaces have recently been proposed to allow dynamic programming to be applied without the need for complete state enumeration. We propose...
Anthony Cassandra (Computer Science Dept., Brown University), Michael L. Littman (Dept. of Computer Science, Duke University), Nevin L. Zhang (Computer...
Dynamic power management schemes (also called policies) reduce the power consumption of complex electronic systems by trading off performance for power in a controlled fashion, taking system workload into account. In a power-managed...
We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination...
Markov decision processes (MDPs) have recently been applied to the problem of modeling decision-theoretic planning. While traditional methods for solving MDPs are often practical for small state spaces, their effectiveness for large...
Markov decision processes (MDPs) have proven to be popular models for decision-theoretic planning, but standard dynamic programming algorithms for solving MDPs rely on explicit, state-based specifications and computations. To alleviate...
The architecture for the Beyond 3rd Generation (B3G) or 4th Generation (4G) wireless networks aims to integrate various heterogeneous wireless access networks. One of the major design issues is the support of vertical handoff. Vertical...
We introduce the concept of a Markov risk measure and we use it to formulate risk-averse control problems for two Markov decision models: a finite horizon model and a discounted infinite horizon model. For both models we derive...
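Schematically, the dynamic programming equation in this risk-averse setting replaces the expectation of the classical Bellman equation with a one-step conditional risk measure $\rho$ (the notation below is a paraphrase for illustration, not the paper's):

$$
v(s) \;=\; \min_{a \in A(s)} \Big\{ c(s,a) \;+\; \gamma\, \rho\big( v(S') \;\big|\; s, a \big) \Big\}
$$

Taking $\rho$ to be the conditional expectation recovers the risk-neutral case; coherent choices such as mean-semideviation or CVaR yield risk-averse policies while keeping the recursion Markovian.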
We formulate and analyze a Markov decision process (dynamic programming) model for airline seat allocation (yield management) on a single-leg flight with multiple fare classes. Unlike previous models, we allow cancellation, no-shows, and...
Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Interval Estimation (MBIE) learns efficiently in practice,...
In this paper we present efficient symbolic techniques for probabilistic model checking. These have been implemented in PRISM, a tool for the analysis of probabilistic models such as discrete-time Markov chains, continuous-time Markov...
We present a technique for computing approximately optimal solutions to stochastic resource allocation problems modeled as Markov decision processes (MDPs). We exploit two key properties to avoid explicitly enumerating the very large...
This paper proposes a simulation-based algorithm for optimizing the average reward in a finite-state Markov reward process that depends on a set of parameters. As a special case, the method applies to Markov decision processes where...
We study the problem of learning near-optimal behavior in finite Markov Decision Processes (MDPs) with a polynomial number of samples. These "PAC-MDP" algorithms include the well-known E^3 and R-MAX algorithms as well as the more recent...
We consider the problem of multi-task reinforcement learning, where the agent needs to solve a sequence of Markov Decision Processes (MDPs) chosen randomly from a fixed but unknown distribution. We model the distribution over MDPs using a...
We investigate the computability of problems in probabilistic planning and partially observable infinite-horizon Markov decision processes. The undecidability of the string-existence problem for probabilistic finite automata is adapted to...
We present a new motion planning framework that explicitly considers uncertainty in robot motion to maximize the probability of avoiding collisions and successfully reaching a goal. In many motion planning applications ranging from...
External control of a genetic regulatory network is used for the purpose of avoiding undesirable states, such as those associated with disease. Heretofore, intervention has focused on finite-horizon control, i.e., control over a small...
The bidding decision-making problem is studied from a supplier's viewpoint in a spot market environment. The decision-making problem is formulated as a Markov decision process, a discrete stochastic optimization method. All other suppliers...
We study the approximation of a small-noise Markov decision process $x_t = F(x_{t-1}, a_t, \xi_t(\epsilon))$, $t = 1, 2, \ldots$, by means of its deterministic counterpart...
We review models for the optimal control of networks of queues. Our main emphasis is on models based on Markov decision theory and the characterization of the structure of optimal control policies.
This paper examines the value of real-time traffic information to optimal vehicle routing in a nonstationary stochastic network. We present a systematic approach to aid in the implementation of transportation systems integrated with real...
We consider a network revenue management problem where customers choose among open fare products according to some prespecified choice model. Starting with a Markov decision process (MDP) formulation, we approximate the value function...
Many owners of growing privately-held firms make operational and financial decisions in an effort to maximize the expected present value of the proceeds from an Initial Public Offering (IPO). We ask: "What is the right time to make an...
We consider the optimal production and inventory control of an assemble-to-order (ATO) system with m components, one end-product, and n customer classes. Demand from each class occurs continuously over time according to a Poisson process....