Markov Decision Process
Most cited papers in Markov Decision Process
In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (MDPs) and...
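For orientation, the MDP theory this survey opens with reduces, in the fully observable finite case, to a short dynamic program. Below is a minimal value iteration sketch, not code from the paper; the transition and reward arrays are illustrative placeholders.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Minimal value iteration for a finite, fully observable MDP.

    P[a, s, s'] -- transition probabilities, R[s, a] -- expected rewards;
    both are illustrative placeholders, not data from the paper.
    """
    V = np.zeros(R.shape[0])
    while True:
        # One Bellman backup: Q[s, a] = R[s, a] + gamma * E[V(s') | s, a]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # values and a greedy policy
        V = V_new

# A toy 2-state, 2-action instance:
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
              [[0.5, 0.5], [0.0, 1.0]]])  # transitions under action 1
R = np.array([[1.0, 0.0], [0.0, 2.0]])
V, policy = value_iteration(P, R)
```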
Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of...
We consider decentralized control of Markov decision processes and give complexity bounds on the worst-case running time for algorithms that find optimal solutions. Generalizations of both the fully observable case and the partially...
The curse of dimensionality gives rise to prohibitive computational requirements that render infeasible the exact solution of large-scale stochastic control problems. We study an efficient method based on linear programming for...
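For context on the linear-programming route mentioned above: in its standard textbook form, a discounted MDP can be solved by the linear program below; the approximate method then compresses $V$ with a linear value-function architecture (the state-relevance weights $c(s)$ are a modeling choice):

$$
\begin{aligned}
\min_{V}\quad & \sum_{s} c(s)\, V(s)\\
\text{s.t.}\quad & V(s) \;\ge\; r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s') \qquad \forall\, s,a.
\end{aligned}
$$

Substituting $V(s) \approx \sum_k w_k \phi_k(s)$ shrinks the variable count from $|S|$ to the number of weights $w_k$, which is what makes the LP tractable for large state spaces; the constraints must then be sampled or otherwise reduced.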
In this paper we describe PRISM, a tool being developed at the University of Birmingham for the analysis of probabilistic systems. PRISM supports two probabilistic models: continuous-time Markov chains and Markov decision processes....
This paper addresses the problem of streaming packetized media over a lossy packet network in a rate-distortion optimized way. We show that although the data units in a media presentation generally depend on each other according to a...
The purpose of this paper is twofold: (a) to provide a tutorial introduction to some key concepts from the theory of computational complexity, highlighting their relevance to systems and control theory, and (b) to survey the relatively...
In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter...
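As a rough illustration of the gradient-ascent scheme described above, here is a sketch in the spirit of GPOMDP, not the authors' code; `env`, `sample_action`, and `grad_logp` are hypothetical stand-ins.

```python
import numpy as np

def gpomdp_gradient(env, theta, sample_action, grad_logp,
                    beta=0.95, horizon=100_000):
    """Single-trajectory estimate of the average-reward policy gradient.

    beta < 1 is the trace-discount factor GPOMDP uses to trade bias
    against variance; env, sample_action, and grad_logp are assumed
    stand-ins, not APIs from the paper.
    """
    z = np.zeros_like(theta)      # eligibility trace of score functions
    delta = np.zeros_like(theta)  # running reward-weighted average of traces
    obs = env.reset()
    for t in range(1, horizon + 1):
        action = sample_action(theta, obs)           # a ~ mu(. | theta, obs)
        z = beta * z + grad_logp(theta, obs, action)
        obs, reward = env.step(action)
        delta += (reward * z - delta) / t            # incremental mean
    return delta  # ascend: theta += step_size * delta
```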
Typical recommender systems adopt a static view of the recommendation process and treat it as a prediction problem. We argue that it is more appropriate to view the problem of generating recommendations as a sequential decision...
In this paper, we propose a quantitative model for dialog systems that can be used for learning the dialog strategy. We claim that the problem of dialog design can be formalized as an optimization problem with an objective function...
We propose a new approach to reinforcement learning for control problems which combines value-function approximation with linear architectures and approximate policy iteration. This new approach is motivated by the least-squares...
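The least-squares machinery behind this approach fits in a few lines. Below is a hedged sketch of one LSTD-Q evaluation step, the inner loop of least-squares policy iteration; `phi` and the sample format are assumptions for illustration.

```python
import numpy as np

def lstdq(samples, phi, policy, gamma=0.95, ridge=1e-6):
    """Fit linear Q-function weights for `policy` from sampled transitions.

    samples: list of (s, a, r, s_next); phi(s, a) returns a 1-D feature
    vector. Both conventions are illustrative assumptions.
    """
    k = len(phi(*samples[0][:2]))
    A = ridge * np.eye(k)        # small ridge term keeps the system solvable
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))  # next action under the policy
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)

# LSPI then alternates: w = lstdq(...); policy(s) = argmax_a w . phi(s, a).
```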
Because of the slow progress in proving lower bounds on the circuit complexity of Boolean functions, one is interested in restricted models of Boolean circuits like depth-restricted circuits, decision trees, branching programs, width-k...
Structured methods for solving factored Markov decision processes (MDPs) with large state spaces have recently been proposed to allow dynamic programming to be applied without the need for complete state enumeration. We propose...
Anthony Cassandra (Computer Science Dept., Brown University), Michael L. Littman (Dept. of Computer Science, Duke University), Nevin L. Zhang (Computer...
Dynamic power management schemes (also called policies) reduce the power consumption of complex electronic systems by trading off performance for power in a controlled fashion, taking system workload into account. In a power-managed...
We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination...
Markov decision processes (MDPs) have recently been applied to the problem of modeling decision-theoretic planning. While traditional methods for solving MDPs are often practical for small state spaces, their effectiveness for large...
Markov decision processes (MDPs) have proven to be popular models for decision-theoretic planning, but standard dynamic programming algorithms for solving MDPs rely on explicit, state-based specifications and computations. To alleviate...
The architecture for the Beyond 3rd Generation (B3G) or 4th Generation (4G) wireless networks aims to integrate various heterogeneous wireless access networks. One of the major design issues is the support of vertical handoff. Vertical...
We introduce the concept of a Markov risk measure and we use it to formulate risk-averse control problems for two Markov decision models: a finite horizon model and a discounted infinite horizon model. For both models we derive...
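Schematically, the dynamic programming equation in this risk-averse setting replaces the expectation of the classical Bellman equation with a one-step conditional risk measure $\rho$ (the notation below is a paraphrase for illustration, not the paper's):

$$
v(s) \;=\; \min_{a \in A(s)} \Big\{ c(s,a) \;+\; \gamma\, \rho\big( v(S') \;\big|\; s, a \big) \Big\}
$$

Taking $\rho$ to be the conditional expectation recovers the risk-neutral case; coherent choices such as mean-semideviation or CVaR yield risk-averse policies while keeping the recursion Markovian.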
We formulate and analyze a Markov decision process (dynamic programming) model for airline seat allocation (yield management) on a single-leg flight with multiple fare classes. Unlike previous models, we allow cancellation, no-shows, and...
Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Interval Estimation (MBIE) learns efficiently in practice,...
In this paper we present efficient symbolic techniques for probabilistic model checking. These have been implemented in PRISM, a tool for the analysis of probabilistic models such as discrete-time Markov chains, continuous-time Markov...
We present a technique for computing approximately optimal solutions to stochastic resource allocation problems modeled as Markov decision processes (MDPs). We exploit two key properties to avoid explicitly enumerating the very large...
This paper proposes a simulation-based algorithm for optimizing the average reward in a finite-state Markov reward process that depends on a set of parameters. As a special case, the method applies to Markov decision processes where...
We study the problem of learning near-optimal behavior in finite Markov Decision Processes (MDPs) with a polynomial number of samples. These "PAC-MDP" algorithms include the well-known E^3 and R-MAX algorithms as well as the more recent...
We consider the problem of multi-task reinforcement learning, where the agent needs to solve a sequence of Markov Decision Processes (MDPs) chosen randomly from a fixed but unknown distribution. We model the distribution over MDPs using a...
We investigate the computability of problems in probabilistic planning and partially observable infinite-horizon Markov decision processes. The undecidability of the string-existence problem for probabilistic finite automata is adapted to...
We present a new motion planning framework that explicitly considers uncertainty in robot motion to maximize the probability of avoiding collisions and successfully reaching a goal. In many motion planning applications ranging from...
External control of a genetic regulatory network is used for the purpose of avoiding undesirable states, such as those associated with disease. Heretofore, intervention has focused on finite-horizon control, i.e., control over a small...
The bidding decision-making problem is studied from a supplier's viewpoint in a spot market environment. The decision-making problem is formulated as a Markov decision process, a discrete stochastic optimization method. All other suppliers...
We study the approximation of a small-noise Markov decision process $x_t = F(x_{t-1}, a_t, \xi_t(\epsilon))$, $t = 1, 2, \ldots$, by means of its deterministic counterpart...
We review models for the optimal control of networks of queues. Our main emphasis is on models based on Markov decision theory and the characterization of the structure of optimal control policies.
This paper examines the value of real-time traffic information to optimal vehicle routing in a nonstationary stochastic network. We present a systematic approach to aid in the implementation of transportation systems integrated with real...
We consider a network revenue management problem where customers choose among open fare products according to some prespecified choice model. Starting with a Markov decision process (MDP) formulation, we approximate the value function...
Many owners of growing privately-held firms make operational and financial decisions in an effort to maximize the expected present value of the proceeds from an Initial Public Offering (IPO). We ask: "What is the right time to make an...
We consider the optimal production and inventory control of an assemble-to-order (ATO) system with m components, one end-product, and n customer classes. Demand from each class occurs continuously over time according to a Poisson process....