Jan 9, 2012 · This algorithm generates the splitting policies in a way that each pair of consecutive policies differs at exactly one state. The results are ...
This paper studies a discrete-time total-reward Markov decision process (MDP) with a given initial state distribution. A (randomized) stationary policy can be split on a given set of states if its occupancy measure can be expressed as a convex combination of the occupancy measures of stationary policies, each selecting deterministic actions on the given set.
An efficient algorithm is provided that presents the occupancy measure of a given policy as a convex combination of the occupancy measures of finitely many ...
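In the notation assumed here (not taken verbatim from any of the sources above), the occupancy measure of a policy pi under initial distribution mu, and the splitting property these snippets describe, can be sketched as

\[
  Q_\pi(s,a) \;=\; \sum_{t=0}^{\infty} \mathbb{P}^{\pi}_{\mu}\{\, s_t = s,\ a_t = a \,\},
  \qquad
  Q_\pi \;=\; \sum_{i=1}^{k} \alpha_i\, Q_{\varphi_i},
  \quad \alpha_i \ge 0,\ \ \sum_{i=1}^{k} \alpha_i = 1,
\]

where the \varphi_i are finitely many stationary policies selecting deterministic actions (on the given set of states), and the sum defining Q_\pi is assumed finite, as in a transient total-reward model.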
Splitting Randomized Stationary Policies in Total-Reward Markov Decision Processes. E. Feinberg and U. Rothblum. Math. Oper. Res., 37(1):129-153 (2012).
If this is possible for a given policy, we say that the policy can be split. In particular, we are interested in splitting a randomized stationary policy into (nonrandomized) stationary policies ...
In this paper, we investigate a Markov decision process with constraints on a Borel state space with the expected total reward criterion.
Apr 28, 2015 · The initial state distribution is fixed. According to Feinberg and Rothblum (2012), the occupation measure of a given randomized stationary policy can be presented as a convex combination ...
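A minimal numerical sketch of that statement, under assumptions made here rather than in the sources: it is not the algorithm of Feinberg and Rothblum; it simply enumerates every deterministic stationary policy of a small invented discounted MDP (discounting being one transient total-reward setting), computes their occupancy measures, and recovers the occupancy measure of a randomized stationary policy as a convex combination by solving a feasibility LP. The transition kernel P, discount beta, initial distribution mu, and policy pi are all made up for illustration.

import itertools
import numpy as np
from scipy.optimize import linprog

# Illustrative (made-up) model: 2 states, 2 actions, discounted criterion.
S, A = 2, 2
beta = 0.9                                   # discount factor (assumed)
mu = np.array([0.6, 0.4])                    # initial state distribution (assumed)
P = np.array([[[0.8, 0.2], [0.3, 0.7]],      # P[s, a, s']: transition kernel (assumed)
              [[0.5, 0.5], [0.1, 0.9]]])

def occupancy(policy):
    """Discounted occupancy measure Q(s, a) = sum_t beta^t P(s_t = s, a_t = a),
    where policy[s, a] is the probability of taking action a in state s."""
    P_pi = np.einsum('sa,sat->st', policy, P)            # state kernel under the policy
    d = np.linalg.solve(np.eye(S) - beta * P_pi.T, mu)   # solves d = mu + beta * P_pi^T d
    return d[:, None] * policy                           # Q(s, a) = d(s) * pi(a | s)

# The randomized stationary policy to be split (assumed).
pi = np.array([[0.5, 0.5],
               [0.2, 0.8]])
Q_pi = occupancy(pi)

# All A^S deterministic stationary policies and their occupancy measures.
det_policies = [np.eye(A)[list(choice)]
                for choice in itertools.product(range(A), repeat=S)]
Q_det = np.array([occupancy(phi).ravel() for phi in det_policies])

# Feasibility LP: find alpha >= 0 with sum(alpha) = 1 and sum_i alpha_i * Q_i = Q_pi.
A_eq = np.vstack([Q_det.T, np.ones(len(det_policies))])
b_eq = np.append(Q_pi.ravel(), 1.0)
res = linprog(c=np.zeros(len(det_policies)), A_eq=A_eq, b_eq=b_eq,
              bounds=(0, None), method='highs')
assert res.status == 0, "feasible: the occupancy polytope's extreme points are deterministic policies"
print("convex weights:", np.round(res.x, 4))
print("max reconstruction error:", np.abs(Q_det.T @ res.x - Q_pi.ravel()).max())

This brute-force sketch has one weight per deterministic policy; the paper's contribution is an efficient algorithm producing structured splittings in which consecutive policies differ at exactly one state, which plain enumeration does not provide.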
Rothblum "Splitting Randomized Stationary Policies in Total-Reward Markov Decision Processes" Mathematics of Operations Research , v.37 , 2012 , p.129. E.A. ...
This paper presents three conditions, each of which guarantees the uniqueness of optimal policies of discounted Markov decision processes. The conditions ...