Jun 12, 2017 · We present a behavior policy search algorithm and empirically demonstrate its effectiveness in lowering the mean squared error of policy ...
We consider the task of evaluating a policy for a Markov decision process (MDP). The standard unbiased technique for evaluating a policy is to.
Jun 12, 2017 · We consider the task of evaluating a policy for a Markov decision process (MDP). The standard unbiased technique for evaluating a policy is ...
Aug 6, 2017 · We present a behavior policy search algorithm and empirically demonstrate its effectiveness in lowering the mean squared error of policy ...
Data-efficient Policy Evaluation Through Behavior Policy Search. JOSIAH HANNA ... search with Behavior Policy Gradient (BPG) to Monte Carlo policy evaluation ...
We present a behavior policy search algorithm and empirically demonstrate its effectiveness in lowering the mean squared error of policy performance estimates.
A novel policy evaluation sub-problem is proposed, behavior policy search: searching for a behavior policy that reduces mean squared error, and it is shown ...
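As a rough illustration of why the choice of behavior policy matters, the sketch below computes the exact variance of a single-sample importance-sampling estimate in a toy one-step problem. It is not the paper's algorithm. The two-action setup, the reward values, and the helper name `is_variance` are all assumptions made for this example. Because the importance-sampling estimate is unbiased, its mean squared error equals its variance.

```python
def is_variance(pi_e, pi_b, rewards):
    """Exact variance of the one-sample importance-sampling estimate.

    Hypothetical helper for a one-step problem with deterministic rewards:
    an action a is drawn from pi_b and the estimate is pi_e[a] / pi_b[a] * rewards[a].
    """
    value = sum(p * r for p, r in zip(pi_e, rewards))
    second_moment = sum((p * r) ** 2 / b for p, b, r in zip(pi_e, pi_b, rewards))
    return second_moment - value ** 2


# Assumed toy problem: two actions, the second is rare but has a large reward.
rewards = [1.0, 10.0]
pi_e = [0.9, 0.1]  # evaluation policy whose value we want (true value 1.9)

# On-policy Monte Carlo: the behavior policy is the evaluation policy itself.
on_policy_var = is_variance(pi_e, pi_e, rewards)

# Behavior policy proportional to pi_e(a) * |reward(a)|, which is the
# minimum-variance choice in this simple setting.
weights = [p * abs(r) for p, r in zip(pi_e, rewards)]
pi_b = [w / sum(weights) for w in weights]
off_policy_var = is_variance(pi_e, pi_b, rewards)

print(f"on-policy variance:  {on_policy_var:.4f}")
print(f"off-policy variance: {off_policy_var:.4f}")
```

In this toy case the reweighted behavior policy drives the variance to zero, but building it requires knowing the rewards in advance. A search procedure would instead have to adapt the behavior policy from sampled data.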
Aug 11, 2017 · Data-Efficient Policy Evaluation Through Behavior Policy Search.
Aug 8, 2017 · Data-efficient Policy Evaluation through Behavior Policy Search.