State-action similarity-based representations for off-policy evaluation
Information
Publisher
Curran Associates Inc.
Red Hook, NY, United States
Qualifiers
- Research-article
- Research
- Refereed limited