Difference of Convex Functions Programming for Policy Optimization in Reinforcement Learning
Published In
- General Chairs: Mehdi Dastani, Jaime Simão Sichman
- Program Chairs: Natasha Alechina, Virginia Dignum
Publisher
International Foundation for Autonomous Agents and Multiagent Systems
Richland, SC
Qualifiers
- Extended-abstract