Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies

Balduzzi, David; Ghifary, Muhammad

Computer Science > Machine Learning

arXiv:1509.03005 (cs)

[Submitted on 10 Sep 2015]

Abstract: This paper proposes GProp, a deep reinforcement learning algorithm for continuous policies with compatible function approximation. The algorithm is based on two innovations. Firstly, we present a temporal-difference based method for learning the gradient of the value function. Secondly, we present the deviator-actor-critic (DAC) model, which comprises three neural networks that estimate the value function, estimate its gradient, and determine the actor's policy, respectively. We evaluate GProp on two challenging tasks: a contextual bandit problem constructed from nonparametric regression datasets, designed to probe the ability of reinforcement learning algorithms to accurately estimate gradients; and the octopus arm, a challenging reinforcement learning benchmark. GProp is competitive with fully supervised methods on the bandit task and achieves the best performance to date on the octopus arm.
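The two ideas in the abstract can be illustrated with a first-order expansion. If the actor proposes an action mu and the agent plays mu + eps with small Gaussian noise eps, then the reward is roughly v + <g, eps>, where v is the value and g the gradient with respect to the action. Regressing the one-step TD error on eps therefore yields an estimate of g that the actor can follow. The sketch below is our simplified reading of that idea, not the paper's GProp implementation or its exact update rules. It assumes a single state (no context), a hypothetical quadratic reward with a hidden target, isotropic Gaussian exploration, and plain constants standing in for the three DAC networks. The names `reward` and `train`, and all step sizes, are illustrative choices.

```python
import random


def reward(action, target):
    """Hypothetical bandit reward: negative squared distance to a hidden target."""
    return -sum((a - t) ** 2 for a, t in zip(action, target))


def train(target, steps=4000, sigma=0.5, lr_critic=0.1, lr_deviator=0.05,
          lr_actor=0.02, seed=0):
    """Single-state sketch of coupled critic / deviator / actor updates.

    critic   v  : scalar estimate of the expected reward under exploration noise
    deviator g  : estimate of the reward's gradient with respect to the action
    actor    mu : deterministic action that follows g uphill
    """
    rng = random.Random(seed)
    dim = len(target)
    v = 0.0
    g = [0.0] * dim
    mu = [0.0] * dim
    for _ in range(steps):
        # Explore: play the actor's action plus isotropic Gaussian noise.
        eps = [rng.gauss(0.0, sigma) for _ in range(dim)]
        r = reward([m + e for m, e in zip(mu, eps)], target)
        # One-step TD error against the first-order prediction v + <g, eps>.
        # A bandit has no next state, so there is no bootstrapped term.
        delta = r - v - sum(gi * ei for gi, ei in zip(g, eps))
        # Critic: SGD step on delta**2 with respect to v.
        v += lr_critic * delta
        # Deviator: for this quadratic reward, E[delta * eps] equals
        # sigma**2 * (true_grad - g), so scaling by 1 / sigma**2 moves g
        # toward the true gradient.
        g = [gi + lr_deviator * delta * ei / sigma ** 2 for gi, ei in zip(g, eps)]
        # Actor: gradient ascent along the deviator's estimate.
        mu = [m + lr_actor * gi for m, gi in zip(mu, g)]
    return mu, g, v


if __name__ == "__main__":
    mu, g, v = train([1.0, -2.0])
    print("actor:", [round(m, 2) for m in mu], "critic:", round(v, 2))
```

With a fixed seed the run is reproducible. The actor should end close to the hidden target, and the critic close to -2 * sigma**2, the expected penalty from the exploration noise alone. Making v, g and mu functions of a context (for example, three small networks) would move this toward the contextual-bandit setting the abstract describes; that extension is omitted to keep the sketch short.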

Comments:	27 pages
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Cite as:	arXiv:1509.03005 [cs.LG]
	(or arXiv:1509.03005v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1509.03005

Submission history

From: David Balduzzi
[v1] Thu, 10 Sep 2015 04:14:54 UTC (619 KB)