Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning

Tomar, Manan; Efroni, Yonathan; Ghavamzadeh, Mohammad

arXiv:1910.02919v1 (cs)

[Submitted on 7 Oct 2019 (this version), latest version 13 Jul 2020 (v3)]

Abstract: Multi-step greedy policies have been used extensively in model-based Reinforcement Learning (RL) and in settings where a model of the environment is available (e.g., the game of Go). In this work, we explore the benefits of multi-step greedy policies in model-free RL when they are employed within the framework of multi-step Dynamic Programming (DP): multi-step Policy Iteration and Value Iteration. These algorithms iteratively solve short-horizon decision problems and converge to the optimal solution of the original problem. By using model-free algorithms as solvers of the short-horizon problems, we derive fully model-free algorithms that are instances of the multi-step DP framework. Because model-free algorithms are prone to instabilities with respect to the decision problem horizon, this simple approach can help mitigate these instabilities and yields improved model-free algorithms. We test this approach and show results on both discrete and continuous control problems.
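
The loop the abstract describes (repeatedly solve a short-horizon problem, then re-evaluate) can be illustrated with a small tabular sketch. This is only an illustration under stated assumptions, not the authors' implementation. It assumes the kappa-greedy formulation from the multi-step greedy literature, in which the short-horizon problem uses discount gamma * kappa and a reward shaped by the current value estimate; the abstract itself does not spell this out. Plain value iteration on a known toy model stands in for the model-free solver (the paper uses model-free deep RL algorithms in that role), and every name here (`shaped_q`, `kappa_greedy_policy`, `evaluate_policy`, `kappa_policy_iteration`, the toy `P` and `R`) is hypothetical.

```python
def shaped_q(P, R, V, W, s, a, gamma, kappa):
    """Q-value of (s, a) in the surrogate short-horizon problem.

    Assumed form: reward R + gamma*(1-kappa)*E[V(s')], discount gamma*kappa.
    V is the current value estimate, W is the surrogate problem's value.
    """
    exp_V = sum(p * V[t] for t, p in enumerate(P[s][a]))
    exp_W = sum(p * W[t] for t, p in enumerate(P[s][a]))
    return R[s][a] + gamma * (1.0 - kappa) * exp_V + gamma * kappa * exp_W


def kappa_greedy_policy(P, R, V, gamma, kappa, sweeps=500):
    """Solve the surrogate problem; value iteration stands in for a model-free solver."""
    n = len(P)
    W = [0.0] * n
    for _ in range(sweeps):
        W = [max(shaped_q(P, R, V, W, s, a, gamma, kappa)
                 for a in range(len(P[s])))
             for s in range(n)]
    # Act greedily with respect to the surrogate Q-values.
    return [max(range(len(P[s])),
                key=lambda a: shaped_q(P, R, V, W, s, a, gamma, kappa))
            for s in range(n)]


def evaluate_policy(P, R, pi, gamma, sweeps=2000):
    """Iterative policy evaluation in the original problem (discount gamma)."""
    n = len(P)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [R[s][pi[s]] + gamma * sum(p * V[t] for t, p in enumerate(P[s][pi[s]]))
             for s in range(n)]
    return V


def kappa_policy_iteration(P, R, gamma, kappa, iterations=20):
    """Alternate between the kappa-greedy improvement step and evaluation."""
    V = [0.0] * len(P)
    pi = None
    for _ in range(iterations):
        pi = kappa_greedy_policy(P, R, V, gamma, kappa)
        V = evaluate_policy(P, R, pi, gamma)
    return pi, V


# Hypothetical two-state, two-action toy problem: P[s][a] is a distribution
# over next states, R[s][a] is the immediate reward.
P = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
R = [[1.0, 0.0], [0.0, 2.0]]

pi, V = kappa_policy_iteration(P, R, gamma=0.9, kappa=0.5)
print(pi, [round(v, 3) for v in V])
```

On this toy problem the sketch should select action 1 in both states, with values close to 18 and 20. Setting `kappa=0.0` reduces the improvement step to ordinary one-step greedy policy iteration, while `kappa=1.0` makes the surrogate problem identical to the original one, so smaller values of kappa correspond to shorter effective horizons for the inner solver.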

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1910.02919 [cs.LG]
	(or arXiv:1910.02919v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1910.02919

Submission history

From: Manan Tomar Mr.
[v1] Mon, 7 Oct 2019 17:20:25 UTC (11,435 KB)
[v2] Mon, 14 Oct 2019 17:25:19 UTC (11,437 KB)
[v3] Mon, 13 Jul 2020 00:00:32 UTC (914 KB)
