Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy

Liu, Boyi; Cai, Qi; Yang, Zhuoran; Wang, Zhaoran

arXiv:1906.10306 (cs)

[Submitted on 25 Jun 2019 (v1), last revised 27 Feb 2023 (this version, v3)]

Abstract: Proximal policy optimization and trust region policy optimization (PPO and TRPO) with actor and critic parametrized by neural networks achieve significant empirical success in deep reinforcement learning. However, due to nonconvexity, the global convergence of PPO and TRPO remains less understood, which separates theory from practice. In this paper, we prove that a variant of PPO and TRPO equipped with overparametrized neural networks converges to the globally optimal policy at a sublinear rate. The key to our analysis is the global convergence of infinite-dimensional mirror descent under a notion of one-point monotonicity, where the gradient and iterate are instantiated by neural networks. In particular, the desirable representation power and optimization geometry induced by the overparametrization of such neural networks allow them to accurately approximate the infinite-dimensional gradient and iterate.
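
The mirror-descent view mentioned in the abstract can be illustrated in a small tabular setting. The sketch below is not the paper's neural algorithm: it assumes a hypothetical 2-state, 2-action MDP (the arrays `P` and `R`, the discount `GAMMA`, and the step size `eta` are all made up for illustration) and uses the standard closed form of a KL-proximal policy update, in which the new policy is proportional to the old policy times `exp(eta * Q)`. The neural variant analyzed in the paper would replace the exact `Q` and the exact update with network approximations.

```python
import math

GAMMA = 0.9  # discount factor (assumed value)

# Hypothetical 2-state, 2-action MDP, not taken from the paper.
# P[s][a] lists next-state probabilities; R[s][a] is the immediate reward.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]
R = [
    [0.0, 0.1],
    [0.0, 1.0],
]


def evaluate_q(policy, sweeps=500):
    """Approximate Q^pi by repeated Bellman expectation backups."""
    n_s, n_a = len(P), len(P[0])
    q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(sweeps):
        # State values under the current policy.
        v = [sum(policy[s][a] * q[s][a] for a in range(n_a)) for s in range(n_s)]
        q = [
            [R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(n_s))
             for a in range(n_a)]
            for s in range(n_s)
        ]
    return q


def mirror_descent_step(policy, q, eta):
    """KL-proximal update: new_pi(a|s) is proportional to pi(a|s) * exp(eta * Q(s, a))."""
    new_policy = []
    for s, row in enumerate(policy):
        m = max(q[s])  # subtract the max for numerical stability
        weights = [p * math.exp(eta * (q[s][a] - m)) for a, p in enumerate(row)]
        z = sum(weights)
        new_policy.append([w / z for w in weights])
    return new_policy


def run(iterations=50, eta=1.0):
    """Iterate evaluation and proximal improvement from a uniform policy."""
    policy = [[0.5, 0.5], [0.5, 0.5]]
    for _ in range(iterations):
        policy = mirror_descent_step(policy, evaluate_q(policy), eta)
    return policy


if __name__ == "__main__":
    for s, row in enumerate(run()):
        print(f"state {s}: action probabilities {row}")
```

In this toy MDP the second action is better in both states, so the iterates should concentrate on it. The example only shows the shape of the update; it says nothing about the rate or the approximation errors that the paper's analysis addresses.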

Comments:	A short version
Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:1906.10306 [cs.LG]
	(or arXiv:1906.10306v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1906.10306

Submission history

From: Boyi Liu
[v1] Tue, 25 Jun 2019 03:20:04 UTC (87 KB)
[v2] Wed, 11 Sep 2019 07:07:35 UTC (93 KB)
[v3] Mon, 27 Feb 2023 21:48:13 UTC (94 KB)
