On Many-Actions Policy Gradient

Nauman, Michal; Cygan, Marek

arXiv:2210.13011 (cs)

[Submitted on 24 Oct 2022 (v1), last revised 30 Oct 2023 (this version, v5)]

Abstract: We study the variance of stochastic policy gradients (SPGs) with many action samples per state. We derive a many-actions optimality condition, which determines when many-actions SPG yields lower variance as compared to a single-action agent with proportionally extended trajectory. We propose Model-Based Many-Actions (MBMA), an approach leveraging dynamics models for many-actions sampling in the context of SPG. MBMA addresses issues associated with existing implementations of many-actions SPG and yields lower bias and comparable variance to SPG estimated from states in model-simulated rollouts. We find that MBMA bias and variance structure matches that predicted by theory. As a result, MBMA achieves improved sample efficiency and higher returns on a range of continuous action environments as compared to model-free, many-actions, and model-based on-policy SPG baselines.
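
As a rough illustration of why sampling several actions per state can reduce gradient variance, the following is a minimal sketch on a toy single-state problem. It is not the paper's MBMA method and does not model the trade-off against a proportionally extended trajectory that the optimality condition addresses. The Gaussian policy, the quadratic `q_value` critic, and all function names are assumptions made up for this example.

```python
import random
import statistics


def q_value(action):
    # Hypothetical critic for the toy problem: value peaks at action = 2.
    return -(action - 2.0) ** 2


def spg_estimate(theta, n_actions, rng, sigma=1.0):
    """Score-function gradient w.r.t. the policy mean theta,
    averaged over n_actions actions sampled in the same state."""
    total = 0.0
    for _ in range(n_actions):
        action = rng.gauss(theta, sigma)
        # d/dtheta log N(action; theta, sigma^2)
        score = (action - theta) / sigma ** 2
        total += score * q_value(action)
    return total / n_actions


def estimator_variance(theta, n_actions, n_trials=20000, seed=0):
    """Empirical variance of the gradient estimate across independent trials."""
    rng = random.Random(seed)
    estimates = [spg_estimate(theta, n_actions, rng) for _ in range(n_trials)]
    return statistics.variance(estimates)


if __name__ == "__main__":
    # Print the empirical variance for 1, 4, and 16 actions per state.
    for n in (1, 4, 16):
        print(n, estimator_variance(theta=0.0, n_actions=n))
```

In this toy setting the action samples are independent, so the variance of the averaged estimate shrinks roughly in proportion to the number of actions. In a full reinforcement learning problem the extra actions need Q-value estimates, which is where the abstract's dynamics models come in.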

Comments: ICML Proceedings 2023
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2210.13011 [cs.LG] (or arXiv:2210.13011v5 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2210.13011
Journal reference: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:25769-25789, 2023

Submission history

From: Michal Nauman
[v1] Mon, 24 Oct 2022 07:58:48 UTC (2,075 KB)
[v2] Thu, 17 Nov 2022 18:21:10 UTC (2,076 KB)
[v3] Tue, 2 May 2023 12:59:46 UTC (1,004 KB)
[v4] Thu, 11 May 2023 10:33:50 UTC (1,004 KB)
[v5] Mon, 30 Oct 2023 13:20:05 UTC (1,028 KB)
