No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery

Rutherford, Alexander; Beukman, Michael; Willi, Timon; Lacerda, Bruno; Hawes, Nick; Foerster, Jakob

arXiv:2408.15099 (cs)

[Submitted on 27 Aug 2024 (v1), last revised 29 Aug 2024 (this version, v2)]

Abstract: What data or environments to use for training to improve downstream performance is a longstanding and very topical question in reinforcement learning. In particular, Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula enable agents to be robust to in- and out-of-distribution tasks. We ask to what extent these methods are themselves robust when applied to a novel setting, closely inspired by a real-world robotics problem. Surprisingly, we find that the state-of-the-art UED methods either do not improve upon the naïve baseline of Domain Randomisation (DR), or require substantial hyperparameter tuning to do so. Our analysis shows that this is due to their underlying scoring functions failing to predict intuitive measures of "learnability", i.e., in finding the settings that the agent sometimes solves, but not always. Based on this, we instead directly train on levels with high learnability and find that this simple and intuitive approach outperforms UED methods and DR in several binary-outcome environments, including on our domain and the standard UED domain of Minigrid. We further introduce a new adversarial evaluation procedure for directly measuring robustness, closely mirroring the conditional value at risk (CVaR). We open-source all our code and present visualisations of final policies here: this https URL.
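
The abstract describes "learnability" only informally, as favouring levels the agent sometimes solves but not always. As a rough illustration, the sketch below assumes one natural score for binary-outcome levels, p * (1 - p), where p is the agent's empirical success rate on a level; the abstract does not state this formula. It also sketches a CVaR-style evaluation as the mean success rate over the worst-performing fraction of levels. The function names, the `alpha` parameter, and the top-k selection rule are all hypothetical and chosen for illustration, not taken from the authors' code.

```python
from typing import Dict, List


def learnability(success_rate: float) -> float:
    """Assumed score p * (1 - p): zero if a level is always or never solved,
    largest when the agent solves it about half the time."""
    return success_rate * (1.0 - success_rate)


def select_levels(success_rates: Dict[str, float], k: int) -> List[str]:
    """Return the ids of the k levels with the highest assumed learnability score."""
    ranked = sorted(
        success_rates,
        key=lambda level_id: learnability(success_rates[level_id]),
        reverse=True,
    )
    return ranked[:k]


def worst_case_mean(success_rates: List[float], alpha: float) -> float:
    """CVaR-style summary: mean success rate over the worst alpha fraction of levels."""
    if not success_rates:
        raise ValueError("success_rates must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    # Always keep at least one level so the mean is defined.
    n_worst = max(1, int(len(success_rates) * alpha))
    worst = sorted(success_rates)[:n_worst]
    return sum(worst) / n_worst


# Hypothetical success rates for four levels.
rates = {"a": 0.0, "b": 0.5, "c": 0.9, "d": 1.0}
chosen = select_levels(rates, k=2)
robustness = worst_case_mean(list(rates.values()), alpha=0.5)
```

Under this assumed score, levels "a" and "d" score zero because they are never or always solved, so they would not be prioritised for training, whereas a plain average of success rates would hide the poor performance that the worst-case mean exposes.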

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Cite as:	arXiv:2408.15099 [cs.LG]
	(or arXiv:2408.15099v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2408.15099

Submission history

From: Alexander Rutherford
[v1] Tue, 27 Aug 2024 14:31:54 UTC (1,860 KB)
[v2] Thu, 29 Aug 2024 14:20:44 UTC (1,861 KB)
