CLUTR: Curriculum Learning via Unsupervised Task Representation Learning

Azad, Abdus Salam; Gur, Izzeddin; Emhoff, Jasper; Alexis, Nathaniel; Faust, Aleksandra; Abbeel, Pieter; Stoica, Ion

doi:10.5555/3618408.3618465

arXiv:2210.10243 (cs)

[Submitted on 19 Oct 2022 (v1), last revised 7 Mar 2023 (this version, v2)]

Abstract: Reinforcement Learning (RL) algorithms are often known for sample inefficiency and difficult generalization. Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization by simultaneously learning a task distribution and agent policies on the generated tasks. This is a non-stationary process where the task distribution evolves along with agent policies, creating instability over time. While past works demonstrated the potential of such approaches, sampling effectively from the task space remains an open challenge that bottlenecks them. To this end, we introduce CLUTR: a novel unsupervised curriculum learning algorithm that decouples task representation and curriculum learning into a two-stage optimization. It first trains a recurrent variational autoencoder on randomly generated tasks to learn a latent task manifold. Next, a teacher agent creates a curriculum by maximizing a minimax REGRET-based objective on a set of latent tasks sampled from this manifold. Using the fixed, pretrained task manifold, we show that CLUTR successfully overcomes the non-stationarity problem and improves stability. Our experimental results show CLUTR outperforms PAIRED, a principled and popular UED method, in the challenging CarRacing and navigation environments, achieving 10.6X and 45% improvement in zero-shot generalization, respectively. CLUTR also performs comparably to the non-UED state-of-the-art for CarRacing, while requiring 500X fewer environment interactions.
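
The second stage described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the function names are hypothetical, the regret estimate follows the PAIRED-style convention (best antagonist return minus mean protagonist return), and a greedy pick over sampled latent vectors stands in for the learned teacher agent. The pretrained decoder that would turn a latent vector into a task is omitted.

```python
import random
from typing import Callable, List, Sequence

Latent = List[float]


def sample_latents(n: int, dim: int, rng: random.Random) -> List[Latent]:
    """Draw n latent task vectors from a standard normal prior (assumed VAE prior)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


def estimate_regret(antagonist_returns: Sequence[float],
                    protagonist_returns: Sequence[float]) -> float:
    """PAIRED-style regret estimate: best antagonist return minus mean protagonist return."""
    mean_protagonist = sum(protagonist_returns) / len(protagonist_returns)
    return max(antagonist_returns) - mean_protagonist


def pick_task(latents: Sequence[Latent],
              regret_of: Callable[[Latent], float]) -> Latent:
    """Greedy stand-in for the teacher: choose the latent task with the highest regret.

    In the method described above the teacher is a learned agent; the argmax
    here is only a simplification for illustration.
    """
    return max(latents, key=regret_of)


if __name__ == "__main__":
    rng = random.Random(0)
    candidates = sample_latents(n=8, dim=4, rng=rng)
    # Hypothetical regret signal; a real one would come from agent rollouts.
    chosen = pick_task(candidates, regret_of=lambda z: sum(abs(v) for v in z))
    print(len(candidates), len(chosen))
```

Because the latent space is fixed after the first stage, the teacher in this sketch only ever searches over latent vectors; that separation is what the abstract credits for the improved stability.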

Comments:	Preprint, Currently Under Review
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2210.10243 [cs.LG]
	(or arXiv:2210.10243v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2210.10243
Related DOI:	https://doi.org/10.5555/3618408.3618465

Submission history

From: Abdus Salam Azad
[v1] Wed, 19 Oct 2022 01:45:29 UTC (5,879 KB)
[v2] Tue, 7 Mar 2023 05:46:46 UTC (8,491 KB)
