Exclusively Penalized Q-learning for Offline Reinforcement Learning

Yeom, Junghyuk; Jo, Yonghyeon; Kim, Jungmo; Lee, Sanghyeon; Han, Seungyul

arXiv:2405.14082 (cs)

[Submitted on 23 May 2024 (v1), last revised 24 Oct 2024 (this version, v2)]

Abstract: Constraint-based offline reinforcement learning (RL) imposes policy constraints or penalties on the value function to mitigate overestimation errors caused by distributional shift. This paper focuses on a limitation of existing offline RL methods that penalize the value function: the penalty can introduce unnecessary bias, leading to underestimation. To address this concern, we propose Exclusively Penalized Q-learning (EPQ), which reduces estimation bias in the value function by selectively penalizing states that are prone to inducing estimation errors. Numerical results show that our method significantly reduces underestimation bias and improves performance in various offline control tasks compared to other offline RL methods.

Comments:	10 technical pages followed by references and appendix. Accepted to NeurIPS 2024 as a spotlight paper.
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2405.14082 [cs.LG]
	(or arXiv:2405.14082v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2405.14082

Submission history

From: Seungyul Han
[v1] Thu, 23 May 2024 01:06:05 UTC (7,648 KB)
[v2] Thu, 24 Oct 2024 07:56:23 UTC (9,281 KB)