TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback

Yoon, Eunseop; Yoon, Hee Suk; Eom, SooHwan; Han, Gunsoo; Nam, Daniel Wontae; Jo, Daejin; On, Kyoung-Woon; Hasegawa-Johnson, Mark A.; Kim, Sungwoong; Yoo, Chang D.


arXiv:2407.16574 (cs)

[Submitted on 23 Jul 2024]




Abstract: Reinforcement Learning from Human Feedback (RLHF) leverages human preference data to train language models to align more closely with human preferences. These preference data, however, are labeled at the sequence level, creating a mismatch between sequence-level preference labels and the tokens that the language model generates autoregressively. Although several recent approaches have tried to provide token-level (i.e., dense) rewards, these typically rely on predefined discrete reward values (e.g., positive: +1, negative: -1, neutral: 0) and so fail to account for the varying degrees of preference inherent to each token. To address this limitation, we introduce TLCR (Token-Level Continuous Reward) for RLHF, which incorporates a discriminator trained to distinguish positive and negative tokens; the discriminator's confidence is used to assign a continuous reward to each token in context. Extensive experiments show that TLCR yields consistent performance improvements over previous sequence-level and token-level discrete rewards on open-ended generation benchmarks.
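To make the contrast between discrete and continuous token rewards concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not give the reward formula, so the mapping below (probability of "positive" minus probability of "negative") is an assumption chosen only for illustration. The function names, the two-logit discriminator output, and the `margin` threshold are likewise hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def continuous_token_rewards(token_logits):
    """Map per-token discriminator logits to rewards in [-1, 1].

    token_logits: list of [negative_logit, positive_logit] pairs, one per
    generated token (a hypothetical discriminator output format).
    ASSUMPTION: reward = P(positive) - P(negative); the paper's exact
    formula may differ.
    """
    rewards = []
    for neg_logit, pos_logit in token_logits:
        p_neg, p_pos = softmax([neg_logit, pos_logit])
        rewards.append(p_pos - p_neg)
    return rewards


def discrete_token_rewards(token_logits, margin=0.5):
    """Baseline for contrast: collapse each reward to +1, -1, or 0.

    ASSUMPTION: `margin` is an illustrative threshold, not a value from
    the paper.
    """
    discrete = []
    for r in continuous_token_rewards(token_logits):
        if r > margin:
            discrete.append(1)
        elif r < -margin:
            discrete.append(-1)
        else:
            discrete.append(0)
    return discrete


if __name__ == "__main__":
    # Three tokens: confidently positive, undecided, mildly negative.
    logits = [[-2.0, 2.0], [0.0, 0.0], [0.4, -0.4]]
    print(continuous_token_rewards(logits))
    print(discrete_token_rewards(logits))
```

Under this sketch, the mildly negative token receives a small negative continuous reward, whereas the discrete scheme assigns it 0, the same value it gives the undecided token. That loss of graded preference is what the abstract says continuous rewards are meant to avoid.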

Comments:	ACL 2024 Findings
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2407.16574 [cs.CL]
	(or arXiv:2407.16574v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.16574

Submission history

From: Eunseop Yoon
[v1] Tue, 23 Jul 2024 15:27:37 UTC (2,739 KB)
