Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback

Shen, Wei; Zheng, Rui; Zhan, Wenyu; Zhao, Jun; Dou, Shihan; Gui, Tao; Zhang, Qi; Huang, Xuanjing

arXiv:2310.05199v5 (cs)

[Submitted on 8 Oct 2023 (v1), last revised 29 Nov 2023 (this version, v5)]

Abstract: Reinforcement learning from human feedback serves as a crucial bridge, aligning large language models with human and societal values. This alignment requires a vast corpus of human feedback to learn a reward model, which is subsequently used to finetune language models. However, we have identified that the reward model often finds shortcuts to bypass its intended objectives, misleadingly assuming that humans prefer longer responses. The emergence of length bias often induces the model to favor longer outputs, yet it doesn't equate to an increase in helpful information within these outputs. In this paper, we propose an innovative solution, applying the Product-of-Experts (PoE) technique to separate reward modeling from the influence of sequence length. In our framework, the main expert concentrates on understanding human intents, while the biased expert targets the identification and capture of length bias. To further enhance the learning of bias, we introduce perturbations into the bias-focused expert, disrupting the flow of semantic information. Experimental results validate the effectiveness of our approach, indicating that language model performance is improved, irrespective of sequence length.
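
As a rough illustration of the Product-of-Experts idea named in the abstract, the sketch below combines a main-expert score and a bias-expert score inside a pairwise preference loss. It is a generic PoE debiasing recipe under stated assumptions, not the paper's implementation: the function names (`softplus`, `poe_preference_loss`), the additive combination of scores, and the Bradley-Terry style loss are all illustrative choices that the abstract does not specify.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def poe_preference_loss(
    main_chosen: float,
    main_rejected: float,
    bias_chosen: float,
    bias_rejected: float,
) -> float:
    """Pairwise loss on the combined score of two experts (hypothetical sketch).

    Multiplying the experts' preference factors corresponds to adding their
    scores, so the combined margin is the sum of the two experts' margins.
    The loss is -log(sigmoid(margin)), written as softplus(-margin).
    """
    main_margin = main_chosen - main_rejected
    bias_margin = bias_chosen - bias_rejected
    return softplus(-(main_margin + bias_margin))


# Assumed toy numbers: the bias expert already explains part of the preference
# (for example, because the chosen response is longer), so the loss on the
# combined score is smaller than the loss on the main expert alone.
loss_main_only = poe_preference_loss(1.0, 0.0, 0.0, 0.0)
loss_with_bias = poe_preference_loss(1.0, 0.0, 0.5, 0.0)
```

In this kind of setup, whatever the bias expert explains leaves less gradient pressure on the main expert to encode the same shortcut. In the generic recipe the main expert's score alone would be used after training; whether the paper does exactly this is not stated in the abstract.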

Comments:	EMNLP 2023 findings, Length Bias in RLHF, Mitigate bias in reward modeling
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2310.05199 [cs.CL]
	(or arXiv:2310.05199v5 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.05199

Submission history

From: Wei Shen
[v1] Sun, 8 Oct 2023 15:14:39 UTC (1,902 KB)
[v2] Thu, 12 Oct 2023 09:04:07 UTC (1,971 KB)
[v3] Thu, 19 Oct 2023 11:09:47 UTC (1,971 KB)
[v4] Mon, 6 Nov 2023 10:28:16 UTC (1,971 KB)
[v5] Wed, 29 Nov 2023 14:45:53 UTC (1,971 KB)
