Direct Alignment of Language Models via Quality-Aware Self-Refinement

Yu, Runsheng; Wang, Yong; Jiao, Xiaoqi; Zhang, Youzhi; Kwok, James T.

Computer Science > Computation and Language

arXiv:2405.21040 (cs)

[Submitted on 31 May 2024]

Abstract: Reinforcement Learning from Human Feedback (RLHF) is commonly used to align the behavior of Large Language Models (LLMs) with human preferences. A popular recent alternative is Direct Preference Optimization (DPO), which replaces the LLM-based reward model with the policy itself, removing the extra memory and training time needed to learn a separate reward model. However, DPO does not consider the relative qualities of the positive and negative responses, which can lead to sub-optimal training outcomes. To alleviate this problem, we investigate using the intrinsic knowledge of the LLM being fine-tuned on the fly to estimate these relative qualities and refine the loss function. Specifically, we leverage the LLM's knowledge to design a refinement function that estimates the quality of both the positive and negative responses. We show that, under mild assumptions, the constructed refinement function can help self-refine the loss function. The refinement function is integrated into DPO and its variant, Identity Preference Optimization (IPO). Experiments across various evaluators indicate that the refined methods improve the performance of the fine-tuned models over DPO and IPO.
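The abstract refers to DPO and IPO without stating their objectives. The sketch below is a minimal, dependency-free illustration, assuming the standard published forms of both losses, scalar inputs instead of batched tensors, and sequence log-probabilities that have already been computed. The function names, `quality_aware_dpo_loss`, and its `delta` and `lam` parameters are hypothetical: the abstract does not specify the refinement function, so that last function only shows where a quality estimate could enter the margin. It is not the authors' formulation.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def implicit_margin(logp_w: float, logp_l: float,
                    ref_logp_w: float, ref_logp_l: float) -> float:
    """h = [log pi(y_w|x) - log pi_ref(y_w|x)] - [log pi(y_l|x) - log pi_ref(y_l|x)].

    logp_* are sequence log-probabilities of the preferred (w) and
    dispreferred (l) responses under the policy being trained; ref_logp_*
    are the same quantities under the frozen reference model.
    """
    return (logp_w - ref_logp_w) - (logp_l - ref_logp_l)


def dpo_loss(h: float, beta: float = 0.1) -> float:
    """DPO loss: -log sigmoid(beta * h); the policy acts as its own implicit reward model."""
    return -log_sigmoid(beta * h)


def ipo_loss(h: float, tau: float = 0.1) -> float:
    """IPO loss: squared distance of h from the target margin 1 / (2 * tau)."""
    return (h - 1.0 / (2.0 * tau)) ** 2


def quality_aware_dpo_loss(h: float, delta: float,
                           beta: float = 0.1, lam: float = 1.0) -> float:
    """HYPOTHETICAL hook, not the paper's refinement function.

    `delta` stands in for an estimated quality gap between the positive and
    negative responses, weighted by `lam`; a larger delta demands a larger
    margin for the same loss. With delta = 0 this reduces exactly to dpo_loss.
    """
    return -log_sigmoid(beta * h - lam * delta)


if __name__ == "__main__":
    h = implicit_margin(logp_w=-12.0, logp_l=-15.0,
                        ref_logp_w=-13.0, ref_logp_l=-14.0)
    print("margin h:", h)
    print("DPO loss:", dpo_loss(h))
    print("IPO loss:", ipo_loss(h))
    print("quality-aware (hypothetical):", quality_aware_dpo_loss(h, delta=0.5))
```

Scalar floats keep the sketch self-contained; in a real training loop these would be batched tensors, and the log-sigmoid would typically come from the training framework rather than a hand-written helper.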

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2405.21040 [cs.CL]
	(or arXiv:2405.21040v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2405.21040

Submission history

From: Runsheng Yu
[v1] Fri, 31 May 2024 17:31:18 UTC (2,487 KB)