Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning

Fan, Chongyu; Liu, Jiancheng; Lin, Licong; Jia, Jinghan; Zhang, Ruiqi; Mei, Song; Liu, Sijia

Computer Science > Computation and Language

arXiv:2410.07163 (cs)

[Submitted on 9 Oct 2024 (v1), last revised 28 Oct 2024 (this version, v2)]

Abstract: In this work, we address the problem of large language model (LLM) unlearning, aiming to remove unwanted data influences and associated model capabilities (e.g., copyrighted data or harmful content generation) while preserving essential model utilities, without the need for retraining from scratch. Despite the growing need for LLM unlearning, a principled optimization framework remains lacking. To this end, we revisit the state-of-the-art approach, negative preference optimization (NPO), and identify the issue of reference model bias, which could undermine NPO's effectiveness, particularly when unlearning forget data of varying difficulty. Given that, we propose a simple yet effective unlearning optimization framework, called SimNPO, showing that 'simplicity' in removing the reliance on a reference model (through the lens of simple preference optimization) benefits unlearning. We also provide deeper insights into SimNPO's advantages, supported by analysis using mixtures of Markov chains. Furthermore, we present extensive experiments validating SimNPO's superiority over existing unlearning baselines in benchmarks like TOFU and MUSE, as well as its robustness against relearning attacks. Code is available at this https URL.
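The abstract contrasts NPO, which scores forget data relative to a frozen reference model, with SimNPO, which drops that reference. The sketch below compares the two per-example forget losses. It is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the standard NPO loss form and a SimPO-style length normalization with a margin `gamma` for SimNPO. The function names, the scalar inputs (summed sequence log-probabilities and token lengths), and the defaults are hypothetical. Any retain-loss term is omitted, and the exact objective should be checked against the paper and its released code.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def npo_loss(logp_theta: Sequence[float], logp_ref: Sequence[float], beta: float) -> float:
    """Assumed NPO forget loss, averaged over forget examples.

    Per example: -(2/beta) * log sigmoid(-beta * (log p_theta(y|x) - log p_ref(y|x))),
    computed as (2/beta) * softplus(beta * log-ratio). Requires a frozen reference model.
    """
    terms = [
        (2.0 / beta) * softplus(beta * (lt - lr))
        for lt, lr in zip(logp_theta, logp_ref)
    ]
    return sum(terms) / len(terms)


def simnpo_loss(
    logp_theta: Sequence[float],
    lengths: Sequence[int],
    beta: float,
    gamma: float = 0.0,
) -> float:
    """Hypothetical reference-free, length-normalized forget loss (assumed SimNPO form).

    Per example: -(2/beta) * log sigmoid(-(beta/|y|) * log p_theta(y|x) - gamma),
    computed as (2/beta) * softplus((beta/|y|) * log p_theta(y|x) + gamma).
    No reference model is involved.
    """
    terms = [
        (2.0 / beta) * softplus((beta / n) * lt + gamma)
        for lt, n in zip(logp_theta, lengths)
    ]
    return sum(terms) / len(terms)


# Two forget examples with per-token log-probability -2 but different lengths.
logp = [-10.0, -40.0]
lengths = [5, 20]

# At the start of unlearning the model equals the reference, so every NPO term
# collapses to (2/beta) * ln 2, whatever the example.
print(npo_loss(logp, logp, beta=0.1))
# The assumed SimNPO term depends only on the model's own length-normalized likelihood.
print(simnpo_loss(logp, lengths, beta=1.0))
```

Under these assumptions, the NPO term is tied to the reference model's scores, while the SimNPO-style term is driven by the current model's per-token likelihood, so forget examples of different lengths are put on a common scale.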

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2410.07163 [cs.CL]
	(or arXiv:2410.07163v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2410.07163

Submission history

From: Chongyu Fan
[v1] Wed, 9 Oct 2024 17:58:12 UTC (5,134 KB)
[v2] Mon, 28 Oct 2024 19:55:24 UTC (5,134 KB)