Textual Unlearning Gives a False Sense of Unlearning

Du, Jiacheng; Wang, Zhibo; Ren, Kui

Computer Science > Cryptography and Security

arXiv:2406.13348 (cs)

[Submitted on 19 Jun 2024]

Abstract: Language models (LMs) are susceptible to "memorizing" training data, including a large amount of private or copyright-protected content. To safeguard the right to be forgotten (RTBF), machine unlearning has emerged as a promising method for LMs to efficiently "forget" sensitive training content and mitigate knowledge leakage risks. However, despite its good intentions, could the unlearning mechanism be counterproductive? In this paper, we propose the Textual Unlearning Leakage Attack (TULA), in which an adversary can infer information about the unlearned data solely by accessing the models before and after unlearning. Furthermore, we present variants of TULA for both black-box and white-box scenarios. Through various experiments, we demonstrate that machine unlearning amplifies the risk of knowledge leakage from LMs. Specifically, TULA can increase an adversary's ability to infer membership information about the unlearned data by more than 20% in the black-box scenario. Moreover, with white-box access, TULA can even reconstruct the unlearned data directly with more than 60% accuracy. Our work is the first to reveal that machine unlearning in LMs can, contrary to its purpose, create greater knowledge leakage risks, and we hope it inspires the development of more secure unlearning mechanisms.
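
The intuition that comparing a model before and after unlearning can leak membership may be easier to see on a toy example. The sketch below is not the authors' TULA implementation: it assumes a smoothed unigram model, treats "unlearning" as exact retraining without the target sentence, and uses the change in average negative log-likelihood as a hypothetical membership score. All function names and sample sentences are invented for illustration.

```python
import math
from collections import Counter


def train_unigram(corpus):
    """Fit a unigram model; returns (token counts, total token count)."""
    counts = Counter(tok for sent in corpus for tok in sent.split())
    return counts, sum(counts.values())


def nll(model, sentence, vocab_size):
    """Average negative log-likelihood with add-one smoothing."""
    counts, total = model
    toks = sentence.split()
    log_probs = [math.log((counts[t] + 1) / (total + vocab_size)) for t in toks]
    return -sum(log_probs) / len(toks)


def loss_shift(before, after, sentence, vocab_size):
    """Hypothetical membership score: loss after unlearning minus loss before."""
    return nll(after, sentence, vocab_size) - nll(before, sentence, vocab_size)


# Toy data (assumed): three retained sentences, one unlearned target,
# and one sentence that was never in the training set.
retained = [
    "the cat sat on the mat",
    "a dog ran in the park",
    "the sun is bright today",
]
target = "alice lives at seven elm street"
non_member = "bob works at nine oak avenue"

# Fixed vocabulary size shared by both models so the losses are comparable.
vocab = {tok for s in retained + [target, non_member] for tok in s.split()}
V = len(vocab)

before = train_unigram(retained + [target])  # model trained with the target
after = train_unigram(retained)              # "unlearned" by exact retraining

shift_target = loss_shift(before, after, target, V)
shift_non_member = loss_shift(before, after, non_member, V)

# The unlearned sentence shows a larger loss increase than the non-member.
print(shift_target > shift_non_member)
```

In this toy setting the adversary needs only the two models' outputs on a candidate sentence, which loosely mirrors the black-box scenario described in the abstract; a real attack on neural LMs with approximate unlearning would be considerably more involved.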

Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2406.13348 [cs.CR]
	(or arXiv:2406.13348v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2406.13348

Submission history

From: Jiacheng Du
[v1] Wed, 19 Jun 2024 08:51:54 UTC (1,382 KB)
