Towards More Realistic Extraction Attacks: An Adversarial Perspective

More, Yash; Ganesh, Prakhar; Farnadi, Golnoosh

Computer Science > Cryptography and Security

arXiv:2407.02596 (cs)

[Submitted on 2 Jul 2024 (v1), last revised 8 Nov 2024 (this version, v2)]

Title: Towards More Realistic Extraction Attacks: An Adversarial Perspective

Authors: Yash More, Prakhar Ganesh, Golnoosh Farnadi

Abstract: Language models are prone to memorizing parts of their training data, which makes them vulnerable to extraction attacks. Existing research often examines isolated setups, such as evaluating extraction risks from a single model or with a fixed prompt design. However, a real-world adversary could access models across various sizes and checkpoints and could also exploit prompt sensitivity, resulting in a considerably larger attack surface than previously studied. In this paper, we revisit extraction attacks from an adversarial perspective, focusing on how to leverage the brittleness of language models and the multi-faceted access to the underlying data. We find significant churn in extraction trends: even unintuitive changes to the prompt, or targeting smaller models and earlier checkpoints, can extract distinct information. By combining information from multiple attacks, our adversary is able to increase the extraction risks by up to $2\times$. Furthermore, even with mitigation strategies like data deduplication, we find the same escalation of extraction risks against a real-world adversary. We conclude with a set of case studies, including detecting pre-training data, copyright violations, and extracting personally identifiable information, showing how our more realistic adversary can outperform existing adversaries in the literature.
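
The idea of combining information from multiple attacks can be illustrated with a minimal sketch. It assumes each attack configuration (a prompt variant, model size, or checkpoint) yields a set of training-sample IDs it managed to extract, and that the combined adversary succeeds on a sample if any single attack does. The function names, configuration labels, and toy numbers below are hypothetical and are not taken from the paper; the paper's actual metrics may be defined differently.

```python
from typing import Dict, Set


def extraction_rate(extracted: Set[int], total: int) -> float:
    """Fraction of the evaluated training samples that were extracted."""
    return len(extracted) / total


def combined_extraction(per_attack: Dict[str, Set[int]]) -> Set[int]:
    """Union of the samples extracted by any attack configuration.

    Assumption: the adversary counts a sample as extracted if at least
    one attack recovers it.
    """
    combined: Set[int] = set()
    for ids in per_attack.values():
        combined |= ids
    return combined


def churn(a: Set[int], b: Set[int]) -> float:
    """Share of samples extracted by exactly one of two attacks,
    relative to everything either attack extracted (illustrative
    definition, not necessarily the paper's)."""
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0


# Toy data (made up for illustration): 10 evaluated samples, 3 attacks.
TOTAL = 10
attacks = {
    "large_model_final": {0, 1, 2},
    "small_model_final": {2, 3},
    "large_model_early_checkpoint": {1, 4, 5},
}

best_single = max(extraction_rate(ids, TOTAL) for ids in attacks.values())
combined = extraction_rate(combined_extraction(attacks), TOTAL)
print(f"best single attack: {best_single:.2f}, combined: {combined:.2f}")
```

In this toy setting the attacks overlap only partially, so the union covers more samples than any single attack; how large that gap is in practice depends on how much the extracted sets actually differ.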

Comments:	Presented at PrivateNLP@ACL2024
Subjects:	Cryptography and Security (cs.CR); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2407.02596 [cs.CR]
	(or arXiv:2407.02596v2 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2407.02596

Submission history

From: Prakhar Ganesh
[v1] Tue, 2 Jul 2024 18:33:49 UTC (2,683 KB)
[v2] Fri, 8 Nov 2024 22:36:16 UTC (2,704 KB)
