Fine-Tuning Large Language Models with User-Level Differential Privacy

Charles, Zachary; Ganesh, Arun; McKenna, Ryan; McMahan, H. Brendan; Mitchell, Nicole; Pillutla, Krishna; Rush, Keith

arXiv:2407.07737 (cs)

[Submitted on 10 Jul 2024]

Abstract: We investigate practical and scalable algorithms for training large language models (LLMs) with user-level differential privacy (DP) in order to provably safeguard all the examples contributed by each user. We study two variants of DP-SGD with: (1) example-level sampling (ELS) and per-example gradient clipping, and (2) user-level sampling (ULS) and per-user gradient clipping. We derive a novel user-level DP accountant that allows us to compute provably tight privacy guarantees for ELS. Using this, we show that while ELS can outperform ULS in specific settings, ULS generally yields better results when each user has a diverse collection of examples. We validate our findings through experiments in synthetic mean estimation and LLM fine-tuning tasks under fixed compute budgets. We find that ULS is significantly better in settings where either (1) strong privacy guarantees are required, or (2) the compute budget is large. Notably, our focus on LLM-compatible training algorithms allows us to scale to models with hundreds of millions of parameters and datasets with hundreds of thousands of users.
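
The difference between the two clipping granularities named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`clip`, `els_step`, `uls_step`), the noise calibration (standard deviation equal to `noise_multiplier * clip_norm`), and the choice to average each user's gradients before clipping are all assumptions made for illustration. Sampling and privacy accounting are left out, and the gradients are assumed to come from an already sampled batch.

```python
import numpy as np


def clip(v, clip_norm):
    """Rescale v so that its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(v)
    return v * min(1.0, clip_norm / max(norm, 1e-12))


def els_step(example_grads, clip_norm, noise_multiplier, rng):
    """Per-example clipping (hypothetical sketch).

    example_grads: list of 1-D arrays, one gradient per sampled example.
    Each gradient is clipped on its own, the clipped gradients are summed,
    Gaussian noise is added, and the result is averaged over the batch.
    """
    clipped = np.stack([clip(g, clip_norm) for g in example_grads])
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / len(example_grads)


def uls_step(user_grads, clip_norm, noise_multiplier, rng):
    """Per-user clipping (hypothetical sketch).

    user_grads: list of 2-D arrays, one per sampled user, with one row per
    example gradient. Each user's gradients are averaged first (an assumption),
    the per-user average is clipped, and the clipped averages are summed,
    noised, and averaged over the sampled users.
    """
    per_user = np.stack([clip(g.mean(axis=0), clip_norm) for g in user_grads])
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=per_user.shape[1])
    return (per_user.sum(axis=0) + noise) / len(user_grads)


rng = np.random.default_rng(0)

# Two examples: one with norm 5 (gets clipped), one with norm 0.5 (unchanged).
example_grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]
els_update = els_step(example_grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)

# Two users: the first contributes two examples, the second contributes one.
user_grads = [np.array([[3.0, 4.0], [3.0, 4.0]]), np.array([[0.3, 0.4]])]
uls_update = uls_step(user_grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
```

In this sketch the clipping bound applies to a single example in `els_step` and to a user's whole aggregated contribution in `uls_step`, which is the structural distinction the abstract draws. Setting `noise_multiplier=0.0` in the usage lines only makes the clipping behavior easy to inspect; it provides no privacy.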

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2407.07737 [cs.LG]
	(or arXiv:2407.07737v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2407.07737

Submission history

From: Zachary Charles
[v1] Wed, 10 Jul 2024 15:07:58 UTC (404 KB)
