Influence Functions for Scalable Data Attribution in Diffusion Models

Mlodozeniec, Bruno; Eschenhagen, Runa; Bae, Juhan; Immer, Alexander; Krueger, David; Turner, Richard

arXiv:2410.13850 (cs)

[Submitted on 17 Oct 2024 (v1), last revised 7 Jan 2025 (this version, v3)]

Abstract: Diffusion models have led to significant advancements in generative modelling. Yet their widespread adoption poses challenges regarding data attribution and interpretability. In this paper, we aim to help address such challenges in diffusion models by developing an influence functions framework. Influence function-based data attribution methods approximate how a model's output would have changed if some training data were removed. In supervised learning, this is usually used for predicting how the loss on a particular example would change. For diffusion models, we focus on predicting the change in the probability of generating a particular example via several proxy measurements. We show how to formulate influence functions for such quantities and how previously proposed methods can be interpreted as particular design choices in our framework. To ensure scalability of the Hessian computations in influence functions, we systematically develop K-FAC (Kronecker-Factored Approximate Curvature) approximations based on generalised Gauss-Newton matrices specifically tailored to diffusion models. We show that our recommended method outperforms previous data attribution approaches on common evaluations, such as the Linear Data-modelling Score (LDS) or retraining without top influences, without the need for method-specific hyperparameter tuning.
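The core idea in the abstract, approximating how a measurement would change if a training example were removed, can be illustrated on a toy problem. The sketch below is not the paper's method: it uses a small ridge regression instead of a diffusion model, an exact Hessian instead of a K-FAC approximation, and the loss on a single query point as the measurement. All names (`fit_ridge`, `influence_on_measurement`, `x_query`, and so on) are hypothetical and chosen for illustration only. It compares the first-order influence estimate with the change observed after actually refitting without each example.

```python
import numpy as np


def fit_ridge(X, y, lam, n_total):
    """Minimise (1/n_total) * sum_i 0.5*(x_i^T w - y_i)^2 + 0.5*lam*||w||^2."""
    d = X.shape[1]
    H = X.T @ X / n_total + lam * np.eye(d)
    return np.linalg.solve(H, X.T @ y / n_total)


def influence_on_measurement(X, y, w, lam, x_query, y_query):
    """First-order estimate of the change in query loss when each
    training example is removed (one value per training example)."""
    n, d = X.shape
    # Hessian of the full training objective; lam acts as a damping term.
    H = X.T @ X / n + lam * np.eye(d)
    # Per-example gradients of 0.5*(x^T w - y)^2, shape (n, d).
    per_example_grads = (X @ w - y)[:, None] * X
    # Gradient of the measurement (query loss) with respect to w.
    grad_m = (x_query @ w - y_query) * x_query
    # Removing example j shifts the parameters by roughly (1/n) H^{-1} g_j,
    # so the measurement shifts by roughly grad_m^T (1/n) H^{-1} g_j.
    return per_example_grads @ np.linalg.solve(H, grad_m) / n


# Synthetic data (assumed purely for this illustration).
rng = np.random.default_rng(0)
n, d, lam = 200, 5, 0.01
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)
x_query = rng.normal(size=d)
y_query = x_query @ w_true + 1.0  # query with a clearly non-zero residual

w = fit_ridge(X, y, lam, n)
predicted = influence_on_measurement(X, y, w, lam, x_query, y_query)


def query_loss(weights):
    return 0.5 * (x_query @ weights - y_query) ** 2


# Ground truth: refit without example j and record the change in query loss.
actual = np.array([
    query_loss(fit_ridge(np.delete(X, j, axis=0), np.delete(y, j), lam, n))
    - query_loss(w)
    for j in range(n)
])

correlation = np.corrcoef(predicted, actual)[0, 1]
print(f"correlation between predicted and actual changes: {correlation:.4f}")
```

In this toy setting the Hessian is a small dense matrix that can be solved directly. For models with many parameters that is not feasible, which is the scalability problem the abstract says the K-FAC approximations are meant to address.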

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2410.13850 [cs.LG]
	(or arXiv:2410.13850v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2410.13850

Submission history

From: Bruno Mlodozeniec
[v1] Thu, 17 Oct 2024 17:59:02 UTC (6,417 KB)
[v2] Thu, 24 Oct 2024 17:43:00 UTC (16,738 KB)
[v3] Tue, 7 Jan 2025 15:28:09 UTC (5,758 KB)
