Model extraction from counterfactual explanations

Aïvodji, Ulrich; Bolot, Alexandre; Gambs, Sébastien

arXiv:2009.01884 (cs)

[Submitted on 3 Sep 2020]

Abstract: Post-hoc explanation techniques refer to a posteriori methods that can be used to explain how black-box machine learning models produce their outcomes. Among post-hoc explanation techniques, counterfactual explanations are becoming one of the most popular methods to achieve this objective. In particular, in addition to highlighting the most important features used by the black-box model, they provide users with actionable explanations in the form of data instances that would have received a different outcome. Nonetheless, by doing so, they also leak non-trivial information about the model itself, which raises privacy issues. In this work, we demonstrate how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks. More precisely, our attack enables the adversary to build a faithful copy of a target model by accessing its counterfactual explanations. The empirical evaluation of the proposed attack on black-box models trained on real-world datasets demonstrates that it can achieve high-fidelity and high-accuracy extraction even under low query budgets.
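
The abstract describes the attack only at a high level. The sketch below illustrates one simple reading of the idea and is not the authors' implementation. Every name in it (`target_predict`, `target_counterfactual`, `extract`, `fidelity`) is hypothetical. The target is a toy two-feature linear classifier, and the counterfactual oracle is assumed to return the nearest point (in L2 distance) that receives the opposite label. Under these assumptions, each query yields two labelled training points for a surrogate logistic-regression model: the query with its predicted label, and the counterfactual with the flipped label. Fidelity is measured as the agreement rate between surrogate and target on fresh random points.

```python
import math
import random

# Hypothetical target service, hidden from the adversary: a linear classifier
# f(x) = 1 if w.x + b > 0 else 0. W_TRUE and B_TRUE are illustrative values,
# not taken from the paper.
W_TRUE = (2.0, -1.0)
B_TRUE = 0.5


def _score(x):
    return W_TRUE[0] * x[0] + W_TRUE[1] * x[1] + B_TRUE


def target_predict(x):
    """Black-box prediction API (hypothetical): returns the label only."""
    return 1 if _score(x) > 0 else 0


def target_counterfactual(x, margin=0.05):
    """Hypothetical counterfactual API: the closest point to x (L2 distance)
    with the opposite label, i.e. x projected onto the decision boundary and
    pushed `margin` score units past it."""
    s = _score(x)
    norm_sq = W_TRUE[0] ** 2 + W_TRUE[1] ** 2
    step = (s + margin if s > 0 else s - margin) / norm_sq
    return (x[0] - step * W_TRUE[0], x[1] - step * W_TRUE[1])


def _sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(data, epochs=500, lr=0.5):
    """Full-batch gradient descent on the logistic loss; returns a predictor."""
    w0 = w1 = b = 0.0
    n = len(data)
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for (x0, x1), y in data:
            err = _sigmoid(w0 * x0 + w1 * x1 + b) - y
            g0 += err * x0
            g1 += err * x1
            gb += err
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
        b -= lr * gb / n
    return lambda x: 1 if w0 * x[0] + w1 * x[1] + b > 0 else 0


def extract(budget, seed=0):
    """Spend `budget` queries; each one yields a labelled point plus a
    counterfactual that carries the opposite label."""
    rng = random.Random(seed)
    data = []
    for _ in range(budget):
        x = (rng.uniform(-3, 3), rng.uniform(-3, 3))
        y = target_predict(x)
        data.append((x, y))
        data.append((target_counterfactual(x), 1 - y))
    return train_logistic(data)


def fidelity(surrogate, n_test=2000, seed=1):
    """Fraction of fresh points on which the surrogate agrees with the target."""
    rng = random.Random(seed)
    pts = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(n_test)]
    return sum(surrogate(p) == target_predict(p) for p in pts) / n_test


if __name__ == "__main__":
    print(f"fidelity with 20 queries: {fidelity(extract(20)):.3f}")
```

Counterfactuals sit just across the decision boundary, so they place training points exactly where the surrogate needs them. This offers one intuition for how a small query budget could suffice. Real counterfactual generators and non-linear targets are considerably messier than this toy oracle.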

Subjects:	Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Cite as:	arXiv:2009.01884 [cs.LG]
	(or arXiv:2009.01884v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2009.01884

Submission history

From: Ulrich Aïvodji
[v1] Thu, 3 Sep 2020 19:02:55 UTC (933 KB)