DOI: 10.1145/3287560.3287562

Model Reconstruction from Model Explanations

Published: 29 January 2019

Abstract

We show through theory and experiment that gradient-based explanations of a model quickly reveal the model itself. Our results speak to a tension between the desire to keep a proprietary model secret and the ability to offer model explanations.
On the theoretical side, we give an algorithm that provably learns a two-layer ReLU network in a setting where the algorithm may query the gradient of the model with respect to chosen inputs. The number of queries is independent of the dimension and nearly optimal in its dependence on the model size. Beyond its learning-theoretic interest, this result highlights the power of gradients, rather than labels, as a learning primitive.
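Why are gradients so much stronger than labels? As a quick illustration (the notation here is ours, consistent with the two-layer setting above): for

    f(x) = \sum_{i=1}^{h} a_i \, \mathrm{ReLU}(w_i^\top x + b_i),

wherever f is differentiable,

    \nabla_x f(x) = \sum_{i=1}^{h} a_i \, \mathbf{1}[w_i^\top x + b_i > 0] \, w_i.

A single gradient query thus returns a d-dimensional linear combination of the hidden weight rows w_i, selected by the pattern of active units, whereas a label query returns one scalar; comparing gradients on either side of a single unit's activation boundary isolates a_i w_i exactly.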
Complementing our theory, we give effective heuristics for reconstructing models from gradient explanations that are orders of magnitude more query-efficient than reconstruction attacks relying on prediction interfaces.
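To make the heuristic idea concrete, here is a minimal sketch, assuming PyTorch: an attacker queries gradient explanations at chosen inputs and fits a surrogate network by gradient matching. The `explain` oracle, the hidden `teacher` stand-in, and all sizes are hypothetical placeholders for a deployed model's explanation interface; this illustrates the spirit of the heuristics, not the authors' exact procedure.

    import torch
    import torch.nn as nn

    d, h, n_queries = 20, 10, 256

    # Stand-in for the proprietary model behind the explanation API.
    teacher = nn.Sequential(nn.Linear(d, h), nn.ReLU(), nn.Linear(h, 1))

    def explain(x):
        # Gradient explanation: d(output)/d(input) at the queried points.
        x = x.detach().requires_grad_(True)
        teacher(x).sum().backward()
        return x.grad.detach()

    # Attacker side: query gradients at chosen inputs, then train a
    # surrogate whose input gradients match the observed explanations.
    student = nn.Sequential(nn.Linear(d, h), nn.ReLU(), nn.Linear(h, 1))
    opt = torch.optim.Adam(student.parameters(), lr=1e-2)

    X = torch.randn(n_queries, d)   # chosen query inputs
    G = explain(X)                  # observed gradient explanations

    for step in range(2000):
        opt.zero_grad()
        Xr = X.clone().requires_grad_(True)
        # create_graph=True lets us backpropagate through the gradient
        # itself (second-order autograd) to optimize the matching loss.
        g = torch.autograd.grad(student(Xr).sum(), Xr, create_graph=True)[0]
        loss = ((g - G) ** 2).mean()
        loss.backward()
        opt.step()

Each gradient query returns d numbers rather than one, so the surrogate receives far more information per query, one intuition for the efficiency gap over prediction-interface attacks.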



    Published In

    FAT* '19: Proceedings of the Conference on Fairness, Accountability, and Transparency
    January 2019
    388 pages
ISBN: 9781450361255
DOI: 10.1145/3287560


Publisher

Association for Computing Machinery, New York, NY, United States



    Author Tags

1. explanations
2. machine learning
3. privacy
4. security

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    FAT* '19

Cited By

• (2024) Bio-Rollup: a new privacy protection solution for biometrics based on two-layer scalability-focused blockchain. PeerJ Computer Science 10, e2268. DOI: 10.7717/peerj-cs.2268
• (2024) Stealing part of a production language model. Proceedings of the 41st International Conference on Machine Learning, 5680–5705. DOI: 10.5555/3692070.3692291
• (2024) MEGEX: Data-Free Model Extraction Attack Against Gradient-Based Explainable AI. Proceedings of the 2nd ACM Workshop on Secure and Trustworthy Deep Learning Systems, 56–66. DOI: 10.1145/3665451.3665533
• (2024) Counterfactual Explanation at Will, with Zero Privacy Leakage. Proceedings of the ACM on Management of Data 2(3), 1–29. DOI: 10.1145/3654933
• (2024) Unveiling Intellectual Property Vulnerabilities of GAN-Based Distributed Machine Learning through Model Extraction Attacks. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 1617–1626. DOI: 10.1145/3627673.3679850
• (2024) AugSteal: Advancing Model Steal With Data Augmentation in Active Learning Frameworks. IEEE Transactions on Information Forensics and Security 19, 4728–4740. DOI: 10.1109/TIFS.2024.3384841
• (2024) MTL-Leak: Privacy Risk Assessment in Multi-Task Learning. IEEE Transactions on Dependable and Secure Computing 21(1), 204–215. DOI: 10.1109/TDSC.2023.3247869
• (2024) Enhancing Data-Free Model Stealing Attack on Robust Models. 2024 International Joint Conference on Neural Networks (IJCNN), 1–8. DOI: 10.1109/IJCNN60899.2024.10650742
• (2024) Agnostically Learning Multi-Index Models with Queries. 2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS), 1931–1952. DOI: 10.1109/FOCS61266.2024.00116
• (2024) XSub: Explanation-Driven Adversarial Attack against Blackbox Classifiers via Feature Substitution. 2024 IEEE International Conference on Big Data (BigData), 1599–1604. DOI: 10.1109/BigData62323.2024.10825935
