DOI: 10.1145/3287560.3287562

Model Reconstruction from Model Explanations

Published: 29 January 2019

Abstract

We show through theory and experiment that gradient-based explanations of a model quickly reveal the model itself. Our results speak to a tension between the desire to keep a proprietary model secret and the ability to offer model explanations.
On the theoretical side, we give an algorithm that provably learns a two-layer ReLU network in a setting where the algorithm may query the gradient of the model with respect to chosen inputs. The number of queries is independent of the dimension and nearly optimal in its dependence on the model size. Beyond its learning-theoretic interest, this result highlights the power of gradients, rather than labels, as a learning primitive.
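Why are gradients so much stronger than labels? As a quick illustration (the notation here is ours, consistent with the two-layer setting above): for

    f(x) = \sum_{i=1}^{h} a_i \, \mathrm{ReLU}(w_i^\top x + b_i),

wherever f is differentiable,

    \nabla_x f(x) = \sum_{i=1}^{h} a_i \, \mathbf{1}[w_i^\top x + b_i > 0] \, w_i.

A single gradient query thus returns a d-dimensional linear combination of the hidden weight rows w_i, selected by the pattern of active units, whereas a label query returns one scalar; comparing gradients on either side of a single unit's activation boundary isolates a_i w_i exactly.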
Complementing our theory, we give effective heuristics for reconstructing models from gradient explanations that are orders of magnitude more query-efficient than reconstruction attacks relying on prediction interfaces.
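To make the heuristic idea concrete, here is a minimal sketch, assuming PyTorch: an attacker queries gradient explanations at chosen inputs and fits a surrogate network by gradient matching. The `explain` oracle, the hidden `teacher` stand-in, and all sizes are hypothetical placeholders for a deployed model's explanation interface; this illustrates the spirit of the heuristics, not the authors' exact procedure.

    import torch
    import torch.nn as nn

    d, h, n_queries = 20, 10, 256

    # Stand-in for the proprietary model behind the explanation API.
    teacher = nn.Sequential(nn.Linear(d, h), nn.ReLU(), nn.Linear(h, 1))

    def explain(x):
        # Gradient explanation: d(output)/d(input) at the queried points.
        x = x.detach().requires_grad_(True)
        teacher(x).sum().backward()
        return x.grad.detach()

    # Attacker side: query gradients at chosen inputs, then train a
    # surrogate whose input gradients match the observed explanations.
    student = nn.Sequential(nn.Linear(d, h), nn.ReLU(), nn.Linear(h, 1))
    opt = torch.optim.Adam(student.parameters(), lr=1e-2)

    X = torch.randn(n_queries, d)   # chosen query inputs
    G = explain(X)                  # observed gradient explanations

    for step in range(2000):
        opt.zero_grad()
        Xr = X.clone().requires_grad_(True)
        # create_graph=True lets us backpropagate through the gradient
        # itself (second-order autograd) to optimize the matching loss.
        g = torch.autograd.grad(student(Xr).sum(), Xr, create_graph=True)[0]
        loss = ((g - G) ** 2).mean()
        loss.backward()
        opt.step()

Each gradient query returns d numbers rather than one, so the surrogate receives far more information per query, one intuition for the efficiency gap over prediction-interface attacks.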



    Published In

    FAT* '19: Proceedings of the Conference on Fairness, Accountability, and Transparency
    January 2019
    388 pages
ISBN: 9781450361255
DOI: 10.1145/3287560


Publisher

Association for Computing Machinery, New York, NY, United States



    Author Tags

1. explanations
2. machine learning
3. privacy
4. security

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    FAT* '19

Cited By

• (2024) Bio-Rollup: a new privacy protection solution for biometrics based on two-layer scalability-focused blockchain. PeerJ Computer Science 10, e2268. DOI: 10.7717/peerj-cs.2268
• (2024) Stealing part of a production language model. Proceedings of the 41st International Conference on Machine Learning, 5680–5705. DOI: 10.5555/3692070.3692291
• (2024) MEGEX: Data-Free Model Extraction Attack Against Gradient-Based Explainable AI. Proceedings of the 2nd ACM Workshop on Secure and Trustworthy Deep Learning Systems, 56–66. DOI: 10.1145/3665451.3665533
• (2024) Counterfactual Explanation at Will, with Zero Privacy Leakage. Proceedings of the ACM on Management of Data 2(3), 1–29. DOI: 10.1145/3654933
• (2024) Unveiling Intellectual Property Vulnerabilities of GAN-Based Distributed Machine Learning through Model Extraction Attacks. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 1617–1626. DOI: 10.1145/3627673.3679850
• (2024) AugSteal: Advancing Model Steal With Data Augmentation in Active Learning Frameworks. IEEE Transactions on Information Forensics and Security 19, 4728–4740. DOI: 10.1109/TIFS.2024.3384841
• (2024) MTL-Leak: Privacy Risk Assessment in Multi-Task Learning. IEEE Transactions on Dependable and Secure Computing 21(1), 204–215. DOI: 10.1109/TDSC.2023.3247869
• (2024) Enhancing Data-Free Model Stealing Attack on Robust Models. 2024 International Joint Conference on Neural Networks (IJCNN), 1–8. DOI: 10.1109/IJCNN60899.2024.10650742
• (2024) Agnostically Learning Multi-Index Models with Queries. 2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS), 1931–1952. DOI: 10.1109/FOCS61266.2024.00116
• (2024) XSub: Explanation-Driven Adversarial Attack against Blackbox Classifiers via Feature Substitution. 2024 IEEE International Conference on Big Data (BigData), 1599–1604. DOI: 10.1109/BigData62323.2024.10825935
