DOI: 10.1145/3375627.3375830
research-article
Open access

Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods

Published: 07 February 2020

Abstract

As machine learning black boxes are increasingly being deployed in domains such as healthcare and criminal justice, there is growing emphasis on building tools and techniques for explaining these black boxes in an interpretable manner. Such explanations are being leveraged by domain experts to diagnose systematic errors and underlying biases of black boxes. In this paper, we demonstrate that post hoc explanation techniques that rely on input perturbations, such as LIME and SHAP, are not reliable. Specifically, we propose a novel scaffolding technique that effectively hides the biases of any given classifier by allowing an adversarial entity to craft an arbitrary desired explanation. Our approach can be used to scaffold any biased classifier in such a way that its predictions on the input data distribution still remain biased, but the post hoc explanations of the scaffolded classifier look innocuous. Using extensive evaluation with multiple real-world datasets (including COMPAS), we demonstrate how extremely biased (racist) classifiers crafted by our framework can easily fool popular explanation techniques such as LIME and SHAP into generating innocuous explanations which do not reflect the underlying biases.
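To make the attack concrete, the sketch below illustrates the scaffolding idea in Python under stated assumptions; it is a minimal illustration, not the authors' released implementation. The class and argument names (ScaffoldedClassifier, biased_model, innocuous_model) are hypothetical, and the random-forest out-of-distribution detector is one plausible choice. The essential mechanism follows the abstract's description: a detector is trained to distinguish genuine inputs from the kind of synthetic perturbations LIME and SHAP generate, so the biased model answers only queries that look like real data while off-distribution explanation queries are routed to an innocuous surrogate.

# Minimal sketch of the scaffolding attack described above (illustrative, not the authors' code).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

class ScaffoldedClassifier:
    """Wraps a biased classifier so perturbation-based explainers see an innocuous surrogate."""

    def __init__(self, biased_model, innocuous_model, X_real, X_perturbed):
        self.biased_model = biased_model        # e.g., predicts from a sensitive attribute
        self.innocuous_model = innocuous_model  # e.g., predicts from harmless, uncorrelated features
        # Out-of-distribution detector: trained to separate genuine inputs (label 1)
        # from synthetic perturbations of the sort LIME/SHAP would generate (label 0).
        self.ood_detector = RandomForestClassifier(n_estimators=100)
        X = np.vstack([X_real, X_perturbed])
        y = np.concatenate([np.ones(len(X_real)), np.zeros(len(X_perturbed))])
        self.ood_detector.fit(X, y)

    def predict(self, X):
        X = np.asarray(X)
        in_dist = self.ood_detector.predict(X).astype(bool)
        preds = np.empty(len(X), dtype=int)
        if in_dist.any():      # real-looking inputs: biased predictions on the true data distribution
            preds[in_dist] = self.biased_model.predict(X[in_dist])
        if (~in_dist).any():   # perturbation-like inputs: innocuous predictions seen by the explainer
            preds[~in_dist] = self.innocuous_model.predict(X[~in_dist])
        return preds

Explaining such a wrapper with, say, lime.lime_tabular.LimeTabularExplainer or shap.KernelExplainer would then attribute importance largely to the innocuous model's features, even though predictions on real inputs remain biased.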

References

[1]
Ulrich Aivodji, Hiromi Arai, Olivier Fortineau, Sébastien Gambs, Satoshi Hara, and Alain Tapp. 2019. Fairwashing: the risk of rationalization. In International Conference on Machine Learning . 161--170.
[2]
Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. Machine bias. ProPublica (2016).
[3]
Arthur Asuncion and David Newman. 2007. UCI Machine Learning Repository (2007).
[4]
C. Blake, E. Keogh, and C. J. Merz. 1999. UCI Repository of Machine Learning Databases. University of California at Irvine (1999).
[5]
Ann-Kathrin Dombrowski, Maximilian Alber, Christopher J Anders, Marcel Ackermann, Klaus-Robert Müller, and Pan Kessel. 2019. Explanations can be manipulated and geometry is to blame. arXiv preprint arXiv:1906.07983 (2019).
[6]
Finale Doshi-Velez and Been Kim. 2017. Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608 (2017).
[7]
Radwa Elshawi, Mouaz H Al-Mallah, and Sherif Sakr. 2019. On the interpretability of machine learning-based model for predicting hypertension. BMC medical informatics and decision making, Vol. 19, 1 (2019), 146.
[8]
Amirata Ghorbani, Abubakar Abid, and James Zou. 2019. Interpretation of neural networks is fragile. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 3681--3688.
[9]
Juyeon Heo, Sunghwan Joo, and Taesup Moon. 2019. Fooling Neural Network Interpretations via Adversarial Model Manipulation. In Advances in Neural Information Processing Systems 32. 2921--2932.
[10]
Mark Ibrahim, Melissa Louie, Ceena Modarres, and John Paisley. 2019. Global Explanations of Neural Networks: Mapping the Landscape of Predictions. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (AIES '19). 279--287.
[11]
Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viégas, and Rory Sayres. 2018. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research), Jennifer Dy and Andreas Krause (Eds.), Vol. 80. PMLR, Stockholmsmässan, Stockholm, Sweden, 2668--2677.
[12]
Joshua A Kroll, Solon Barocas, Edward W Felten, Joel R Reidenberg, David G Robinson, and Harlan Yu. 2016. Accountable algorithms. U. Pa. L. Rev., Vol. 165 (2016), 633.
[13]
Jeff Larson, Surya Mattu, Lauren Kirchner, and Julia Angwin. 2016. How we analyzed the COMPAS recidivism algorithm. ProPublica (May 2016).
[14]
Zachary C. Lipton. 2018. The Mythos of Model Interpretability. Queue, Vol. 16, 3, Article 30 (June 2018), 27 pages.
[15]
Scott M Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. In Neural Information Processing Systems (NIPS), I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 4765--4774.
[16]
Brent Mittelstadt, Chris Russell, and Sandra Wachter. 2019. Explaining explanations in AI. In Proceedings of the conference on fairness, accountability, and transparency. ACM, 279--288.
[17]
M Redmond. 2011. Communities and crime unnormalized data set. UCI Machine Learning Repository. (2011).
[18]
Michael Redmond and Alok Baveja. 2002. A data-driven software tool for enabling cooperative information sharing among police departments. European Journal of Operational Research, Vol. 141, 3 (2002), 660--678.
[19]
General Data Protection Regulation. 2016. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46. Official Journal of the European Union (OJ), Vol. 59, 1--88 (2016), 294.
[20]
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Knowledge Discovery and Data Mining (KDD).
[21]
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2018. Anchors: High-precision model-agnostic explanations. In Thirty-Second AAAI Conference on Artificial Intelligence.
[22]
Cynthia Rudin. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, Vol. 1, 5 (2019), 206.
[23]
Andrew D Selbst and Solon Barocas. 2018. The intuitive appeal of explainable machines. Fordham L. Rev., Vol. 87 (2018), 1085.
[24]
Sarah Tan, Rich Caruana, Giles Hooker, and Yin Lou. 2018. Distill-and-compare: auditing black-box models using transparent model distillation. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society. ACM, 303--310.
[25]
Leanne S Whitmore, Anthe George, and Corey M Hudson. 2016. Mapping chemical performance on molecular structures using locally interpretable explanations. arXiv preprint arXiv:1611.07443 (2016).


Information & Contributors

Published In

AIES '20: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society
February 2020
439 pages
ISBN:9781450371100
DOI:10.1145/3375627
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 February 2020


Author Tags

  1. adversarial attacks
  2. bias detection
  3. black box explanations
  4. model interpretability

Qualifiers

  • Research-article

Conference

AIES '20

Acceptance Rates

Overall acceptance rate: 61 of 162 submissions (38%)


Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 2,971
  • Downloads (last 6 weeks): 319
Reflects downloads up to 12 Jan 2025

Cited By

  • (2025) Exploring happiness factors with explainable ensemble learning in a global pandemic. PLOS ONE 20:1 (e0313276). DOI: 10.1371/journal.pone.0313276. Online publication date: 2-Jan-2025.
  • (2025) Integrating Model-Informed Drug Development With AI: A Synergistic Approach to Accelerating Pharmaceutical Innovation. Clinical and Translational Science 18:1. DOI: 10.1111/cts.70124. Online publication date: 10-Jan-2025.
  • (2025) Generalized Relevance Learning Grassmann Quantization. IEEE Transactions on Pattern Analysis and Machine Intelligence 47:1 (502-513). DOI: 10.1109/TPAMI.2024.3466315. Online publication date: Jan-2025.
  • (2025) Enhanced Prediction and Optimization of Oil–Water Emulsion Stability Through Application of Machine Learning and Explainable Artificial Intelligence on TFBG Sensor Data. IEEE Sensors Letters 9:1 (1-4). DOI: 10.1109/LSENS.2024.3503752. Online publication date: Jan-2025.
  • (2025) Answering new urban questions: Using eXplainable AI-driven analysis to identify determinants of Airbnb price in Dublin. Expert Systems with Applications 260 (125360). DOI: 10.1016/j.eswa.2024.125360. Online publication date: Jan-2025.
  • (2025) Demystifying the black box: A survey on explainable artificial intelligence (XAI) in bioinformatics. Computational and Structural Biotechnology Journal 27 (346-359). DOI: 10.1016/j.csbj.2024.12.027. Online publication date: 2025.
  • (2025) Nullius in Explanans: an ethical risk assessment for explainable AI. Ethics and Information Technology 27:1. DOI: 10.1007/s10676-024-09800-7. Online publication date: 1-Mar-2025.
  • (2025) Artificial intelligence-based cardiovascular/stroke risk stratification in women affected by autoimmune disorders: a narrative survey. Rheumatology International 45:1. DOI: 10.1007/s00296-024-05756-5. Online publication date: 2-Jan-2025.
  • (2025) Manipulation Risks in Explainable AI: The Implications of the Disagreement Problem. Machine Learning and Principles and Practice of Knowledge Discovery in Databases (185-200). DOI: 10.1007/978-3-031-74633-8_12. Online publication date: 1-Jan-2025.
  • (2025) Using Part-Based Representations for Explainable Deep Reinforcement Learning. Machine Learning and Principles and Practice of Knowledge Discovery in Databases (420-432). DOI: 10.1007/978-3-031-74627-7_35. Online publication date: 1-Jan-2025.
