Post-Hoc Explanations Fail to Achieve their Purpose in Adversarial Contexts

Published: 20 June 2022

Abstract

Existing and planned legislation stipulates various obligations to provide information about machine learning algorithms and their functioning, often interpreted as obligations to “explain”. Many researchers suggest using post-hoc explanation algorithms for this purpose. In this paper, we combine legal, philosophical and technical arguments to show that post-hoc explanation algorithms are unsuitable to achieve the law’s objectives. Indeed, most situations where explanations are requested are adversarial, meaning that the explanation provider and receiver have opposing interests and incentives, so that the provider might manipulate the explanation for her own ends. We show that this fundamental conflict cannot be resolved because of the high degree of ambiguity of post-hoc explanations in realistic application scenarios. As a consequence, post-hoc explanation algorithms are unsuitable to achieve the transparency objectives inherent to the legal norms. Instead, there is a need to more explicitly discuss the objectives underlying “explainability” obligations as these can often be better achieved through other mechanisms. There is an urgent need for a more open and honest discussion regarding the potential and limitations of post-hoc explanations in adversarial contexts, in particular in light of the current negotiations of the European Union’s draft Artificial Intelligence Act.
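To make the ambiguity argument concrete, the sketch below (not taken from the paper; the toy credit-scoring model, feature names, and background choices are invented for illustration) computes a simple removal-based feature attribution, in the spirit of occlusion or KernelSHAP, for one fixed model and one fixed input under two different background datasets. The two attributions differ markedly, and the sign of the debt feature even flips, so a provider who is free to choose the background can steer which story the explanation tells.

```python
# Illustrative sketch only: a removal-based attribution for the SAME model and
# SAME input, computed against two different background datasets chosen by the
# explanation provider. Model, features and numbers are invented.
import numpy as np

rng = np.random.default_rng(0)

def model(X):
    # Toy credit score: income raises it, debt lowers it.
    income, debt = X[:, 0], X[:, 1]
    return 0.7 * income - 0.5 * debt

def attribution(x, background):
    """Attribution of feature j = model(x) minus the mean prediction when
    feature j is replaced by values drawn from the background sample."""
    base = model(x[None, :])[0]
    scores = []
    for j in range(x.size):
        X_perturbed = np.tile(x, (len(background), 1))
        X_perturbed[:, j] = background[:, j]
        scores.append(base - model(X_perturbed).mean())
    return np.round(np.array(scores), 3)

x = np.array([0.9, 0.8])  # one applicant: high income, high debt

# Provider's choice A: background of low-income, low-debt applicants.
bg_a = rng.uniform(0.0, 0.3, size=(500, 2))
# Provider's choice B: background of applicants similar to x.
bg_b = rng.uniform(0.7, 1.0, size=(500, 2))

print("attribution, background A:", attribution(x, bg_a))  # income strongly +, debt clearly -
print("attribution, background B:", attribution(x, bg_b))  # both near zero; debt can flip sign
```

Under background A the explanation says debt counted heavily against the applicant; under background B it says debt barely mattered and even helped slightly. This is the kind of latitude the abstract argues an adversarial explanation provider can exploit.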



          Published In

          FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency
          June 2022
          2351 pages
          ISBN:9781450393522
          DOI:10.1145/3531146
          This work is licensed under a Creative Commons Attribution International 4.0 License.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          Published: 20 June 2022

          Author Tags

          1. Artificial Intelligence Act
          2. Counterfactual Explanations
          3. Explainability
          4. GDPR
          5. LIME
          6. Regulation
          7. SHAP
          8. Transparency

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Conference

          FAccT '22

