Post-Hoc Explanations Fail to Achieve their Purpose in Adversarial Contexts

Published: 20 June 2022

Abstract

Existing and planned legislation stipulates various obligations to provide information about machine learning algorithms and their functioning, often interpreted as obligations to “explain”. Many researchers suggest using post-hoc explanation algorithms for this purpose. In this paper, we combine legal, philosophical and technical arguments to show that post-hoc explanation algorithms are unsuitable to achieve the law’s objectives. Indeed, most situations where explanations are requested are adversarial, meaning that the explanation provider and receiver have opposing interests and incentives, so that the provider might manipulate the explanation for her own ends. We show that this fundamental conflict cannot be resolved because of the high degree of ambiguity of post-hoc explanations in realistic application scenarios. As a consequence, post-hoc explanation algorithms are unsuitable to achieve the transparency objectives inherent to the legal norms. Instead, there is a need to more explicitly discuss the objectives underlying “explainability” obligations as these can often be better achieved through other mechanisms. There is an urgent need for a more open and honest discussion regarding the potential and limitations of post-hoc explanations in adversarial contexts, in particular in light of the current negotiations of the European Union’s draft Artificial Intelligence Act.
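To make the ambiguity argument concrete, the sketch below (not taken from the paper; the toy credit-scoring model, feature names, and background choices are invented for illustration) computes a simple removal-based feature attribution, in the spirit of occlusion or KernelSHAP, for one fixed model and one fixed input under two different background datasets. The two attributions differ markedly, and the sign of the debt feature even flips, so a provider who is free to choose the background can steer which story the explanation tells.

```python
# Illustrative sketch only: a removal-based attribution for the SAME model and
# SAME input, computed against two different background datasets chosen by the
# explanation provider. Model, features and numbers are invented.
import numpy as np

rng = np.random.default_rng(0)

def model(X):
    # Toy credit score: income raises it, debt lowers it.
    income, debt = X[:, 0], X[:, 1]
    return 0.7 * income - 0.5 * debt

def attribution(x, background):
    """Attribution of feature j = model(x) minus the mean prediction when
    feature j is replaced by values drawn from the background sample."""
    base = model(x[None, :])[0]
    scores = []
    for j in range(x.size):
        X_perturbed = np.tile(x, (len(background), 1))
        X_perturbed[:, j] = background[:, j]
        scores.append(base - model(X_perturbed).mean())
    return np.round(np.array(scores), 3)

x = np.array([0.9, 0.8])  # one applicant: high income, high debt

# Provider's choice A: background of low-income, low-debt applicants.
bg_a = rng.uniform(0.0, 0.3, size=(500, 2))
# Provider's choice B: background of applicants similar to x.
bg_b = rng.uniform(0.7, 1.0, size=(500, 2))

print("attribution, background A:", attribution(x, bg_a))  # income strongly +, debt clearly -
print("attribution, background B:", attribution(x, bg_b))  # both near zero; debt can flip sign
```

Under background A the explanation says debt counted heavily against the applicant; under background B it says debt barely mattered and even helped slightly. This is the kind of latitude the abstract argues an adversarial explanation provider can exploit.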



          Published In

          FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency
          June 2022
          2351 pages
          ISBN:9781450393522
          DOI:10.1145/3531146
          This work is licensed under a Creative Commons Attribution International 4.0 License.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          Published: 20 June 2022

          Author Tags

          1. Artificial Intelligence Act
          2. Counterfactual Explanations
          3. Explainability
          4. GDPR
          5. LIME
          6. Regulation
          7. SHAP
          8. Transparency

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Conference

          FAccT '22

