Research Article | Public Access
DOI: 10.1145/3351095.3372850

Explaining machine learning classifiers through diverse counterfactual explanations

Ramaravind K. Mothilal, Amit Sharma, Chenhao Tan

Published: 27 January 2020

Abstract

Post-hoc explanations of machine learning models are crucial for people to understand and act on algorithmic predictions. An intriguing class of explanations is through counterfactuals, hypothetical examples that show people how to obtain a different prediction. We posit that effective counterfactual explanations should satisfy two properties: feasibility of the counterfactual actions given user context and constraints, and diversity among the counterfactuals presented. To this end, we propose a framework for generating and evaluating a diverse set of counterfactual explanations based on determinantal point processes. To evaluate the actionability of counterfactuals, we provide metrics that enable comparison of counterfactual-based methods to other local explanation methods. We further address necessary tradeoffs and point to causal implications in optimizing for counterfactuals. Our experiments on four real-world datasets show that our framework can generate a set of counterfactuals that are diverse and well approximate local decision boundaries, outperforming prior approaches to generating diverse counterfactuals. We provide an implementation of the framework at https://github.com/microsoft/DiCE.
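
The determinantal point process machinery behind the diversity property can be made concrete. The sketch below is a reconstruction of the paper's combined objective, written here with assumed notation: f is the trained classifier, x the original input, c_1, ..., c_k the candidate counterfactuals, y the desired outcome, dist a user-chosen distance, and lambda_1, lambda_2 tradeoff weights; see the paper for the exact losses.

    \mathrm{dpp\_diversity}(c_1, \ldots, c_k) = \det(K), \qquad K_{i,j} = \frac{1}{1 + \mathrm{dist}(c_i, c_j)}

    C(x) = \arg\min_{c_1, \ldots, c_k} \; \frac{1}{k} \sum_{i=1}^{k} \mathrm{yloss}\big(f(c_i), y\big) \;+\; \frac{\lambda_1}{k} \sum_{i=1}^{k} \mathrm{dist}(c_i, x) \;-\; \lambda_2 \, \mathrm{dpp\_diversity}(c_1, \ldots, c_k)

The determinant of the similarity kernel K grows as the counterfactuals spread apart, so rewarding it in the objective pushes the set toward diversity, while the first two terms keep each counterfactual valid (it flips the prediction) and proximate (it stays close to x).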
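Since the abstract points to the reference implementation at https://github.com/microsoft/DiCE, a minimal usage sketch may help orient readers. It assumes the released dice-ml package (pip install dice-ml) and its bundled demo data, and follows the repository's getting-started example; note that method="random" is a model-agnostic sampling variant rather than the gradient-based DPP optimizer the paper describes, and API details may differ across versions.

    # Hedged sketch: generate diverse counterfactuals with dice-ml
    # (assumes `pip install dice-ml`; API per the repo's getting-started example).
    import dice_ml
    from dice_ml.utils import helpers  # bundled demo "adult income" data
    from sklearn.compose import ColumnTransformer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder

    dataset = helpers.load_adult_income_dataset()
    target = dataset["income"]
    features = dataset.drop("income", axis=1)
    numeric = ["age", "hours_per_week"]
    categorical = [c for c in features.columns if c not in numeric]

    # One-hot encode categoricals, then fit an ordinary sklearn classifier.
    clf = Pipeline([
        ("encode", ColumnTransformer(
            [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
            remainder="passthrough")),
        ("model", RandomForestClassifier(random_state=0)),
    ]).fit(features, target)

    # Wrap the data and model, then request 4 diverse counterfactuals
    # that flip the prediction for a single query row.
    d = dice_ml.Data(dataframe=dataset, continuous_features=numeric,
                     outcome_name="income")
    m = dice_ml.Model(model=clf, backend="sklearn")
    exp = dice_ml.Dice(d, m, method="random")
    cfs = exp.generate_counterfactuals(features[0:1], total_CFs=4,
                                       desired_class="opposite")
    cfs.visualize_as_dataframe(show_only_changes=True)

Feasibility constraints of the kind the abstract mentions (user context and constraints) can be expressed through arguments such as features_to_vary and permitted_range on generate_counterfactuals.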

Supplementary Material

PDF File (p607-mothilal-supp.pdf)
Supplemental material.




Published In

FAT* '20: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency
January 2020, 895 pages
ISBN: 9781450369367
DOI: 10.1145/3351095

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Cited By

• (2025) Local structural–functional coupling with counterfactual explanations for epilepsy prediction. NeuroImage 306, 120978. DOI: 10.1016/j.neuroimage.2024.120978. Online publication date: Feb 2025.
• (2025) Overlap Number of Balls Model-Agnostic CounterFactuals (ONB-MACF): A data-morphology-based counterfactual generation method for trustworthy artificial intelligence. Information Sciences, 121844. DOI: 10.1016/j.ins.2024.121844. Online publication date: Jan 2025.
• (2025) CISL-PD: A deep learning framework of clinical intervention strategies for Parkinson's disease based on directional counterfactual Dual GANs. Expert Systems with Applications 261, 125506. DOI: 10.1016/j.eswa.2024.125506. Online publication date: Feb 2025.
• (2025) Interpretable ECG analysis for myocardial infarction detection through counterfactuals. Biomedical Signal Processing and Control 102, 107227. DOI: 10.1016/j.bspc.2024.107227. Online publication date: Apr 2025.
• (2025) AI to renew public employment services? Explanation and trust of domain experts. AI and Ethics. DOI: 10.1007/s43681-024-00629-w. Online publication date: 9 Jan 2025.
• (2025) The explanation dialogues: an expert focus study to understand requirements towards explanations within the GDPR. Artificial Intelligence and Law. DOI: 10.1007/s10506-024-09430-w. Online publication date: 13 Jan 2025.
• (2025) Boosting Credit Risk Data Quality Using Machine Learning and eXplainable AI Techniques. Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 420-429. DOI: 10.1007/978-3-031-74643-7_30. Online publication date: 1 Jan 2025.
• (2025) Matching the Expert's Knowledge via a Counterfactual-Based Feature Importance Measure. Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 71-86. DOI: 10.1007/978-3-031-74633-8_5. Online publication date: 1 Jan 2025.
• (2025) Manipulation Risks in Explainable AI: The Implications of the Disagreement Problem. Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 185-200. DOI: 10.1007/978-3-031-74633-8_12. Online publication date: 1 Jan 2025.
• (2025) Conversational XAI: Formalizing Its Basic Design Principles. Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 295-309. DOI: 10.1007/978-3-031-74627-7_22. Online publication date: 1 Jan 2025.
