Meaningfully Debugging Model Mistakes using Conceptual Counterfactual Explanations

Abid, Abubakar; Yuksekgonul, Mert; Zou, James

arXiv:2106.12723v3 (cs)

[Submitted on 24 Jun 2021 (v1), last revised 14 Jun 2022 (this version, v3)]

Abstract: Understanding and explaining the mistakes made by trained models is critical to many machine learning objectives, such as improving robustness, addressing concept drift, and mitigating biases. However, this is often an ad hoc process that involves manually inspecting the model's mistakes on many test samples and guessing at the underlying reasons for those incorrect predictions. In this paper, we propose a systematic approach, conceptual counterfactual explanations (CCE), that explains why a classifier makes a mistake on one or more test samples in terms of human-understandable concepts (e.g., this zebra is misclassified as a dog because of faint stripes). We base CCE on two prior ideas, counterfactual explanations and concept activation vectors, and validate our approach on well-known pretrained models, showing that it explains the models' mistakes meaningfully. In addition, for new models trained on data with spurious correlations, CCE accurately identifies the spurious correlation as the cause of model mistakes from a single misclassified test sample. On two challenging medical applications, CCE generated useful insights, confirmed by clinicians, into biases and mistakes the model makes in real-world settings.
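
The combination of the two ideas named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the difference-of-means concept vectors, the linear softmax head, the L1/L2 penalty weights, the toy 4-dimensional embeddings, and the names `concept_vector` and `concept_scores` are all assumptions made for illustration. The sketch learns one direction per concept in embedding space, then searches for a sparse set of concept weights that moves a misclassified embedding toward its correct class. A large positive weight suggests that the sample lacked that concept.

```python
import numpy as np

def concept_vector(pos, neg):
    """Unit direction from concept-negative to concept-positive embeddings.
    Assumption: a difference of means stands in for a trained linear probe."""
    v = pos.mean(axis=0) - neg.mean(axis=0)
    return v / np.linalg.norm(v)

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def concept_scores(z, head, bank, target, l1=0.1, l2=0.01, lr=0.1, steps=300):
    """Find concept weights w so that head @ (z + bank.T @ w) favours `target`.
    Minimises cross-entropy plus L1 and L2 penalties by (sub)gradient descent.
    The penalty weights and step size are illustrative, not tuned values."""
    w = np.zeros(bank.shape[0])
    onehot = np.eye(head.shape[0])[target]
    for _ in range(steps):
        p = softmax(head @ (z + bank.T @ w))
        grad = bank @ (head.T @ (p - onehot)) + l1 * np.sign(w) + 2 * l2 * w
        w -= lr * grad
    return w

# Toy embeddings (hypothetical): axis 0 encodes stripes, axis 2 encodes grass.
neg = np.array([[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]])
stripes = concept_vector(neg + [1.0, 0.0, 0.0, 0.0], neg)
grass = concept_vector(neg + [0.0, 0.0, 1.0, 0.0], neg)
bank = np.stack([stripes, grass])          # shape: (num_concepts, dim)

# Linear head over the embedding: class 0 = zebra, class 1 = dog.
head = np.array([[2.0, 0.0, 0.0, 0.0], [-2.0, 0.0, 0.0, 0.0]])

z = np.array([-0.5, 1.0, 0.0, 0.0])        # a zebra with faint stripes
scores = concept_scores(z, head, bank, target=0)

before = int(np.argmax(head @ z))
after = int(np.argmax(head @ (z + bank.T @ scores)))
print("prediction before:", before, "after:", after)
print("concept scores (stripes, grass):", scores)
```

In this toy setup the stripes concept should receive a clearly positive score while the grass concept, which the head ignores, stays at zero, mirroring the abstract's "faint stripes" example.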

Comments:	ICML 2022
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2106.12723 [cs.LG]
	(or arXiv:2106.12723v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2106.12723

Submission history

From: Mert Yuksekgonul
[v1] Thu, 24 Jun 2021 01:49:55 UTC (2,877 KB)
[v2] Tue, 12 Oct 2021 00:51:20 UTC (4,227 KB)
[v3] Tue, 14 Jun 2022 22:12:48 UTC (4,824 KB)
