Abstract
In this paper, we propose leveraging causal generative learning as an interpretable tool for explaining image classifiers. Specifically, we present a generative counterfactual inference approach for studying the influence of visual features (pixels) as well as causal factors through generative learning. To this end, we first uncover the most influential pixels on a classifier’s decision by computing both Shapley and contrastive explanations for counterfactual images with different attribute values. We then establish a Monte Carlo mechanism that uses the generator of a causal generative model to adapt Shapley explainers so that they produce feature importances for the human-interpretable attributes of a causal dataset; this method applies when a classifier has been trained exclusively on the images of that dataset. Finally, we present optimization methods for creating counterfactual explanations of classifiers by means of counterfactual inference, proposing straightforward approaches for both differentiable and arbitrary classifiers. We use the Morpho-MNIST causal dataset as a case study for our proposed methods of generating counterfactual explanations, although the methods are also applicable to other causal datasets containing image data. We compare our proposed methods with the visual explanation methods of the open-source OmnixAI toolkit. Measuring the interpretability of counterfactual explanations with quantitative metrics, we find that our proposed methods offer more interpretable explanations than those generated by OmnixAI. This finding suggests that our methods are well-suited to producing highly interpretable counterfactual explanations on causal datasets.
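To make the attribute-level mechanism concrete, below is a minimal sketch of Monte Carlo Shapley estimation over the human-interpretable attributes of a causal generative model. This is not the authors' implementation: `generator`, `classifier`, and `sample_attrs` are hypothetical placeholders for a causal generator mapping attribute dictionaries to images, a function returning a scalar score for the class being explained, and a sampler from the causal model's attribute distribution.

```python
# A minimal sketch (not the paper's exact method) of permutation-based
# Monte Carlo Shapley values over the attributes of a causal dataset.
# `generator`, `classifier`, and `sample_attrs` are assumed interfaces.
import numpy as np

rng = np.random.default_rng(0)

def value(attrs_fixed, classifier, generator, sample_attrs, n_mc=32):
    """Expected class score when the attributes in `attrs_fixed` are
    pinned to the explained instance's values and the remaining
    attributes are resampled from the causal model."""
    scores = []
    for _ in range(n_mc):
        attrs = sample_attrs()      # draw a full attribute dictionary
        attrs.update(attrs_fixed)   # pin the coalition's attributes
        image = generator(attrs)    # generate a counterfactual-style image
        scores.append(classifier(image))
    return float(np.mean(scores))

def mc_shapley(instance_attrs, classifier, generator, sample_attrs,
               n_perm=64):
    """One approximate Shapley value per human-interpretable attribute."""
    names = list(instance_attrs)
    phi = {k: 0.0 for k in names}
    for _ in range(n_perm):
        order = rng.permutation(names)
        fixed = {}
        v_prev = value(fixed, classifier, generator, sample_attrs)
        for k in order:             # pin attributes one at a time
            fixed[k] = instance_attrs[k]
            v_next = value(fixed, classifier, generator, sample_attrs)
            phi[k] += (v_next - v_prev) / n_perm
            v_prev = v_next
    return phi
```

The estimator averages, over random attribute orderings, the marginal change in the expected class score as each attribute is pinned to its value in the explained instance; permutation sampling of this kind is a standard way to approximate Shapley values without enumerating all coalitions.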
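The counterfactual-explanation step for a differentiable classifier can likewise be sketched as gradient-based search in the generator's latent space. Again, this is an illustrative sketch under assumed names (`gen`, `clf`, and a recovered exogenous noise vector `z`), written in PyTorch; it is not the paper's exact objective.

```python
# A minimal sketch, assuming a differentiable classifier `clf` and a
# differentiable causal generator `gen` (both hypothetical) built in
# PyTorch; `z` is the latent/noise vector recovered for the explained image.
import torch

def counterfactual_search(gen, clf, z, target_class, steps=200, lr=0.05,
                          lam=0.1):
    """Optimize a latent perturbation so the generated image flips the
    classifier to `target_class` while staying close to the original."""
    delta = torch.zeros_like(z, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_cf = gen(z + delta)               # candidate counterfactual image
        logits = clf(x_cf)                  # assumed shape [1, num_classes]
        loss = torch.nn.functional.cross_entropy(
            logits, torch.tensor([target_class])
        ) + lam * delta.norm()              # proximity penalty on the shift
        opt.zero_grad()
        loss.backward()
        opt.step()
    return gen(z + delta).detach()
```

For an arbitrary (non-differentiable) classifier, the same structure could plausibly be retained while replacing the gradient step with a gradient-free search over latent perturbations, for example random sampling or an evolutionary strategy.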
Data availability
The source code for our experiments is available in a public GitHub repository at https://github.com/wtaylor17/CDGMExplainers. The Morpho-MNIST dataset used in this work is available online at https://github.com/dccastro/Morpho-MNIST.
Notes
The original MNIST dataset may be downloaded from http://yann.lecun.com/exdb/mnist.
Funding
This work was supported by the Natural Sciences and Engineering Research Council of Canada [grant number 550722].
Ethics declarations
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Taylor-Melanson, W., Sadeghi, Z. & Matwin, S. Causal generative explainers using counterfactual inference: a case study on the Morpho-MNIST dataset. Pattern Anal Applic 27, 89 (2024). https://doi.org/10.1007/s10044-024-01306-8