Interpretable deep learning: interpretation, interpretability, trustworthiness, and beyond

  • Survey Paper
  • Published in Knowledge and Information Systems

Abstract

Deep neural networks are well known for their strong performance on a wide range of machine learning and artificial intelligence tasks. However, due to their over-parameterized, black-box nature, it is often difficult to understand the predictions of deep models. In recent years, many interpretation tools have been proposed to explain or reveal how deep models make decisions. In this paper, we review this line of research and aim to provide a comprehensive survey. Specifically, we first introduce and clarify two basic concepts—interpretations and interpretability—that are often conflated. To organize the research efforts on interpretations, we propose a new taxonomy and elaborate on the designs of a number of interpretation algorithms from different perspectives. Then, to assess interpretation results, we survey the performance metrics for evaluating interpretation algorithms. Further, we summarize current work on evaluating models’ interpretability using “trustworthy” interpretation algorithms. Finally, we review and discuss the connections between deep models’ interpretations and other factors, such as adversarial robustness and learning from interpretations, and we introduce several open-source libraries for interpretation algorithms and evaluation approaches.

Notes

  1. The subtle differences among interpretation, explanation, and attribution are not considered in this paper, and we use them interchangeably.

  2. Without any constraints, even a rule-based model may be too complex for a human to understand [107, 141]. This also motivates several works that pursue sparsity in explanation results [137].

  3. We also note that whether the use of deep models improves recommendation systems is an open question [42], but this is beyond the scope of this survey.

  4. https://github.com/sicara/tf-explain.

  5. https://github.com/pytorch/captum.

  6. https://github.com/PaddlePaddle/InterpretDL.

  7. https://github.com/interpretml/interpret.

  8. https://github.com/Trusted-AI/AIX360.

  9. https://github.com/PAIR-code/lit.
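
The libraries listed in notes 4–9 expose many of the attribution algorithms surveyed in this paper behind similar interfaces. As a minimal sketch of typical usage (not the definitive API of any one toolkit), the snippet below assumes Captum (note 5), a standard pretrained PyTorch classifier, and a random tensor standing in for a preprocessed image, and computes Integrated Gradients [160] attributions:

    import torch
    import torchvision.models as models
    from captum.attr import IntegratedGradients

    # Any torch.nn.Module that maps inputs to class logits can be explained.
    model = models.resnet18(pretrained=True).eval()

    # Random tensor standing in for a preprocessed 224x224 RGB image batch.
    x = torch.randn(1, 3, 224, 224)

    # Integrated Gradients attributes the target-class logit to each input
    # feature by integrating gradients along a straight-line path from a
    # baseline (all zeros by default) to the input.
    ig = IntegratedGradients(model)
    attributions, delta = ig.attribute(x, target=0, return_convergence_delta=True)

    print(attributions.shape)  # same shape as the input: (1, 3, 224, 224)
    print(delta)               # approximation error w.r.t. the completeness axiom

Other attribution methods in these toolkits (e.g., Saliency or DeepLift in Captum) follow the same construct-then-attribute pattern, which makes it straightforward to swap interpretation algorithms when comparing or evaluating them.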

References

  1. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mané D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viégas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X (2015) TensorFlow: large-scale machine learning on heterogeneous systems. Software available from tensorflow.org

  2. Abnar S, Zuidema WH (2020) Quantifying attention flow in transformers. In: Jurafsky D, Chai J, Schluter N, Tetreault JR (eds) Proceedings of the 58th annual meeting of the association for computational linguistics, ACL. Association for Computational Linguistics

  3. Adebayo J, Gilmer J, Muelly M, Goodfellow IJ, Hardt M, Kim B (2018) Sanity checks for saliency maps. In: Bengio S, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems 31: annual conference on neural information processing systems 2018 (NeurIPS 2018), December 3–8, 2018, Montréal, Canada, pp 9525–9536

  4. Adebayo J, Muelly M, Liccardi I, Kim B (2020) Debugging tests for model explanations. In: Larochelle H, Ranzato M, Hadsell R, Balcan M-F, Lin H-T (eds) Advances in neural information processing systems 33: annual conference on neural information processing systems 2020 (NeurIPS 2020), December 6–12, 2020

  5. Agarwal R, Melnick L, Frosst N, Zhang X, Lengerich BJ, Caruana R, Hinton GE (2021) Neural additive models: interpretable machine learning with neural nets. In: Ranzato M, Beygelzimer A, Dauphin YN, Liang P, Vaughan JW (eds) Advances in neural information processing systems 34: annual conference on neural information processing systems 2021 (NeurIPS 2021), December 6–14, 2021, pp 4699–4711

  6. Ahern I, Noack A, Guzman-Nateras L, Dou D, Li B, Huan J (2019) Normlime: a new feature importance metric for explaining deep neural networks. CoRR, arXiv:1909.04200

  7. Ahn J, Kwak S (2018) Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation. In: 2018 IEEE conference on computer vision and pattern recognition (CVPR 2018), Salt Lake City, UT, USA, June 18–22, 2018. Computer Vision Foundation/IEEE Computer Society, pp 4981–4990

  8. Alvarez-Melis D, Jaakkola TS (2018) On the robustness of interpretability methods. CoRR, arXiv:1806.08049

  9. Ancona M, Ceolini E, Öztireli C, Gross M (2018) Towards better understanding of gradient-based attribution methods for deep neural networks. In: 6th International conference on learning representations (ICLR 2018), Vancouver, BC, Canada, April 30–May 3, 2018, conference track proceedings. OpenReview.net

  10. Andrychowicz M, Baker B, Chociej M, Józefowicz R, McGrew B, Pachocki J, Petron A, Plappert M, Powell G, Ray A, Schneider J, Sidor S, Tobin J, Welinder P, Weng L, Zaremba W (2020) Learning dexterous in-hand manipulation. Int J Robot Res 39(1):3–20

  11. Antorán J, Bhatt U, Adel T, Weller A, Hernández-Lobato JM (2021) Getting a CLUE: a method for explaining uncertainty estimates. In: 9th International conference on learning representations (ICLR 2021), virtual event, Austria, May 3–7, 2021. OpenReview.net

  12. Artelt A, Hammer B (2019) On the computation of counterfactual explanations—a survey. CoRR, arXiv:1911.07749

  13. Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA (2017) Deep reinforcement learning: a brief survey. IEEE Signal Process Mag 34(6):26–38

  14. Atanasova P, Simonsen JG, Lioma C, Augenstein I (2020) Generating fact checking explanations. In: Jurafsky D, Chai J, Schluter N, Tetreault JR (eds) Proceedings of the 58th annual meeting of the association for computational linguistics (ACL). Association for Computational Linguistics

  15. Atrey A, Clary K, Jensen DD (2020) Exploratory not explanatory: counterfactual analysis of saliency maps for deep reinforcement learning. In: 8th International conference on learning representations (ICLR 2020), Addis Ababa, Ethiopia, April 26–30, 2020. OpenReview.net

  16. Bach S, Binder A, Montavon G, Klauschen F, Müller K-R, Samek W (2015) On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 10(7):e0130140

  17. Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: Bengio Y, LeCun Y (eds) 3rd International conference on learning representations (ICLR 2015), San Diego, CA, USA, May 7–9, 2015, conference track proceedings

  18. Bajaj M, Chu L, Xue ZY, Pei J, Wang L, Lam PC-H, Zhang Y (2021) Robust counterfactual explanations on graph neural networks. In: Ranzato MA, Beygelzimer A, Dauphin YN, Liang P, Vaughan JW (eds) Advances in neural information processing systems 34: annual conference on neural information processing systems 2021 (NeurIPS 2021), December 6–14, 2021, virtual, pp 5644–5655

  19. Balabanović M, Shoham Y (1997) Fab: content-based, collaborative recommendation. Commun ACM 40(3):66–72

  20. Baldassarre F, Azizpour H (2019) Explainability techniques for graph convolutional networks. CoRR, arXiv:1905.13686

  21. Baldock RJN, Maennel H, Neyshabur B (2021) Deep learning through the lens of example difficulty. In: Ranzato MA, Beygelzimer A, Dauphin YN, Liang P, Vaughan JW (eds) Advances in neural information processing systems 34: annual conference on neural information processing systems 2021 (NeurIPS 2021), December 6–14, 2021, virtual, pp 10876–10889

  22. Bansal N, Agarwal C, Nguyen A (2020) SAM: the sensitivity of attribution methods to hyperparameters. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR 2020), Seattle, WA, USA, June 13–19, 2020, pp 8670–8680. Computer Vision Foundation/IEEE

  23. Barbalau A, Cosma A, Ionescu RT, Popescu M (2020) A generic and model-agnostic exemplar synthetization framework for explainable AI. In: Hutter F, Kersting K, Lijffijt J, Valera I (eds) Machine learning and knowledge discovery in databases—European conference, ECML PKDD 2020, Ghent, Belgium, September 14–18, 2020, Proceedings, Part II, volume 12458 of lecture notes in computer science. Springer, pp 190–205

  24. Bau D, Zhou B, Khosla A, Oliva A, Torralba A (2017) Network dissection: quantifying interpretability of deep visual representations. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR 2017), Honolulu, HI, USA, July 21–26, 2017. IEEE Computer Society, pp 3319–3327

  25. Bau D, Zhu J-Y, Strobelt H, Zhou B, Tenenbaum JB, Freeman WT, Torralba A (2019) GAN dissection: visualizing and understanding generative adversarial networks. In: 7th International conference on learning representations (ICLR 2019), New Orleans, LA, USA, May 6–9, 2019. OpenReview.net

  26. Bien J, Tibshirani R (2011) Prototype selection for interpretable classification. Ann Appl Stat 5(4):2403–2424

  27. Binder A, Montavon G, Lapuschkin S, Müller K-R, Samek W (2016) Layer-wise relevance propagation for neural networks with local renormalization layers. In: Villa AEP, Masulli P, Rivero AP (eds) Artificial neural networks and machine learning—ICANN 2016—25th international conference on artificial neural networks, Barcelona, Spain, September 6–9, 2016, Proceedings, Part II, volume 9887 of lecture notes in computer science. Springer, pp 63–71

  28. Carlini N, Erlingsson Ú, Papernot N (2019) Distribution density, tails, and outliers in machine learning: metrics and applications. CoRR, arXiv:1910.13427

  29. Carvalho DV, Pereira EM, Cardoso JS (2019) Machine learning interpretability: a survey on methods and metrics. Electronics 8(8):832

  30. Chakraborty A, Alam M, Dey V, Chattopadhyay A, Mukhopadhyay D (2018) Adversarial attacks and defences: a survey. CoRR, arXiv:1810.00069

  31. Chang C-H, Creager E, Goldenberg A, Duvenaud D (2019) Explaining image classifiers by counterfactual generation. In: 7th International conference on learning representations (ICLR 2019), New Orleans, LA, USA, May 6–9, 2019. OpenReview.net

  32. Chattopadhyay A, Sarkar A, Howlader P, Balasubramanian VN (2018) Grad-CAM++: generalized gradient-based visual explanations for deep convolutional networks. In: 2018 IEEE winter conference on applications of computer vision (WACV 2018), Lake Tahoe, NV, USA, March 12–15, 2018. IEEE Computer Society, pp 839–847

  33. Chefer H, Gur S, Wolf L (2021) Generic attention-model explainability for interpreting bi-modal and encoder-decoder transformers. In: 2021 IEEE/CVF international conference on computer vision (ICCV 2021), Montreal, QC, Canada, October 10–17, 2021. IEEE, pp 387–396

  34. Chefer H, Gur S, Wolf L (2021) Transformer interpretability beyond attention visualization. In: IEEE conference on computer vision and pattern recognition (CVPR 2021), virtual, June 19–25, 2021. Computer Vision Foundation/IEEE, pp 782–791

  35. Chen C, Li O, Tao D, Barnett A, Rudin C, Su J (2019) This looks like that: deep learning for interpretable image recognition. In: Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, Garnett R (eds) Advances in neural information processing systems 32: annual conference on neural information processing systems 2019 (NeurIPS 2019), December 8–14, 2019, Vancouver, BC, Canada, pp 8928–8939

  36. Chen C, Zhang M, Liu Y, Ma S (2018) Neural attentional rating regression with review-level explanations. In: Champin P-A, Gandon F, Lalmas M, Ipeirotis PG (eds) Proceedings of the 2018 World Wide Web conference on World Wide Web (WWW 2018), Lyon, France, April 23–27, 2018. ACM, pp 1583–1592

  37. Chen X, Liu C, Li B, Lu K, Song D (2017) Targeted backdoor attacks on deep learning systems using data poisoning. CoRR, arXiv:1712.05526

  38. Chen Y, Li B, Yu H, Wu P, Miao C (2021) Hydra: hypergradient data relevance analysis for interpreting deep neural networks. In: Thirty-fifth AAAI conference on artificial intelligence (AAAI 2021), thirty-third conference on innovative applications of artificial intelligence (IAAI 2021), the eleventh symposium on educational advances in artificial intelligence (EAAI 2021), virtual event, February 2–9, 2021. AAAI Press, pp 7081–7089

  39. Cheng H-T, Koc L, Harmsen J, Shaked T, Chandra T, Aradhye H, Anderson G, Corrado G, Chai W, Ispir M, Anil R, Haque Z, Hong L, Jain V, Liu X, Shah H (2016) Wide & deep learning for recommender systems. In: Karatzoglou A, Hidasi B, Tikk D, Shalom OS, Roitman H, Shapira B, Rokach L (eds) Proceedings of the 1st workshop on deep learning for recommender systems, DLRS@RecSys 2016, Boston, MA, USA, September 15, 2016. ACM, pp 7–10

  40. Covington P, Adams J, Sargin E (2016) Deep neural networks for youtube recommendations. In: Sen S, Geyer W, Freyne J, Castells P (eds) Proceedings of the 10th ACM conference on recommender systems, Boston, MA, USA, September 15–19, 2016. ACM, pp 191–198

  41. Croce F, Andriushchenko M, Sehwag V, Debenedetti E, Flammarion N, Chiang M, Mittal P, Hein M (2020) Robustbench: a standardized adversarial robustness benchmark. arXiv preprint arXiv:2010.09670

  42. Dacrema MF, Cremonesi P, Jannach D (2019) Are we really making much progress? A worrying analysis of recent neural recommendation approaches. In: Bogers T, Said A, Brusilovsky P, Tikk D (eds) Proceedings of the 13th ACM conference on recommender systems (RecSys 2019), Copenhagen, Denmark, September 16–20, 2019. ACM, pp 101–109

  43. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: 2009 IEEE computer society conference on computer vision and pattern recognition (CVPR 2009), 20–25 June 2009, Miami, Florida, USA. IEEE Computer Society, pp 248–255

  44. Desai S, Ramaswamy HG (2020) Ablation-CAM: visual explanations for deep convolutional network via gradient-free localization. In: IEEE winter conference on applications of computer vision (WACV 2020), Snowmass Village, CO, USA, March 1–5, 2020. IEEE, pp 972–980

  45. Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein J, Doran C, Solorio T (eds) Proceedings of the 2019 conference of the North American Chapter of the Association for computational linguistics: human language technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2–7, 2019, Volume 1 (Long and Short Papers). Association for Computational Linguistics, pp 4171–4186

  46. Dong Y, Su H, Zhu J, Zhang B (2017) Improving interpretability of deep neural networks with semantic information. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR 2017), Honolulu, HI, USA, July 21–26, 2017. IEEE Computer Society, pp 975–983

  47. Doshi-Velez F, Kim B (2017) Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608

  48. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N (2021) An image is worth 16x16 words: transformers for image recognition at scale. In: 9th International conference on learning representations (ICLR 2021), virtual event, Austria, May 3–7, 2021. OpenReview.net

  49. Erhan D, Bengio Y, Courville A, Vincent P (2009) Visualizing higher-layer features of a deep network. Technical Report 1341, Université de Montréal

  50. Etmann C, Lunz S, Maass P, Schönlieb C (2019) On the connection between adversarial robustness and saliency map interpretability. In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th international conference on machine learning (ICML 2019), 9–15 June 2019, Long Beach, CA, USA, volume 97 of proceedings of machine learning research (PMLR), pp 1823–1832

  51. Faber L, Moghaddam AK, Wattenhofer R (2020) Contrastive graph neural network explanation. CoRR, arXiv:2010.13663

  52. Feldman V (2020) Does learning require memorization? A short tale about a long tail. In: Makarychev K, Makarychev Y, Tulsiani M, Kamath G, Chuzhoy J (eds) Proceedings of the 52nd annual ACM SIGACT symposium on theory of computing (STOC 2020), Chicago, IL, USA, June 22–26, 2020. ACM, pp 954–959

  53. Feldman V, Zhang C (2020) What neural networks memorize and why: discovering the long tail via influence estimation. In: Larochelle H, Ranzato MA, Hadsell R, Balcan M-F, Lin H-T (eds) Advances in neural information processing systems 33: annual conference on neural information processing systems 2020 (NeurIPS 2020), December 6–12, 2020, virtual

  54. Fong R, Patrick M, Vedaldi A (2019) Understanding deep networks via extremal perturbations and smooth masks. In: 2019 IEEE/CVF international conference on computer vision (ICCV 2019), Seoul, Korea (South), October 27–November 2, 2019. IEEE, pp 2950–2958

  55. Fong RC, Vedaldi A (2017) Interpretable explanations of black boxes by meaningful perturbation. In: IEEE international conference on computer vision (ICCV 2017), Venice, Italy, October 22–29, 2017. IEEE Computer Society, pp 3449–3457

  56. Friedler SA, Roy CD, Scheidegger C, Slack D (2019) Assessing the local interpretability of machine learning models. CoRR, arXiv:1902.03501

  57. Frosst N, Hinton GE (2017) Distilling a neural network into a soft decision tree. In: Besold TR, Kutz O (eds) Proceedings of the first international workshop on comprehensibility and explanation in AI and ML 2017 co-located with 16th international conference of the Italian Association for artificial intelligence (AI*IA 2017), Bari, Italy, November 16th and 17th, 2017, volume 2071 of CEUR workshop proceedings. CEUR-WS.org

  58. Geirhos R, Jacobsen J-H, Michaelis C, Zemel RS, Brendel W, Bethge M, Wichmann FA (2020) Shortcut learning in deep neural networks. Nat Mach Intell 2(11):665–673

  59. Geirhos R, Narayanappa K, Mitzkus B, Thieringer T, Bethge M, Wichmann FA, Brendel W (2021) Partial success in closing the gap between human and machine vision. In: Ranzato MA, Beygelzimer A, Dauphin YN, Liang P, Vaughan JW (eds) Advances in neural information processing systems 34: annual conference on neural information processing systems 2021 (NeurIPS 2021), December 6–14, 2021, virtual, pp 23885–23899

  60. Geirhos R, Rubisch P, Michaelis C, Bethge M, Wichmann FA, Brendel W (2019) Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness. In: 7th International conference on learning representations (ICLR 2019), New Orleans, LA, USA, May 6–9, 2019. OpenReview.net

  61. Ghaeini R, Fern XZ, Tadepalli P (2018) Interpreting recurrent and attention-based neural models: a case study on natural language inference. In: Riloff E, Chiang D, Hockenmaier J, Tsujii J (eds) Proceedings of the 2018 conference on empirical methods in natural language processing. Association for Computational Linguistics

  62. Goh GB, Hodas NO, Vishnu A (2017) Deep learning for computational chemistry. J Comput Chem 38(16):1291–1307

  63. Gomez-Uribe CA, Hunt N (2016) The netflix recommender system: algorithms, business value, and innovation. ACM Trans Manag Inf Syst 6(4):13:1–13:19

  64. Goyal Y, Wu Z, Ernst J, Batra D, Parikh D, Lee S (2019) Counterfactual visual explanations. In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th international conference on machine learning (ICML 2019), 9–15 June 2019, Long Beach, CA, USA, volume 97 of proceedings of machine learning research (PMLR), pp 2376–2384

  65. Greydanus S, Koul A, Dodge J, Fern A (2018) Visualizing and understanding Atari agents. In: Dy JG, Krause A (eds) Proceedings of the 35th international conference on machine learning (ICML 2018), Stockholmsmässan, Stockholm, Sweden, July 10–15, 2018, volume 80 of proceedings of machine learning research (PMLR), pp 1787–1796

  66. Grgic-Hlaca N, Redmiles EM, Gummadi KP, Weller A (2018) Human perceptions of fairness in algorithmic decision making: a case study of criminal risk prediction. In: Champin P-A, Gandon F, Lalmas M, Ipeirotis PG (eds) Proceedings of the 2018 World Wide Web conference on World Wide Web (WWW 2018), Lyon, France, April 23–27, 2018. ACM, pp 903–912

  67. Gu J, Yang Y, Tresp V (2018) Understanding individual decisions of cnns via contrastive backpropagation. In: Jawahar CV, Li H, Mori G, Schindler K (eds) Computer vision—ACCV 2018—14th Asian conference on computer vision, Perth, Australia, December 2–6, 2018, revised selected papers, Part III, volume 11363 of lecture notes in computer science. Springer, pp 119–134

  68. Gu T, Dolan-Gavitt B, Garg S (2017) BadNets: identifying vulnerabilities in the machine learning model supply chain. CoRR, arXiv:1708.06733

  69. Guidotti R, Monreale A, Matwin S, Pedreschi D (2019) Black box explanation by learning image exemplars in the latent feature space. In: Brefeld U, Fromont É, Hotho A, Knobbe AJ, Maathuis MH, Robardet C (eds) Machine learning and knowledge discovery in databases—European conference (ECML PKDD 2019), Würzburg, Germany, September 16–20, 2019, proceedings, Part I, volume 11906 of lecture notes in computer science. Springer, pp 189–205

  70. Hendrycks D, Dietterich TG (2019) Benchmarking neural network robustness to common corruptions and perturbations. In: 7th International conference on learning representations (ICLR 2019), New Orleans, LA, USA, May 6–9, 2019. OpenReview.net

  71. Heo J, Joo S, Moon T (2019) Fooling neural network interpretations via adversarial model manipulation. In: Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, Garnett R (eds) Advances in neural information processing systems 32: annual conference on neural information processing systems 2019 (NeurIPS 2019), December 8–14, 2019, Vancouver, BC, Canada, pp 2921–2932

  72. Herlocker JL, Konstan JA, Riedl J (2000) Explaining collaborative filtering recommendations. In: Kellogg WA, Whittaker S (eds) CSCW 2000, proceeding on the ACM 2000 conference on computer supported cooperative work, Philadelphia, PA, USA, December 2–6, 2000. ACM, pp 241–250

  73. Hinton GE, Sabour S, Frosst N (2018) Matrix capsules with EM routing. In: 6th International conference on learning representations (ICLR 2018), Vancouver, BC, Canada, April 30–May 3, 2018, conference track proceedings. OpenReview.net

  74. Hooker S, Erhan D, Kindermans P-J, Kim B (2019) A benchmark for interpretability methods in deep neural networks. In: Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, Garnett R (eds) Advances in neural information processing systems 32: annual conference on neural information processing systems 2019 (NeurIPS 2019), December 8–14, 2019, Vancouver, BC, Canada, pp 9734–9745

  75. Hua K-L, Hsu C-H, Hidayati SC, Cheng W-H, Chen Y-J (2015) Computer-aided classification of lung nodules on computed tomography images via deep learning technique. OncoTargets Ther 8:2015–2022

  76. Huang Q, Yamada M, Tian Y, Singh D, Yin D, Chang Y (2020) Graphlime: local interpretable model explanations for graph neural networks. CoRR, arXiv:2001.06216

  77. Ilyas A, Santurkar S, Tsipras D, Engstrom L, Tran B, Madry A (2019) Adversarial examples are not bugs, they are features. In: Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, Garnett R (eds) Advances in neural information processing systems 32: annual conference on neural information processing systems 2019 (NeurIPS 2019), December 8–14, 2019, Vancouver, BC, Canada, pp 125–136

  78. Islam SR, Eberle W, Ghafoor SK (2020) Towards quantification of explainability in explainable artificial intelligence methods. In: Barták R, Bell E (eds) Proceedings of the thirty-third international Florida artificial intelligence research society conference, originally to be held in North Miami Beach, Florida, USA, May 17–20, 2020. AAAI Press, pp 75–81

  79. Iwana BK, Kuroki R, Uchida S (2019) Explaining convolutional neural networks using softmax gradient layer-wise relevance propagation. In: 2019 IEEE/CVF international conference on computer vision workshops (ICCV Workshops 2019), Seoul, Korea (South), October 27–28, 2019. IEEE, pp 4176–4185

  80. Iyer R, Li Y, Li H, Lewis M, Sundar R, Sycara KP (2018) Transparency and explanation in deep reinforcement learning neural networks. In: Furman J, Marchant GE, Price H, Rossi F (eds) Proceedings of the 2018 AAAI/ACM conference on AI, ethics, and society, AIES 2018, New Orleans, LA, USA, February 02–03, 2018. ACM, pp 144–150

  81. Jacovi A, Goldberg Y (2020) Towards faithfully interpretable NLP systems: How should we define and evaluate faithfulness? In: Jurafsky D, Chai J, Schluter N, Tetreault JR (eds) Proceedings of the 58th annual meeting of the association for computational linguistics (ACL 2020), Online, July 5–10, 2020. Association for Computational Linguistics, pp 4198–4205

  82. Jain S, Wallace BC (2019) Attention is not explanation. In: Burstein J, Doran C, Solorio T (eds) Proceedings of the 2019 conference of the North American Chapter of the Association for computational linguistics: human language technologies, NAACL-HLT. Association for Computational Linguistics

  83. Jo T, Nho K, Saykin AJ (2019) Deep learning in Alzheimer’s disease: diagnostic classification and prognostic prediction using neuroimaging data. CoRR, arXiv:1905.00931

  84. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A et al (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596(7873):583–589

  85. Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: a survey. J Artif Intell Res 4:237–285

  86. Khan A, Huerta EA, Zheng H (2021) Interpretable AI forecasting for numerical relativity waveforms of quasi-circular, spinning, non-precessing binary black hole mergers. CoRR, arXiv:2110.06968

  87. Kim B, Wattenberg M, Gilmer J, Cai CJ, Wexler J, Viégas FB, Sayres R (2018) Interpretability beyond feature attribution: quantitative testing with concept activation vectors (TCAV). In: Dy JG, Krause A (eds) Proceedings of the 35th international conference on machine learning (ICML 2018), Stockholmsmässan, Stockholm, Sweden, July 10–15, 2018, volume 80 of Proceedings of machine learning research, pp 2673–2682

  88. Kim J-M, Choe J, Akata Z, Oh SJ (2021) Keep CALM and improve visual feature attribution. In: 2021 IEEE/CVF international conference on computer vision (ICCV 2021), Montreal, QC, Canada, October 10–17, 2021. IEEE, pp 8330–8340

  89. Kim J-H, Choo W, Song HO (2020) Puzzle mix: exploiting saliency and local statistics for optimal mixup. In: Proceedings of the 37th international conference on machine learning (ICML 2020), 13–18 July 2020, Virtual Event, volume 119 of proceedings of machine learning research (PMLR), pp 5275–5285

  90. Koh PW, Ang K-S, Teo HHK, Liang P (2019) On the accuracy of influence functions for measuring group effects. In: Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, Garnett R (eds) Advances in neural information processing systems 32: annual conference on neural information processing systems 2019 (NeurIPS 2019), December 8–14, 2019, Vancouver, BC, Canada, pp 5255–5265

  91. Koh PW, Liang P (2017) Understanding black-box predictions via influence functions. In: Precup D, Teh YW (eds) Proceedings of the 34th international conference on machine learning (ICML 2017), Sydney, NSW, Australia, 6–11 August 2017, volume 70 of proceedings of machine learning research (PMLR), pp 1885–1894

  92. Kontschieder P, Fiterau M, Criminisi A, Bulò SR (2015) Deep neural decision forests. In: 2015 IEEE international conference on computer vision (ICCV 2015), Santiago, Chile, December 7–13, 2015. IEEE Computer Society, pp 1467–1475

  93. Kumar S, Talukdar PP (2020) NILE: natural language inference with faithful natural language explanations. In: Jurafsky D, Chai J, Schluter N, Tetreault JR (eds) Proceedings of the 58th annual meeting of the association for computational linguistics. Association for Computational Linguistics (ACL)

  94. Lage I, Chen E, He J, Narayanan M, Kim B, Gershman S, Doshi-Velez F (2019) An evaluation of the human-interpretability of explanation. CoRR, arXiv:1902.00006

  95. Lai B, Gong X (2017) Saliency guided end-to-end learning for weakly supervised object detection. In: Sierra C (ed) Proceedings of the twenty-sixth international joint conference on artificial intelligence (IJCAI 2017), Melbourne, Australia, August 19–25, 2017, pp 2053–2059. ijcai.org

  96. Lakkaraju H, Kamar E, Caruana R, Leskovec J (2017) Interpretable & explorable approximations of black box models. CoRR, arXiv:1707.01154

  97. Laugel T, Lesot M-J, Marsala C, Renard X, Detyniecki M (2019) Unjustified classification regions and counterfactual explanations in machine learning. In: Brefeld U, Fromont É, Hotho A, Knobbe AJ, Maathuis MH, Robardet C (eds) Machine learning and knowledge discovery in databases—European conference (ECML PKDD 2019), Würzburg, Germany, September 16–20, 2019, Proceedings, Part II, volume 11907 of lecture notes in computer science. Springer, pp 37–54

  98. LeCun Y, Bengio Y, Hinton GE (2015) Deep learning. Nature 521(7553):436–444

  99. Levine S, Finn C, Darrell T, Abbeel P (2016) End-to-end training of deep visuomotor policies. J Mach Learn Res 17(39):1–40

  100. Levine S, Pastor P, Krizhevsky A, Ibarz J, Quillen D (2018) Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. Int J Robot Res 37(4–5):421–436

  101. Li B, Qi P, Liu B, Di S, Liu J, Pei J, Yi J, Zhou B (2021) Trustworthy AI: from principles to practices. CoRR, arXiv:2110.01167

  102. Li C, Quan C, Peng L, Qi Y, Deng Y, Wu L (2019) A capsule network for recommendation and explaining what you like and dislike. In: Piwowarski B, Chevalier M, Gaussier É, Maarek Y, Nie J-Y, Scholer F (eds) Proceedings of the 42nd international ACM SIGIR conference on research and development in information retrieval (SIGIR 2019), Paris, France, July 21–25, 2019. ACM, pp 275–284

  103. Li O, Liu H, Chen C, Rudin C (2018) Deep learning for case-based reasoning through prototypes: a neural network that explains its predictions. In: McIlraith SA, Weinberger KQ (eds) Proceedings of the thirty-second AAAI conference on artificial intelligence (AAAI-18), the 30th innovative applications of artificial intelligence (IAAI-18), and the 8th AAAI symposium on educational advances in artificial intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2–7, 2018. AAAI Press, pp 3530–3537

  104. Li X, Xiong H, Huang S, Ji S, Dou D (2021) Cross-model consensus of explanations and beyond for image classification models: an empirical study. CoRR, arXiv:2109.00707

  105. Li Y (2017) Deep reinforcement learning: an overview. CoRR, arXiv:1701.07274

  106. Lin Y-S, Lee W-C, Celik ZB (2021) What do you see? Evaluation of explainable artificial intelligence (XAI) interpretability through neural backdoors. In: Zhu F, Ooi BC, Miao C (eds) KDD ’21: the 27th ACM SIGKDD conference on knowledge discovery and data mining, virtual event, Singapore, August 14–18, 2021. ACM, pp 1027–1035

  107. Lipton ZC (2018) The mythos of model interpretability. Commun ACM 61(10):36–43

  108. Litjens G, Kooi T, Bejnordi BE, Adiyoso Setio AA, Ciompi F, Ghafoorian M, van der Laak JAWM, van Ginneken B, Sánchez CI (2017) A survey on deep learning in medical image analysis. Med Image Anal 42:60–88

  109. Liu H, Yin Q, Wang WY (2019) Towards explainable NLP: a generative explanation framework for text classification. In: Korhonen A, Traum DR, Màrquez L (eds) Proceedings of the 57th conference of the association for computational linguistics. Association for Computational Linguistics (ACL)

  110. Lundberg SM, Lee S-I (2017) A unified approach to interpreting model predictions. In: Guyon I, von Luxburg U, Bengio S, Wallach HM, Fergus R, Vishwanathan SVN, Garnett R (eds) Advances in neural information processing systems 30: annual conference on neural information processing systems 2017, December 4–9 (2017), Long Beach, CA, USA, pp 4765–4774

  111. Luo D, Cheng W, Xu D, Yu W, Zong B, Chen H, Zhang X (2020) Parameterized explainer for graph neural network. In: Larochelle H, Ranzato MA, Hadsell R, Balcan M-F, Lin H-T (eds) Advances in neural information processing systems 33: annual conference on neural information processing systems 2020 (NeurIPS 2020), December 6–12, 2020, virtual

  112. Ma Y, Yu D, Wu T, Wang H (2019) Paddlepaddle: an open-source deep learning platform from industrial practice. Front Data Comput 1(1):105–115

  113. Mahendran A, Vedaldi A (2015) Understanding deep image representations by inverting them. In: IEEE conference on computer vision and pattern recognition (CVPR 2015), Boston, MA, USA, June 7–12, 2015. IEEE Computer Society, pp 5188–5196

  114. Margeloiu A, Simidjievski N, Jamnik M, Weller A (2020) Improving interpretability in medical imaging diagnosis using adversarial training. CoRR, arXiv:2012.01166

  115. Miller T (2019) Explanation in artificial intelligence: insights from the social sciences. Artif Intell 267:1–38

  116. Ming Y, Xu P, Qu H, Ren L (2019) Interpretable and steerable sequence learning via prototypes. In: Teredesai A, Kumar V, Li Y, Rosales R, Terzi E, Karypis G (eds) Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining (KDD 2019), Anchorage, AK, USA, August 4–8, 2019. ACM, pp 903–913

  117. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller MA, Fidjeland A, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533

  118. Montavon G, Lapuschkin S, Binder A, Samek W, Müller K-R (2017) Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognit 65:211–222

  119. Montavon G, Samek W, Müller K-R (2018) Methods for interpreting and understanding deep neural networks. Digit Signal Process 73:1–15

  120. Moraffah R, Karami M, Guo R, Raglin A, Liu H (2020) Causal interpretability for machine learning—problems, methods and evaluation. SIGKDD Explor 22(1):18–33

  121. Mothilal RK, Sharma A, Tan C (2020) Explaining machine learning classifiers through diverse counterfactual explanations. In: Hildebrandt M, Castillo C, Celis LE, Ruggieri S, Taylor L, Zanfir-Fortuna G (eds) FAT* ’20: conference on fairness, accountability, and transparency, Barcelona, Spain, January 27–30, 2020. ACM, pp 607–617

  122. Murdoch WJ, Singh C, Kumbier K, Abbasi-Asl R, Yu B (2019) Interpretable machine learning: definitions, methods, and applications. CoRR, arXiv:1901.04592

  123. Nam W-J, Gur S, Choi J, Wolf L, Lee S-W (2020) Relative attributing propagation: interpreting the comparative contributions of individual units in deep neural networks. In: The thirty-fourth AAAI conference on artificial intelligence (AAAI 2020), the thirty-second innovative applications of artificial intelligence conference (IAAI 2020), the tenth AAAI symposium on educational advances in artificial intelligence, (EAAI 2020), New York, NY, USA, February 7–12, 2020. AAAI Press, pp 2501–2508

  124. Nguyen AM, Dosovitskiy A, Yosinski J, Brox T, Clune J (2016) Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. In: Lee DD, Sugiyama M, von Luxburg U, Guyon I, Garnett R (eds) Advances in neural information processing systems 29: annual conference on neural information processing systems 2016, December 5–10, 2016, Barcelona, Spain, pp 3387–3395

  125. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, Desmaison A, Köpf A, Yang EZ, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J, Chintala S (2019) Pytorch: an imperative style, high-performance deep learning library. In: Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, Garnett R (eds) Advances in neural information processing systems 32: annual conference on neural information processing systems 2019, (NeurIPS 2019), December 8–14, 2019, Vancouver, BC, Canada, pp 8024–8035

  126. Pearl J et al (2009) Causal inference in statistics: an overview. Stat Surv 3:96–146

  127. Petsiuk V, Das A, Saenko K (2018) RISE: randomized input sampling for explanation of black-box models. In: British machine vision conference 2018 (BMVC 2018), Newcastle, UK, September 3–6, 2018. BMVA Press, p 151

  128. Pleiss G, Zhang T, Elenberg ER, Weinberger KQ (2020) Identifying mislabeled data using the area under the margin ranking. In: Larochelle H, Ranzato MA, Hadsell R, Balcan M-F, Lin H-T (eds) Advances in neural information processing systems 33: annual conference on neural information processing systems 2020 (NeurIPS 2020), December 6–12, 2020, virtual

  129. Plumb G, Al-Shedivat M, Cabrera ÁA, Perer A, Xing EP, Talwalkar A (2020) Regularizing black-box models for improved interpretability. In: Larochelle H, Ranzato MA, Hadsell R, Balcan M-F, Lin H-T (eds) Advances in neural information processing systems 33: annual conference on neural information processing systems 2020 (NeurIPS 2020), December 6–12, 2020, virtual

  130. Plumb G, Molitor D, Talwalkar A (2018) Model agnostic supervised local explanations. In: Bengio S, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems 31: annual conference on neural information processing systems 2018 (NeurIPS 2018), December 3–8, 2018, Montréal, Canada, pp 2520–2529

  131. Plumerault A, Borgne HL, Hudelot C (2020) Controlling generative models with continuous factors of variations. In: 8th International conference on learning representations (ICLR 2020), Addis Ababa, Ethiopia, April 26–30, 2020. OpenReview.net

  132. Pope PE, Kolouri S, Rostami M, Martin CE, Hoffmann H (2019) Explainability methods for graph convolutional neural networks. In: IEEE conference on computer vision and pattern recognition (CVPR 2019), Long Beach, CA, USA, June 16–20, 2019. Computer Vision Foundation/IEEE, pp 10772–10781

  133. Preuer K, Klambauer G, Rippmann F, Hochreiter S, Unterthiner T (2019) Interpretable deep learning in drug discovery. In: Samek W, Montavon G, Vedaldi A, Hansen LK, Müller K-R (eds) Explainable AI: interpreting, explaining and visualizing deep learning, volume 11700 of lecture notes in computer science. Springer, pp 331–345

  134. Puiutta E, Veith EMSP (2020) Explainable reinforcement learning: a survey. In: Holzinger A, Kieseberg P, Tjoa AM, Weippl ER (eds) Machine learning and knowledge extraction—4th IFIP TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 international cross-domain conference, CD-MAKE 2020, Dublin, Ireland, August 25–28, 2020, proceedings, volume 12279 of lecture notes in computer science. Springer, pp 77–95

  135. Puri N, Verma S, Gupta P, Kayastha D, Deshmukh S, Krishnamurthy B, Singh S (2020) Explain your move: understanding agent actions using specific and relevant feature attribution. In: 8th International conference on learning representations (ICLR 2020), Addis Ababa, Ethiopia, April 26–30, 2020. OpenReview.net

  136. Rajpurkar P, O’Connell C, Schechter A, Asnani N, Li J, Kiani A, Ball RL, Mendelson M, Maartens G, van Hoving DJ et al (2020) Chexaid: deep learning assistance for physician diagnosis of tuberculosis using chest x-rays in patients with HIV. NPJ Digit Med 3:115

  137. Ribeiro MT, Singh S, Guestrin C (2016) “Why should I trust you?”: explaining the predictions of any classifier. In: Krishnapuram B, Shah M, Smola AJ, Aggarwal CC, Shen D, Rastogi R (eds) Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, San Francisco, CA, USA, August 13–17, 2016. ACM, pp 1135–1144

  138. Ribeiro MT, Singh S, Guestrin C (2018) Anchors: high-precision model-agnostic explanations. In: McIlraith SA, Weinberger KQ (eds) Proceedings of the thirty-second AAAI conference on artificial intelligence (AAAI-18), the 30th innovative applications of artificial intelligence (IAAI-18), and the 8th AAAI symposium on educational advances in artificial intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2–7, 2018. AAAI Press, pp 1527–1535

  139. Ricci F, Rokach L, Shapira B (2011) Introduction to recommender systems handbook. In: Ricci F, Rokach L, Shapira B, Kantor PB (eds) Recommender systems handbook. Springer, pp 1–35

  140. Ross AS, Doshi-Velez F (2018) Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In: McIlraith SA, Weinberger KQ (eds) Proceedings of the thirty-second AAAI conference on artificial intelligence (AAAI-18), the 30th innovative applications of artificial intelligence (IAAI-18), and the 8th AAAI symposium on educational advances in artificial intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2–7, 2018. AAAI Press, pp 1660–1669

  141. Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1(5):206–215

  142. Sabour S, Frosst N, Hinton GE (2017) Dynamic routing between capsules. In: Guyon I, von Luxburg U, Bengio S, Wallach HM, Fergus R, Vishwanathan SVN, Garnett R (eds) Advances in neural information processing systems 30: annual conference on neural information processing systems 2017, December 4–9, 2017, Long Beach, CA, USA, pp 3856–3866

  143. Samek W, Binder A, Montavon G, Lapuschkin S, Müller K-R (2017) Evaluating the visualization of what a deep neural network has learned. IEEE Trans Neural Netw Learn Syst 28(11):2660–2673

  144. Samek W, Montavon G, Lapuschkin S, Anders CJ, Müller K-R (2021) Explaining deep neural networks and beyond: a review of methods and applications. Proc IEEE 109(3):247–278

  145. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2020) Grad-CAM: visual explanations from deep networks via gradient-based localization. Int J Comput Vis 128(2):336–359

  146. Sengupta S, Singh A, Leopold HA, Gulati T, Lakshminarayanan V (2020) Ophthalmic diagnosis using deep learning with fundus images—a critical review. Artif Intell Med 102:101758

  147. Seo S, Huang J, Yang H, Liu Y (2017) Interpretable convolutional neural networks with dual local and global attention for review rating prediction. In: Cremonesi P, Ricci F, Berkovsky S, Tuzhilin A (eds) Proceedings of the eleventh ACM conference on recommender systems (RecSys 2017), Como, Italy, August 27–31, 2017. ACM, pp 297–305

  148. Serrano S, Smith NA (2019) Is attention interpretable? In: Korhonen A, Traum DR, Màrquez L (eds) Proceedings of the 57th conference of the association for computational linguistics. Association for Computational Linguistics (ACL)

  149. Shen Y, Zhou B (2021) Closed-form factorization of latent semantics in gans. In: IEEE conference on computer vision and pattern recognition (CVPR 2021), virtual, June 19–25, 2021. Computer Vision Foundation/IEEE, pp 1532–1540

  150. Shrikumar A, Greenside P, Kundaje A (2017) Learning important features through propagating activation differences. In: Precup D, Teh YW (eds) Proceedings of the 34th international conference on machine learning (ICML 2017), Sydney, NSW, Australia, 6–11 August 2017, volume 70 of proceedings of machine learning research (PMLR), pp 3145–3153

  151. Silver D, Huang A, Maddison CJ, Guez A, Sifre L, van den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Dieleman S, Grewe D, Nham J, Kalchbrenner N, Sutskever I, Lillicrap TP, Leach M, Kavukcuoglu K, Graepel T, Hassabis D (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484–489

  152. Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A, Chen Y, Lillicrap TP, Hui F, Sifre L, van den Driessche G, Graepel T, Hassabis D (2017) Mastering the game of go without human knowledge. Nature 550(7676):354–359

  153. Simonyan K, Vedaldi A, Zisserman A (2014) Deep inside convolutional networks: visualising image classification models and saliency maps. In: Bengio Y, LeCun Y (eds) 2nd International conference on learning representations (ICLR 2014), Banff, AB, Canada, April 14–16, 2014, workshop track proceedings

  154. Singh A, Sengupta S, Lakshminarayanan V (2020) Explainable deep learning models in medical image analysis. J Imaging 6(6):52

  155. Smilkov D, Thorat N, Kim B, Viégas FB, Wattenberg M (2017) Smoothgrad: removing noise by adding noise. CoRR, arXiv:1706.03825

  156. Srinivas S, Fleuret F (2019) Full-gradient representation for neural network visualization. In: Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, Garnett R (eds) Advances in neural information processing systems 32: annual conference on neural information processing systems 2019 (NeurIPS 2019), December 8–14, 2019, Vancouver, BC, Canada, pp 4126–4135

  157. Strobelt H, Gehrmann S, Behrisch M, Perer A, Pfister H, Rush AM (2019) Seq2seq-vis: a visual debugging tool for sequence-to-sequence models. IEEE Trans Vis Comput Graph 25(1):353–363

  158. Strobelt H, Gehrmann S, Pfister H, Rush AM (2018) Lstmvis: a tool for visual analysis of hidden state dynamics in recurrent neural networks. IEEE Trans Vis Comput Graph 24(1):667–676

  159. Sun Y, Wang S, Li Y-K, Feng S, Chen X, Zhang H, Tian X, Zhu D, Tian H, Wu H (2019) ERNIE: enhanced representation through knowledge integration. CoRR, arXiv:1904.09223

  160. Sundararajan M, Taly A, Yan Q (2017) Axiomatic attribution for deep networks. In: Precup D, Teh YW (eds) Proceedings of the 34th international conference on machine learning (ICML 2017), Sydney, NSW, Australia, 6–11 August 2017, volume 70 of proceedings of machine learning research (PMLR), pp 3319–3328

  161. Swayamdipta S, Schwartz R, Lourie N, Wang Y, Hajishirzi H, Smith NA, Choi Y (2020) Dataset cartography: mapping and diagnosing datasets with training dynamics. In: Webber B, Cohn T, He Y, Liu Y (eds) Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP 2020), Online, November 16–20, 2020. Association for Computational Linguistics, pp 9275–9293

  162. Tang J, Wang K (2018) Personalized top-n sequential recommendation via convolutional sequence embedding. In: Chang Y, Zhai C, Liu Y, Maarek Y (eds) Proceedings of the eleventh ACM international conference on web search and data mining (WSDM 2018), Marina Del Rey, CA, USA, February 5–9, 2018. ACM, pp 565–573

  163. Tjoa E, Guan C (2021) A survey on explainable artificial intelligence (XAI): toward medical XAI. IEEE Trans Neural Netw Learn Syst 32(11):4793–4813

  164. Toneva M, Sordoni A, des Combes RT, Trischler A, Bengio Y, Gordon GJ (2019) An empirical study of example forgetting during deep neural network learning. In: 7th International conference on learning representations (ICLR 2019), New Orleans, LA, USA, May 6–9, 2019. OpenReview.net

  165. Tsipras D, Santurkar S, Engstrom L, Turner A, Madry A (2019) Robustness may be at odds with accuracy. In: 7th International conference on learning representations (ICLR 2019), New Orleans, LA, USA, May 6–9, 2019. OpenReview.net

  166. van der Linden I, Haned H, Kanoulas E (2019) Global aggregations of local explanations for black box models. CoRR, arXiv:1907.03039

  167. Verma S, Dickerson JP, Hines K (2020) Counterfactual explanations for machine learning: a review. CoRR, arXiv:2010.10596

  168. Vinyals O, Babuschkin I, Czarnecki WM, Mathieu M, Dudzik A, Chung J, Choi DH, Powell R, Ewalds T, Georgiev P, Oh J, Horgan D, Kroiss M, Danihelka I, Huang A, Sifre L, Cai T, Agapiou JP, Jaderberg M, Vezhnevets AS, Leblond R, Pohlen T, Dalibard V, Budden D, Sulsky Y, Molloy J, Le Paine T, Gülçehre Ç, Wang Z, Pfaff T, Wu Y, Ring R, Yogatama D, Wünsch D, McKinney K, Smith O, Schaul T, Lillicrap TP, Kavukcuoglu K, Hassabis D, Apps C, Silver D (2019) Grandmaster level in starcraft II using multi-agent reinforcement learning. Nature 575(7782):350–354

  169. Voita E, Talbot D, Moiseev F, Sennrich R, Titov I (2019) Analyzing multi-head self-attention: specialized heads do the heavy lifting, the rest can be pruned. In: Korhonen A, Traum DR, Màrquez L (eds) Proceedings of the 57th conference of the Association for Computational Linguistics (ACL 2019), Florence, Italy, July 28–August 2, 2019, Volume 1: Long Papers. Association for Computational Linguistics, pp 5797–5808

  170. Voynov A, Babenko A (2019) RPGAN: gans interpretability via random routing. CoRR, arXiv:1912.10920

  171. Voynov A, Babenko A (2020) Unsupervised discovery of interpretable directions in the GAN latent space. In: Proceedings of the 37th international conference on machine learning (ICML 2020), 13–18 July 2020, virtual event, volume 119 of proceedings of machine learning research (PMLR), pp 9786–9796

  172. Vu MN, Nguyen TDT, Phan N, Gera R, Thai MT (2019) Evaluating explainers via perturbation. CoRR, arXiv:1906.02032

  173. Wachter S, Mittelstadt BD, Russell C (2017) Counterfactual explanations without opening the black box: automated decisions and the GDPR. CoRR, arXiv:1711.00399

  174. Wang H, Wang Z, Du M, Yang F, Zhang Z, Ding S, Mardziel P, Hu X (2020) Score-CAM: score-weighted visual explanations for convolutional neural networks. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR Workshops 2020), Seattle, WA, USA, June 14–19, 2020. Computer Vision Foundation/IEEE, pp 111–119

  175. Welinder P, Branson S, Mita T, Wah C, Schroff F, Belongie S, Perona P (2010) Caltech-UCSD birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology

  176. Wickramanayake S, Hsu W, Lee M-L (2021) Explanation-based data augmentation for image classification. In: Ranzato MA, Beygelzimer A, Dauphin YN, Liang P, Vaughan JW (eds) Advances in neural information processing systems 34: annual conference on neural information processing systems 2021 (NeurIPS 2021), December 6–14, 2021, virtual, pp 20929–20940

  177. Wiegreffe S, Pinter Y (2019) Attention is not not explanation. In: Inui K, Jiang J, Ng V, Wan X (eds) Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP). Association for Computational Linguistics

  178. Woo S, Park J, Lee J-Y, Kweon IS (2018) CBAM: convolutional block attention module. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer vision—ECCV 2018—15th European conference, Munich, Germany, September 8–14, 2018, proceedings, Part VII, volume 11211 of lecture notes in computer science. Springer, pp 3–19

  179. Xu G, Duong TD, Li Q, Liu S, Wang X (2020) Causality learning: a new perspective for interpretable machine learning. CoRR, arXiv:2006.16789

  180. Yang C, Shen Y, Zhou B (2021) Semantic hierarchy emerges in deep generative representations for scene synthesis. Int J Comput Vis 129(5):1451–1466

  181. Yang M, Kim B (2019) Benchmarking attribution methods with relative feature importance. CoRR, arXiv:1907.09701

  182. Yao Y, Chen T, Xie G-S, Zhang C, Shen F, Wu Q, Tang Z, Zhang J (2021) Non-salient region object mining for weakly supervised semantic segmentation. In: IEEE conference on computer vision and pattern recognition (CVPR 2021), virtual, June 19–25, 2021. Computer Vision Foundation/IEEE, pp 2623–2632

  183. Yeh C-K, Hsieh C-Y, Suggala AS, Inouye DI, Ravikumar P (2019) On the (in)fidelity and sensitivity of explanations. In: Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, Garnett R (eds) Advances in neural information processing systems 32: annual conference on neural information processing systems 2019 (NeurIPS 2019), December 8–14, 2019, Vancouver, BC, Canada, pp 10965–10976

  184. Ying Z, Bourgeois D, You J, Zitnik M, Leskovec J (2019) Gnnexplainer: generating explanations for graph neural networks. In: Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, Garnett R (eds) Advances in neural information processing systems 32: annual conference on neural information processing systems 2019 (NeurIPS 2019), December 8–14, 2019, Vancouver, BC, Canada, pp 9240–9251

  185. Yuan T, Li X, Xiong H, Cao H, Dou D (2021) Explaining information flow inside vision transformers using Markov chain. In: Neural information processing systems XAI4Debugging workshop

  186. Zagoruyko S, Komodakis N (2017) Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer. In: 5th International conference on learning representations (ICLR 2017), Toulon, France, April 24–26, 2017, conference track proceedings. OpenReview.net

  187. Zhang C, Bengio S, Hardt M, Recht B, Vinyals O (2021) Understanding deep learning (still) requires rethinking generalization. Commun ACM 64(3):107–115

  188. Zhang H, Cissé M, Dauphin YN, Lopez-Paz D (2018) Mixup: beyond empirical risk minimization. In: 6th International conference on learning representations (ICLR 2018), Vancouver, BC, Canada, April 30–May 3, 2018, conference track proceedings. OpenReview.net

  189. Zhang J, Bargal SA, Lin Z, Brandt J, Shen X, Sclaroff S (2018) Top-down neural attention by excitation backprop. Int J Comput Vis 126(10):1084–1102

  190. Zhang Q, Cao R, Shi F, Wu YN, Zhu S-C (2018) Interpreting CNN knowledge via an explanatory graph. In: McIlraith SA, Weinberger KQ (eds) Proceedings of the thirty-second AAAI conference on artificial intelligence (AAAI-18), the 30th innovative applications of artificial intelligence (IAAI-18), and the 8th AAAI symposium on educational advances in artificial intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2–7, 2018. AAAI Press, pp 4454–4463

  191. Zhang Q, Wu YN, Zhu S-C (2018) Interpretable convolutional neural networks. In: 2018 IEEE conference on computer vision and pattern recognition (CVPR 2018), Salt Lake City, UT, USA, June 18–22, 2018. Computer Vision Foundation/IEEE Computer Society, pp 8827–8836

  192. Zhang Q, Yang Y, Ma H, Wu YN (2019) Interpreting cnns via decision trees. In: IEEE conference on computer vision and pattern recognition (CVPR 2019), Long Beach, CA, USA, June 16–20, 2019. Computer Vision Foundation/IEEE, pp 6261–6270

  193. Zhang S, Yao L, Sun A, Tay Y (2019) Deep learning based recommender system: a survey and new perspectives. ACM Comput Surv 52(1):5:1–5:38

  194. Zhang T, Zhu Z (2019) Interpreting adversarially trained convolutional neural networks. In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th international conference on machine learning (ICML 2019), 9–15 June 2019, Long Beach, California, USA, volume 97 of proceedings of machine learning research (PMLR), pp 7502–7511

  195. Zhang Y, Chen X (2020) Explainable recommendation: a survey and new perspectives. Found Trends Inf Retr 14(1):1–101

  196. Zhao G, Zhou B, Wang K, Jiang R, Xu M (2018) Respond-CAM: analyzing deep models for 3d imaging data by visualizations. In: Frangi AF, Schnabel JA, Davatzikos C, Alberola-López C, Fichtinger G (eds) Medical image computing and computer assisted intervention—MICCAI 2018—21st international conference, Granada, Spain, September 16–20, 2018, proceedings, Part I, volume 11070 of lecture notes in computer science. Springer, pp 485–492

  197. Zhou B, Khosla A, Lapedriza À, Oliva A, Torralba A (2016) Learning deep features for discriminative localization. In: 2016 IEEE conference on computer vision and pattern recognition, (CVPR 2016), Las Vegas, NV, USA, June 27–30, 2016. IEEE Computer Society, pp 2921–2929

Acknowledgements

Funding was provided by the National Key R&D Program of China (Grant No. 2021ZD0110303).

Author information

Corresponding author

Correspondence to Dejing Dou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, X., Xiong, H., Li, X. et al. Interpretable deep learning: interpretation, interpretability, trustworthiness, and beyond. Knowl Inf Syst 64, 3197–3234 (2022). https://doi.org/10.1007/s10115-022-01756-8
