Abstract
Explanation generation for transformers enhances accountability for their predictions. However, there have been few studies on generating visual explanations for transformers that use multidimensional context, such as LambdaNetworks. In this paper, we propose Lambda Attention Branch Networks, which attend to important regions in detail and generate easily interpretable visual explanations. We also propose the Patch Insertion-Deletion score, an extension of the Insertion-Deletion score, as an effective evaluation metric for images with sparse important regions. Experimental results on two public datasets indicate that the proposed method successfully generates visual explanations.
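For context, the Insertion-Deletion score mentioned above is typically computed by removing (deletion) or revealing (insertion) pixels in order of decreasing saliency and reporting the area under the resulting curve of class probabilities; the paper's Patch Insertion-Deletion variant applies the same idea at the patch level, targeting images whose important regions are sparse. The sketch below illustrates only the deletion half of the baseline metric, assuming a PyTorch classifier; `deletion_auc`, `model`, `image`, `saliency`, and `step` are hypothetical names, and this is not the authors' implementation.

```python
import numpy as np
import torch


def deletion_auc(model, image, saliency, target, step=0.02, baseline=0.0):
    """Delete pixels from most to least salient and integrate the class score.

    model: classifier returning logits; image: (C, H, W) tensor;
    saliency: (H, W) importance map; target: class index.
    All names are illustrative placeholders, not the authors' API.
    """
    c, h, w = image.shape
    order = torch.argsort(saliency.flatten(), descending=True)  # most salient pixels first
    n_per_step = max(1, int(step * h * w))                       # pixels removed per step
    perturbed = image.clone().contiguous()
    scores = []
    for start in range(0, h * w, n_per_step):
        with torch.no_grad():
            prob = torch.softmax(model(perturbed.unsqueeze(0)), dim=1)[0, target]
        scores.append(prob.item())
        idx = order[start:start + n_per_step]
        perturbed.view(c, -1)[:, idx] = baseline                 # erase the next pixel batch
    return float(np.trapz(scores, dx=1.0 / (len(scores) - 1)))   # area under the deletion curve
```

Under this convention, a lower deletion AUC (the class score collapses quickly as salient pixels are erased) and a higher insertion AUC indicate a more faithful explanation; the insertion curve is computed analogously by starting from a blank or blurred image and restoring the most salient pixels first.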
Acknowledgement
This work was partially supported by JSPS KAKENHI Grant Number 20H04269 and NEDO.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Iida, T. et al. (2023). Visual Explanation Generation Based on Lambda Attention Branch Networks. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13842. Springer, Cham. https://doi.org/10.1007/978-3-031-26284-5_29
DOI: https://doi.org/10.1007/978-3-031-26284-5_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26283-8
Online ISBN: 978-3-031-26284-5