Abstract
Explanation generation for transformers enhances accountability for their predictions. However, there have been few studies on generating visual explanations for transformers that use multidimensional context, such as LambdaNetworks. In this paper, we propose Lambda Attention Branch Networks, which attend to important regions in detail and generate easily interpretable visual explanations. We also propose the Patch Insertion-Deletion score, an extension of the Insertion-Deletion score, as an effective evaluation metric for images with sparse important regions. Experimental results on two public datasets indicate that the proposed method successfully generates visual explanations.
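For context, the Insertion-Deletion score mentioned above is typically computed by removing (deletion) or revealing (insertion) pixels in order of decreasing saliency and reporting the area under the resulting curve of class probabilities; the paper's Patch Insertion-Deletion variant applies the same idea at the patch level, targeting images whose important regions are sparse. The sketch below illustrates only the deletion half of the baseline metric, assuming a PyTorch classifier; `deletion_auc`, `model`, `image`, `saliency`, and `step` are hypothetical names, and this is not the authors' implementation.

```python
import numpy as np
import torch


def deletion_auc(model, image, saliency, target, step=0.02, baseline=0.0):
    """Delete pixels from most to least salient and integrate the class score.

    model: classifier returning logits; image: (C, H, W) tensor;
    saliency: (H, W) importance map; target: class index.
    All names are illustrative placeholders, not the authors' API.
    """
    c, h, w = image.shape
    order = torch.argsort(saliency.flatten(), descending=True)  # most salient pixels first
    n_per_step = max(1, int(step * h * w))                       # pixels removed per step
    perturbed = image.clone().contiguous()
    scores = []
    for start in range(0, h * w, n_per_step):
        with torch.no_grad():
            prob = torch.softmax(model(perturbed.unsqueeze(0)), dim=1)[0, target]
        scores.append(prob.item())
        idx = order[start:start + n_per_step]
        perturbed.view(c, -1)[:, idx] = baseline                 # erase the next pixel batch
    return float(np.trapz(scores, dx=1.0 / (len(scores) - 1)))   # area under the deletion curve
```

Under this convention, a lower deletion AUC (the class score collapses quickly as salient pixels are erased) and a higher insertion AUC indicate a more faithful explanation; the insertion curve is computed analogously by starting from a blank or blurred image and restoring the most salient pixels first.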
Acknowledgement
This work was partially supported by JSPS KAKENHI Grant Number 20H04269 and NEDO.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Iida, T. et al. (2023). Visual Explanation Generation Based on Lambda Attention Branch Networks. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13842. Springer, Cham. https://doi.org/10.1007/978-3-031-26284-5_29
DOI: https://doi.org/10.1007/978-3-031-26284-5_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26283-8
Online ISBN: 978-3-031-26284-5