Visual Explanation Generation Based on Lambda Attention Branch Networks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13842)

Included in the following conference series: Computer Vision – ACCV 2022 (ACCV 2022)

Abstract

Explanation generation for transformers enhances accountability for their predictions. However, few studies have addressed visual explanation generation for transformers that use multidimensional context, such as LambdaNetworks. In this paper, we propose Lambda Attention Branch Networks, which attend to important regions in detail and generate easily interpretable visual explanations. We also propose the Patch Insertion-Deletion score, an extension of the Insertion-Deletion score, as an effective evaluation metric for images in which the important regions are sparse. Experimental results on two public datasets indicate that the proposed method successfully generates visual explanations.
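
For intuition, the standard Insertion-Deletion score progressively reveals (insertion) or masks (deletion) the most salient parts of an image and integrates the model's confidence over the process; the Patch Insertion-Deletion score applies the same idea at the patch level, which better suits images whose important regions are small and sparse. The following is a minimal, hypothetical sketch of a patch-wise insertion/deletion AUC; the square non-overlapping patches, zero baseline, and trapezoidal normalization are illustrative assumptions, not the authors' exact protocol.

    import numpy as np

    def patch_order(saliency: np.ndarray, patch: int) -> np.ndarray:
        """Rank non-overlapping patches by mean saliency, most salient first.

        Assumes H and W are multiples of `patch` (illustrative simplification).
        """
        H, W = saliency.shape
        per_patch = saliency.reshape(H // patch, patch, W // patch, patch).mean(axis=(1, 3))
        return np.argsort(per_patch.ravel())[::-1]

    def patch_insertion_deletion(model, image, saliency, patch=16):
        """Return (insertion AUC, deletion AUC) for one image.

        `model` is assumed to map an image of shape (C, H, W) to the
        probability of the originally predicted class.
        """
        H, W = saliency.shape
        order = patch_order(saliency, patch)
        ncols = W // patch

        def run(start, source):
            # Copy patches from `source` onto `start`, most salient first,
            # recording the model's confidence after each step.
            work = start.copy()
            probs = [model(work)]
            for idx in order:
                r, c = divmod(idx, ncols)
                ys = slice(r * patch, (r + 1) * patch)
                xs = slice(c * patch, (c + 1) * patch)
                work[..., ys, xs] = source[..., ys, xs]
                probs.append(model(work))
            # Area under the confidence curve, normalized to [0, 1].
            return np.trapz(probs, dx=1.0 / len(order))

        baseline = np.zeros_like(image)
        insertion = run(baseline, image)  # reveal patches on a blank canvas
        deletion = run(image, baseline)   # blank out patches of the real image
        return insertion, deletion

Under this reading, a faithful explanation should yield a high insertion AUC (confidence recovers quickly as salient patches are revealed) and a low deletion AUC (confidence collapses quickly as they are removed).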


Notes

  1. https://sdo.gsfc.nasa.gov/data/.

References

  1. Abnar, S., Zuidema, W.: Quantifying attention flow in transformers. arXiv preprint arXiv:2005.00928 (2020)

  2. Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., Kim, B.: Sanity checks for saliency maps. In: NeurIPS, vol. 31 (2018)

  3. Bello, I.: LambdaNetworks: modeling long-range interactions without attention. In: ICLR (2021)

  4. Binder, A., Montavon, G., Lapuschkin, S., Müller, K.-R., Samek, W.: Layer-wise relevance propagation for neural networks with local renormalization layers. In: Villa, A.E.P., Masulli, P., Pons Rivero, A.J. (eds.) ICANN 2016. LNCS, vol. 9887, pp. 63–71. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-44781-0_8

  5. Chefer, H., Gur, S., Wolf, L.: Transformer interpretability beyond attention visualization. In: CVPR, pp. 782–791 (2021)

  6. Das, A., Rad, P.: Opportunities and challenges in explainable artificial intelligence (XAI): a survey. arXiv preprint arXiv:2006.11371 (2020)

  7. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR, pp. 248–255 (2009)

  8. Dosovitskiy, A., Beyer, L., et al.: An image is worth 16×16 words: transformers for image recognition at scale. In: ICLR (2021)

  9. Fel, T., Vigouroux, D., Cadène, R., Serre, T.: How good is your explanation? Algorithmic stability measures to assess the quality of explanations for deep neural networks. In: WACV, pp. 720–730 (2022)

  10. Fukui, H., Hirakawa, T., et al.: Attention branch network: learning of attention mechanism for visual explanation. In: CVPR, pp. 10705–10714 (2019)

  11. Hooker, S., Erhan, D., Kindermans, P.J., Kim, B.: A benchmark for interpretability methods in deep neural networks. In: NeurIPS, vol. 32 (2019)

  12. Ismail, A.A., Corrada Bravo, H., Feizi, S.: Improving deep learning interpretability by saliency guided training. In: NeurIPS (2021)

  13. Jain, S., Wallace, B.: Attention is not explanation. In: NAACL, pp. 3543–3556 (2019)

  14. Khan, S., Naseer, M., Hayat, M., Zamir, S.W., et al.: Transformers in vision: a survey. arXiv preprint arXiv:2101.01169 (2021)

  15. Li, H., Ellis, J., Zhang, L., Chang, S.F.: PatternNet: visual pattern mining with deep neural network. In: ICMR, pp. 291–299 (2018)

  16. Lundberg, S., Lee, S.I.: A unified approach to interpreting model predictions. In: NeurIPS, pp. 4765–4774 (2017)

  17. Magassouba, A., Sugiura, K., et al.: A multimodal classifier generative adversarial network for carry and place tasks from ambiguous language instructions. RA-L 3(4), 3113–3120 (2018)

  18. Magassouba, A., Sugiura, K., et al.: Predicting and attending to damaging collisions for placing everyday objects in photo-realistic simulations. Adv. Robot. 35(12), 787–799 (2021)

  19. Mitsuhara, M., Fukui, H., Sakashita, Y., et al.: Embedding human knowledge into deep neural network via attention map. In: VISAPP (2021)

  20. Nishizuka, N., Sugiura, K., et al.: Deep flare net (DeFN) model for solar flare prediction. Astrophys. J. 858(2), 113 (8 pp) (2018)

  21. Ogura, T., Magassouba, A., Sugiura, K., et al.: Alleviating the burden of labeling: sentence generation by attention branch encoder-decoder network. RA-L 5(4), 5945–5952 (2020)

  22. Pan, B., Panda, R., Jiang, Y., et al.: IA-RED²: interpretability-aware redundancy reduction for vision transformers. In: NeurIPS (2021)

  23. Pesnell, W., Thompson, B., Chamberlin, P.: The solar dynamics observatory (SDO). Sol. Phys. 275(1–2), 3–15 (2012). https://doi.org/10.1007/s11207-011-9841-3

  24. Petsiuk, V., Das, A., Saenko, K.: RISE: randomized input sampling for explanation of black-box models. In: BMVC, p. 151 (13 pp) (2018)

  25. Porwal, P., et al.: IDRiD: diabetic retinopathy - segmentation and grading challenge. Med. Image Anal. 59, 101561 (2020)

  26. Ribeiro, M., Singh, S., et al.: “Why should I trust you?”: explaining the predictions of any classifier. In: KDD, pp. 1135–1144 (2016)

  27. Scherrer, P., Schou, J., Bush, R., et al.: The helioseismic and magnetic imager (HMI) investigation for the solar dynamics observatory (SDO). Sol. Phys. 275, 207–227 (2012). https://doi.org/10.1007/s11207-011-9834-2

  28. Selvaraju, R., et al.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: ICCV, pp. 618–626 (2017)

  29. Smilkov, D., Thorat, N., Kim, B., Viégas, F.B., Wattenberg, M.: SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825 (2017)

  30. Srinivas, S., Fleuret, F.: Full-gradient representation for neural network visualization. In: Advances in Neural Information Processing Systems, vol. 32 (2019)

  31. Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In: ICML, vol. 70, pp. 3319–3328 (2017)

  32. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., et al.: Attention is all you need. In: NeurIPS, pp. 6000–6010 (2017)

  33. Vig, J.: A multiscale visualization of attention in the transformer model. In: ACL, pp. 37–42 (2019)

  34. Wang, H., Wang, Z., Du, M., et al.: Score-CAM: score-weighted visual explanations for convolutional neural networks. In: CVPR, pp. 24–25 (2020)

  35. Wu, L., et al.: Classification of diabetic retinopathy and diabetic macular edema. World J. Diabetes 4(6), 290–294 (2013)

  36. Zhang, Z., Chen, Y., Li, H., Zhang, Q.: IA-CNN: a generalised interpretable convolutional neural network with attention mechanism. In: IJCNN, pp. 1–8 (2021)

  37. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., et al.: Learning deep features for discriminative localization. In: CVPR, pp. 2921–2929 (2016)

Acknowledgement

This work was partially supported by JSPS KAKENHI Grant Number 20H04269 and NEDO.

Author information

Corresponding author

Correspondence to Tsumugi Iida.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 426 KB)

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Iida, T. et al. (2023). Visual Explanation Generation Based on Lambda Attention Branch Networks. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13842. Springer, Cham. https://doi.org/10.1007/978-3-031-26284-5_29

  • DOI: https://doi.org/10.1007/978-3-031-26284-5_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26283-8

  • Online ISBN: 978-3-031-26284-5

  • eBook Packages: Computer Science, Computer Science (R0)
