Comparing CAM Algorithms for the Identification of Salient Image Features in Iconography Artwork Analysis
Abstract
1. Introduction
- Are CAMs an effective tool for understanding how a CNN classifier recognizes the iconographic classes of a painting?
- Are there significant differences among state-of-the-art CAM algorithms with respect to their ability to support the explanation of iconography classification by CNNs?
- Are the image areas highlighted by CAMs a good starting point for semi-automatically creating the bounding boxes needed to train iconography detectors?
- We apply four state-of-the-art class activation map algorithms (namely, CAM [15], Grad-CAM [16], Grad-CAM++ [17], and Smooth Grad-CAM++ [18]) to the CNN iconography classification model presented in [11], which exploits a ResNet50 [19] backbone trained on the ImageNet dataset [20] and fine-tuned on the ArtDL dataset (http://www.artdl.org—accessed on 15 May 2021), which consists of 42,479 images of artworks portraying Christian Saints divided into 10 classes. Note that, to avoid ambiguity, we refer to the specific algorithm as “CAM” and to the generic output as “class activation maps”.
- For the quantitative evaluation of the different algorithms, a test dataset has been built which comprises 823 images annotated with 2957 bounding boxes surrounding specific iconographic symbols. One such annotated image is shown in Figure 1. We use the Intersection over Union (IoU) metric to measure the agreement between the areas of the image highlighted by each algorithm and those annotated manually as ground truth. Furthermore, we analyze the class activation map area in terms of the percentage of covered bounding boxes and the percentage of covered area that does not contain any iconographic symbol.
- The comparison shows that Grad-CAM, Grad-CAM++, and Smooth Grad-CAM++ deliver better results than the original CAM algorithm in terms of area coverage and explainability. This finding confirms the result discussed in [18] for natural images. Smooth Grad-CAM++ produces multiple disconnected image regions that identify small iconographic symbols quite precisely, whereas Grad-CAM produces wider and more contiguous areas that cover both large and small iconographic symbols well. To the best of our knowledge, such a comparison has not been performed before in the context of artwork analysis.
- We perform a qualitative evaluation by examining the overlap between the ground-truth bounding boxes and the class activation maps. This investigation illustrates the strengths and weaknesses of the analyzed algorithms, highlights their capacity to detect symbols missed by the human annotator, and discusses cases of confusion between the symbols of different classes. A simple procedure is tested for selecting “good enough” class activation maps and for automatically creating symbol bounding boxes from them. The results of this procedure are illustrated visually.
- We deepen the evaluation by measuring quantitatively the agreement between the ground-truth bounding boxes and the bounding boxes estimated from the class activation maps. The assessment shows that the whole Saint bounding boxes computed from the Grad-CAM class activation maps attain a 55% average IoU, 61% GT-known localization accuracy, and 31% mAP. Such results, obtained by simple post-processing of the output of a general-purpose CNN interpretability technique, pave the way to the use of automatically computed bounding boxes for training weakly supervised object detectors on artwork images. A minimal sketch of the Grad-CAM computation underlying these results is given below.
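As background for the comparison, the sketch below shows one way to obtain a Grad-CAM class activation map from a ResNet50-based classifier. The torchvision model, the 10-class head standing in for the ArtDL classifier, and the hook placement on the last convolutional stage (layer4) are illustrative assumptions, not the exact implementation used in our experiments.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Hypothetical stand-in for the ArtDL classifier: an ImageNet-pretrained
# ResNet50 whose final layer is replaced by a 10-class head.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Linear(model.fc.in_features, 10)
model.eval()

_acts, _grads = {}, {}

def _save_acts(module, inputs, output):
    _acts["maps"] = output                # activations of the last conv stage

def _save_grads(module, grad_input, grad_output):
    _grads["maps"] = grad_output[0]       # gradients w.r.t. those activations

# layer4 is the last convolutional stage of torchvision's ResNet50.
model.layer4.register_forward_hook(_save_acts)
model.layer4.register_full_backward_hook(_save_grads)

def grad_cam(image, target_class):
    """Return a Grad-CAM map in [0, 1], resized to the input resolution."""
    scores = model(image)                              # shape (1, 10)
    model.zero_grad()
    scores[0, target_class].backward()

    acts = _acts["maps"]                               # (1, C, h, w)
    grads = _grads["maps"]                             # (1, C, h, w)
    weights = grads.mean(dim=(2, 3), keepdim=True)     # global-average-pooled gradients
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                        align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam[0, 0].detach()

# Illustrative usage, assuming `img` is a normalized (1, 3, H, W) tensor:
# heatmap = grad_cam(img, target_class=3)
```

The other gradient-based variants compared in this work, Grad-CAM++ and Smooth Grad-CAM++, differ mainly in how the channel weights are derived from the gradients and, for Smooth Grad-CAM++, in averaging over noisy copies of the input.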
2. Related Work
2.1. Automated Artwork Image Analysis
2.2. Interpretability and Activation Maps
3. Class Activation Maps for Iconography Classification
3.1. CAM
3.2. Grad-CAM
3.3. Grad-CAM++
3.4. Smooth Grad-CAM++
4. Evaluation
4.1. Dataset
4.2. Class Activation Maps Generation
4.3. Choice of the Threshold Value
4.4. Intersection Over Union Metrics
4.5. Quantitative Analysis
4.6. Qualitative Analysis
Bounding Box Generation
- Compute the Grad-CAM class activation map of the selected images and apply the corresponding threshold: 0.1 for symbol bounding boxes and 0.05 for whole Saint bounding boxes;
- Only for symbol boxes: split the class activation maps into connected components. Remove the components whose average activation value is less than half of the average activation value of all components. This step filters out foreground pixels with low activation, which usually correspond to irrelevant areas (Figure 12);
- For each Iconclass category, draw one bounding box surrounding each component (symbol bounding boxes) and one bounding box surrounding the entire class activation map (whole Saint bounding boxes). A code sketch of this procedure is given below.
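The steps above can be condensed into the following sketch. It assumes that `cam` is a 2D NumPy array normalized to [0, 1] (for example, the output of the Grad-CAM sketch in the Introduction), and it interprets “the average activation value of all components” as the mean of the per-component averages; the function names and the SciPy-based connected-component labelling are illustrative choices, not the exact implementation used for the reported results.

```python
import numpy as np
from scipy import ndimage

def boxes_from_cam(cam, threshold, split_components=True):
    """Return (x_min, y_min, x_max, y_max) boxes from a class activation map."""
    mask = cam >= threshold
    if not mask.any():
        return []

    if not split_components:
        # Whole Saint box: a single box around the entire thresholded map.
        ys, xs = np.where(mask)
        return [(int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))]

    # Symbol boxes: one box per connected component, keeping only the
    # components whose mean activation reaches half of the mean of the
    # per-component averages (our reading of the filtering rule above).
    labels, n = ndimage.label(mask)
    means = ndimage.mean(cam, labels=labels, index=range(1, n + 1))
    keep = [i + 1 for i, m in enumerate(means) if m >= 0.5 * float(np.mean(means))]

    slices = ndimage.find_objects(labels)
    boxes = []
    for lab in keep:
        sl_y, sl_x = slices[lab - 1]
        boxes.append((sl_x.start, sl_y.start, sl_x.stop - 1, sl_y.stop - 1))
    return boxes

def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) pixel boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1 + 1) * max(0, iy2 - iy1 + 1)
    area_a = (box_a[2] - box_a[0] + 1) * (box_a[3] - box_a[1] + 1)
    area_b = (box_b[2] - box_b[0] + 1) * (box_b[3] - box_b[1] + 1)
    return inter / float(area_a + area_b - inter)

# Thresholds follow the procedure above:
# symbol_boxes = boxes_from_cam(cam, threshold=0.1, split_components=True)
# saint_box    = boxes_from_cam(cam, threshold=0.05, split_components=False)
```

The resulting boxes can then be compared with the manually annotated ground truth through the `iou` function, which implements the Intersection over Union measure used in Section 4.4.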
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Panofsky, E. Studies in Iconology: Humanistic Themes in the Art of the Renaissance; Routledge Taylor and Francis Group: New York, NY, USA, 1939; p. 262.
2. Proulx, D.A. A Sourcebook of Nasca Ceramic Iconography: Reading a Culture through Its Art; University of Iowa Press: Iowa City, IA, USA, 2009.
3. Parani, M.G. Reconstructing the Reality of Images: Byzantine Material Culture and Religious Iconography 11th–15th Centuries; Brill: Leiden, The Netherlands, 2003; Volume 41.
4. Van Leeuwen, T.; Jewitt, C. The Handbook of Visual Analysis; Sage: Thousand Oaks, CA, USA, 2001; pp. 100–102.
5. King, J.N. Tudor Royal Iconography: Literature and Art in an Age of Religious Crisis; Princeton University Press: Princeton, NJ, USA, 1989.
6. Roberts, H.E. Encyclopedia of Comparative Iconography: Themes Depicted in Works of Art; Routledge: London, UK, 2013.
7. Zujovic, J.; Gandy, L.; Friedman, S.; Pardo, B.; Pappas, T.N. Classifying paintings by artistic genre: An analysis of features & classifiers. In Proceedings of the 2009 IEEE International Workshop on Multimedia Signal Processing, Rio de Janeiro, Brazil, 5–7 October 2009; pp. 1–5.
8. Shamir, L.; Tarakhovsky, J.A. Computer Analysis of Art. J. Comput. Cult. Herit. 2012, 5.
9. Cai, H.; Wu, Q.; Corradi, T.; Hall, P. The Cross-Depiction Problem: Computer Vision Algorithms for Recognising Objects in Artwork and in Photographs. arXiv 2015, arXiv:1505.00110.
10. Gonthier, N.; Gousseau, Y.; Ladjal, S.; Bonfait, O. Weakly supervised object detection in artworks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018.
11. Milani, F.; Fraternali, P. A Data Set and a Convolutional Model for Iconography Classification in Paintings. arXiv 2020, arXiv:2010.11697.
12. Sun, K.H.; Huh, H.; Tama, B.A.; Lee, S.Y.; Jung, J.H.; Lee, S. Vision-Based Fault Diagnostics Using Explainable Deep Learning With Class Activation Maps. IEEE Access 2020, 8, 129169–129179.
13. Patro, B.; Lunayach, M.; Patel, S.; Namboodiri, V. U-CAM: Visual Explanation Using Uncertainty Based Class Activation Maps. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 7443–7452.
14. Yang, S.; Kim, Y.; Kim, Y.; Kim, C. Combinational Class Activation Maps for Weakly Supervised Object Localization. In Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, CO, USA, 1–5 March 2020; pp. 2930–2938.
15. Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning Deep Features for Discriminative Localization. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929.
16. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626.
17. Chattopadhay, A.; Sarkar, A.; Howlader, P.; Balasubramanian, V.N. Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018.
18. Omeiza, D.; Speakman, S.; Cintas, C.; Weldermariam, K. Smooth Grad-CAM++: An enhanced inference level visualization technique for deep convolutional neural network models. arXiv 2019, arXiv:1908.01224.
19. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385.
20. Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; Li, F. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, FL, USA, 20–25 June 2009; IEEE Computer Society: New York, NY, USA, 2009; pp. 248–255.
21. Karayev, S.; Trentacoste, M.; Han, H.; Agarwala, A.; Darrell, T.; Hertzmann, A.; Winnemoeller, H. Recognizing image style. arXiv 2013, arXiv:1311.3715.
22. Crowley, E.J.; Zisserman, A. The State of the Art: Object Retrieval in Paintings Using Discriminative Regions; British Machine Vision Association: Durham, UK, 2014.
23. Khan, F.S.; Beigpour, S.; Van de Weijer, J.; Felsberg, M. Painting-91: A large scale database for computational painting categorization. Mach. Vis. Appl. 2014, 25, 1385–1397.
24. Strezoski, G.; Worring, M. OmniArt: Multi-task deep learning for artistic data analysis. arXiv 2017, arXiv:1708.00684.
25. Mao, H.; Cheung, M.; She, J. DeepArt: Learning joint representations of visual arts. In Proceedings of the 25th ACM International Conference on Multimedia, Mountain View, CA, USA, 23–27 October 2017; pp. 1183–1191.
26. Bianco, S.; Mazzini, D.; Napoletano, P.; Schettini, R. Multitask painting categorization by deep multibranch neural network. Expert Syst. Appl. 2019, 135, 90–101.
27. Castellano, G.; Vessio, G. Deep learning approaches to pattern extraction and recognition in paintings and drawings: An overview. In Neural Computing and Applications; Springer: New York, NY, USA, 2021; pp. 1–20.
28. Santos, I.; Castro, L.; Rodriguez-Fernandez, N.; Torrente-Patino, A.; Carballal, A. Artificial Neural Networks and Deep Learning in the Visual Arts: A review. In Neural Computing and Applications; Springer: New York, NY, USA, 2021; pp. 1–37.
29. Zhao, W.; Zhou, D.; Qiu, X.; Jiang, W. Compare the performance of the models in art classification. PLoS ONE 2021, 16, e0248414.
30. Gao, Z.; Shan, M.; Li, Q. Adaptive sparse representation for analyzing artistic style of paintings. J. Comput. Cult. Herit. (JOCCH) 2015, 8, 1–15.
31. Elgammal, A.; Kang, Y.; Den Leeuw, M. Picasso, Matisse, or a fake? Automated analysis of drawings at the stroke level for attribution and authentication. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
32. Crowley, E.J.; Zisserman, A. Of gods and goats: Weakly supervised learning of figurative art. Learning 2013, 8, 14.
33. Shen, X.; Efros, A.A.; Aubry, M. Discovering Visual Patterns in Art Collections with Spatially-consistent Feature Learning. arXiv 2019, arXiv:1903.02678.
34. Kadish, D.; Risi, S.; Løvlie, A.S. Improving Object Detection in Art Images Using Only Style Transfer. arXiv 2021, arXiv:2102.06529.
35. Banar, N.; Daelemans, W.; Kestemont, M. Multi-Modal Label Retrieval for the Visual Arts: The Case of Iconclass; Scitepress: Setúbal, Portugal, 2021.
36. Gonthier, N.; Gousseau, Y.; Ladjal, S. An analysis of the transfer learning of convolutional neural networks for artistic images. arXiv 2020, arXiv:2011.02727.
37. Cömert, C.; Özbayoğlu, M.; Kasnakoğlu, C. Painter Prediction from Artworks with Transfer Learning. In Proceedings of the IEEE 2021 7th International Conference on Mechatronics and Robotics Engineering (ICMRE), Budapest, Hungary, 3–5 February 2021; pp. 204–208.
38. Belhi, A.; Ahmed, H.O.; Alfaqheri, T.; Bouras, A.; Sadka, A.H.; Foufou, S. Study and Evaluation of Pre-trained CNN Networks for Cultural Heritage Image Classification. In Data Analytics for Cultural Heritage: Current Trends and Concepts; Springer: Cham, Switzerland, 2021; p. 47.
39. Guidotti, R.; Monreale, A.; Ruggieri, S.; Turini, F.; Giannotti, F.; Pedreschi, D. A survey of methods for explaining black box models. ACM Comput. Surv. (CSUR) 2018, 51, 1–42.
40. Buhrmester, V.; Münch, D.; Arens, M. Analysis of explainers of black box deep neural networks for computer vision: A survey. arXiv 2019, arXiv:1911.12116.
41. Gupta, V.; Demirer, M.; Bigelow, M.; Yu, S.M.; Yu, J.S.; Prevedello, L.M.; White, R.D.; Erdal, B.S. Using Transfer Learning and Class Activation Maps Supporting Detection and Localization of Femoral Fractures on Anteroposterior Radiographs. In Proceedings of the 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), Iowa City, IA, USA, 3–7 April 2020; pp. 1526–1529.
42. Zhang, M.; Zhou, Y.; Zhao, J.; Man, Y.; Liu, B.; Yao, R. A survey of semi- and weakly supervised semantic segmentation of images. Artif. Intell. Rev. 2020, 53, 4259–4288.
43. Lin, M.; Chen, Q.; Yan, S. Network in network. arXiv 2013, arXiv:1312.4400.
44. Qiu, S. Global Weighted Average Pooling Bridges Pixel-level Localization and Image-level Classification. arXiv 2018, arXiv:1809.08264.
45. Lanzi, F.; Lanzi, G. Saints and Their Symbols: Recognizing Saints in Art and in Popular Images; Liturgical Press: Collegeville, MN, USA, 2004; pp. 327–342.
46. Wikipedia: Saint Symbolism. Available online: https://en.wikipedia.org/wiki/Saint_symbolism (accessed on 24 April 2021).
47. Couprie, L.D. Iconclass: An iconographic classification system. Art Libr. J. 1983, 8, 32–49.
48. Zhang, D.; Han, J.; Cheng, G.; Yang, M.H. Weakly Supervised Object Localization and Detection: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021.
49. Singh, K.K.; Lee, Y.J. Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization. arXiv 2017, arXiv:1704.04232.
50. Choe, J.; Shim, H. Attention-Based Dropout Layer for Weakly Supervised Object Localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), Long Beach, CA, USA, 16–20 June 2019; pp. 2219–2228.
51. Bae, W.; Noh, J.; Kim, G. Rethinking Class Activation Mapping for Weakly Supervised Object Localization. In Proceedings of the Computer Vision - ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020, Part XV; Vedaldi, A., Bischof, H., Brox, T., Frahm, J., Eds.; Lecture Notes in Computer Science; Springer: New York, NY, USA, 2020; Volume 12360, pp. 618–634.
52. Gonthier, N.; Ladjal, S.; Gousseau, Y. Multiple instance learning on deep features for weakly supervised object detection with extreme domain shifts. arXiv 2020, arXiv:2008.01178.
53. Wang, H.; Du, M.; Yang, F.; Zhang, Z. Score-CAM: Improved visual explanations via score-weighted class activation mapping. arXiv 2019, arXiv:1910.01279.
54. Zhao, G.; Zhou, B.; Wang, K.; Jiang, R.; Xu, M. Respond-CAM: Analyzing deep models for 3D imaging data by visualizations. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: New York, NY, USA, 2018; pp. 485–492.
Iconclass Category | Symbols |
---|---|
Anthony of Padua | Baby Jesus, bread, book, lily, face, cloth |
Dominic | Rosary, star, dog with a torch, face, cloth |
Francis of Assisi | Franciscan cloth, wolf, birds, fish, skull, stigmata, face, cloth |
Jerome | Hermitage, lion, cardinal’s galero, cardinal vest, cross, skull, book, writing material, stone in hand, face, cloth |
John the Baptist | Lamb, head on platter, animal skin, pointing at Christ, pointing at lamb, cross, face, cloth |
Mary Magdalene | Ointment jar, long hair, washing Christ’s feet, skull, crucifix, red egg, face, cloth |
Paul | Sword, book, scroll, horse, beard, balding head, face, cloth |
Peter | Keys, boat, fish, rooster, pallium, papal vest, inverted cross, book, scroll, bushy beard, bushy hair, face, cloth |
Sebastian | Arrows, crown, face, cloth |
Virgin Mary | Baby Jesus, rose, lily, heart, seven swords, crown of stars, serpent, rosary, blue robe, sun and moon, face, cloth, crown |
Iconclass Category | Symbol Classes | Symbol Bounding Boxes |
---|---|---|
Anthony of Padua | 6 | 83 |
Dominic | 4 | 59 |
Francis of Assisi | 5 | 295 |
Jerome | 11 | 434 |
John the Baptist | 5 | 231 |
Mary Magdalene | 5 | 283 |
Paul | 6 | 132 |
Peter | 9 | 408 |
Sebastian | 3 | 267 |
Virgin Mary | 7 | 695 |
Method | Average IoU | GT-Known Loc (%) | mAP (at IoU ) |
---|---|---|---|
CAM | 0.489 | 49.70 | 0.206 |
GradCAM | 0.551 | 61.20 | 0.316 |
GradCAM++ | 0.529 | 59.88 | 0.292 |
Smooth-GradCAM++ | 0.544 | 61.18 | 0.307 |
Anthony | John | Paul | Francis | Magdalene | Jerome | Dominic | Virgin | Peter | Sebastian |
---|---|---|---|---|---|---|---|---|---|
0.076 | 0.289 | 0.173 | 0.33 | 0.616 | 0.228 | 0.142 | 0.442 | 0.399 | 0.468 |