Computer Science > Computer Vision and Pattern Recognition
[Submitted on 7 Oct 2016 (v1), revised 30 Dec 2016 (this version, v2), latest version 3 Dec 2019 (v4)]
Title: Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization
Abstract: We propose a technique for making CNN-based models more transparent by visualizing the input image regions that are important for these models' predictions, thereby producing visual explanations. Our approach, called Gradient-weighted Class Activation Mapping (Grad-CAM), uses the class-specific gradient information flowing into the final convolutional layer of a CNN to produce a coarse localization map of the image regions important for each class. Grad-CAM is a strict generalization of Class Activation Mapping (CAM). Unlike CAM, Grad-CAM is broadly applicable to any CNN-based architecture and needs no re-training. We show how Grad-CAM may be combined with pixel-space visualizations (such as Guided Backprop) to create a high-resolution, class-discriminative visualization (Guided Grad-CAM). We generate Grad-CAM and Guided Grad-CAM visualizations to better understand off-the-shelf image classification, image captioning, and visual question answering (VQA) models, including ResNet-based architectures. In the context of image classification models, our visualizations (a) lend insight into the models' failure modes, and (b) outperform pixel-space gradient visualizations on the ILSVRC-15 weakly-supervised localization task. For image captioning and VQA, our visualizations expose the somewhat surprising insight that common CNN+LSTM models are good at localizing discriminative input image regions despite not being trained on grounded image-text pairs. Finally, through human studies we show that our explanations help users establish trust in the predictions made by deep networks. Interestingly, we find that Guided Grad-CAM helps untrained users successfully discern a stronger deep network from a weaker one even when both make identical decisions. Our code is available at this http URL and a demo is available at this http URL. Video of the demo can be found at this http URL.
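The core computation described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' released code. It assumes the final-layer feature maps and the gradients of the class score with respect to them are already available as arrays of shape (K, H, W). The spatial averaging of gradients and the final ReLU follow the paper's formulation rather than anything stated in the abstract. The toy model (global average pooling followed by a linear head named `head`) and every variable name are hypothetical, chosen because the gradient can be written by hand and the result compared with a CAM-style map.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Coarse class-localization map from final-conv-layer tensors.

    activations: feature maps A_k, shape (K, H, W)
    gradients:   d(score_c) / d(A_k), shape (K, H, W)
    """
    # One importance weight per feature map: the spatial mean of its gradients.
    weights = gradients.mean(axis=(1, 2))             # shape (K,)
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)  # shape (H, W)
    # ReLU keeps only locations with positive influence on the class score.
    return np.maximum(cam, 0.0)

# Toy setup (an assumption for illustration): the class score is a linear
# head applied to globally average-pooled feature maps, as in CAM.
rng = np.random.default_rng(0)
K, H, W, num_classes = 4, 3, 3, 2
A = rng.random((K, H, W))
head = rng.standard_normal((num_classes, K))
c = 1  # class of interest

# score_c = sum_k head[c, k] * mean(A[k]), so the gradient with respect to
# every spatial location of A[k] is the constant head[c, k] / (H * W).
grads = np.broadcast_to(head[c][:, None, None] / (H * W), (K, H, W))

heatmap = grad_cam(A, grads)

# For this architecture the map equals a rectified CAM-style weighted sum,
# scaled by 1 / (H * W).
cam_reference = np.maximum(np.tensordot(head[c], A, axes=1), 0.0) / (H * W)
```

In a real network the gradients would come from automatic differentiation, and the coarse map would typically be upsampled to the input resolution before being overlaid on the image or multiplied elementwise with a Guided Backprop visualization.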
Submission history
From: Ramprasaath Ramasamy Selvaraju
[v1] Fri, 7 Oct 2016 19:54:24 UTC (8,245 KB)
[v2] Fri, 30 Dec 2016 07:19:35 UTC (8,596 KB)
[v3] Tue, 21 Mar 2017 23:48:00 UTC (9,133 KB)
[v4] Tue, 3 Dec 2019 02:13:03 UTC (7,321 KB)