ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering

Chen, Kan; Wang, Jiang; Chen, Liang-Chieh; Gao, Haoyuan; Xu, Wei; Nevatia, Ram

arXiv:1511.05960v2 (cs)

[Submitted on 18 Nov 2015 (v1), last revised 3 Apr 2016 (this version, v2)]

Abstract: We propose a novel attention-based deep learning architecture for the visual question answering (VQA) task. Given an image and an image-related natural language question, VQA generates a natural language answer to the question. Generating the correct answer requires the model's attention to focus on the regions corresponding to the question, because different questions inquire about the attributes of different image regions. We introduce an attention-based configurable convolutional neural network (ABC-CNN) to learn such question-guided attention. ABC-CNN determines an attention map for an image-question pair by convolving the image feature map with configurable convolutional kernels derived from the question's semantics. We evaluate the ABC-CNN architecture on three benchmark VQA datasets: Toronto COCO-QA, DAQUAR, and the VQA dataset. The ABC-CNN model achieves significant improvements over state-of-the-art methods on these datasets. The question-guided attention generated by ABC-CNN is also shown to reflect the regions that are highly relevant to the questions.
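
The attention step described in the abstract (convolving the image feature map with a kernel derived from the question) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `question_guided_attention`, the linear projection `proj`, the sigmoid on the kernel, the 3x3 kernel size, the zero padding, and the spatial softmax are all assumptions made for illustration, and the inputs are random placeholders rather than real image or question features.

```python
import numpy as np

def question_guided_attention(feature_map, question_vec, proj, ksize=3):
    """Sketch of a question-configured convolution producing an attention map.

    feature_map:  (C, H, W) image features (assumed layout)
    question_vec: (D,) question embedding (assumed to come from some encoder)
    proj:         (C * ksize * ksize, D) hypothetical projection matrix
    ksize:        odd kernel size (assumption)
    """
    C, H, W = feature_map.shape
    # Derive a question-specific kernel: linear projection followed by a sigmoid.
    kernel = 1.0 / (1.0 + np.exp(-(proj @ question_vec)))
    kernel = kernel.reshape(C, ksize, ksize)
    # Zero-pad so the attention map has the same spatial size as the features.
    pad = ksize // 2
    padded = np.pad(feature_map, ((0, 0), (pad, pad), (pad, pad)))
    # Slide the kernel over every spatial position (cross-correlation).
    scores = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            scores[i, j] = np.sum(padded[:, i:i + ksize, j:j + ksize] * kernel)
    # Spatial softmax turns the scores into weights that sum to 1.
    scores -= scores.max()
    weights = np.exp(scores)
    return weights / weights.sum()

# Random placeholder inputs, for shape checking only.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 5, 5))
q = rng.standard_normal(16)
proj = rng.standard_normal((8 * 3 * 3, 16)) * 0.1

attn = question_guided_attention(fmap, q, proj)
attended = fmap * attn  # broadcast the (H, W) map across all channels
```

Here `attn` is a single-channel map over spatial positions, and `attended` reweights every channel of the feature map by it. How the attended features are combined with the question to predict an answer is not described in the abstract and is left out of this sketch.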

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1511.05960 [cs.CV]
	(or arXiv:1511.05960v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1511.05960

Submission history

From: Kan Chen
[v1] Wed, 18 Nov 2015 20:59:50 UTC (2,863 KB)
[v2] Sun, 3 Apr 2016 22:47:38 UTC (4,291 KB)
