Interpretable Visual Understanding with Cognitive Attention Network

Tang, Xuejiao; Zhang, Wenbin; Yu, Yi; Turner, Kea; Derr, Tyler; Wang, Mengyu; Ntoutsi, Eirini

arXiv:2108.02924 (cs)

[Submitted on 6 Aug 2021 (v1), last revised 7 Dec 2023 (this version, v3)]

Abstract: While recognition-level image understanding has achieved remarkable advances, reliable visual scene understanding requires comprehensive image understanding at not only the recognition level but also the cognition level, which calls for exploiting multi-source information as well as learning different levels of understanding and extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text collectively. Second, a novel inference module is designed to encode commonsense among image, query, and response. Extensive experiments on the large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach. The implementation is publicly available at this https URL

Comments:	ICANN21
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2108.02924 [cs.CV]
	(or arXiv:2108.02924v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2108.02924

Submission history

From: Xuejiao Tang
[v1] Fri, 6 Aug 2021 02:57:43 UTC (20,003 KB)
[v2] Sat, 14 Aug 2021 17:23:36 UTC (20,087 KB)
[v3] Thu, 7 Dec 2023 23:09:57 UTC (8,776 KB)
