DOI: 10.1145/3615522.3615547
Research article
Open access

Visual Analysis of Scene-Graph-Based Visual Question Answering

Published: 20 October 2023

Abstract

Scene-graph-based Visual Question Answering (VQA) has emerged as a burgeoning field in Deep Learning research, with a growing demand for robust and interpretable VQA systems. In this paper, we present a novel visual analysis approach that addresses two critical objectives in VQA: identifying and correcting prediction issues, and providing insight into model decision-making by visualizing internal information. Our approach builds on the GraphVQA framework, which uses graph neural networks to process scene graphs representing images and which was trained on the widely used GQA dataset. Our analysis tool is aimed at users familiar with the basics of graph-based VQA. By leveraging query-based scene analysis and visualization of crucial internal states, we can detect and pinpoint the reasons for inaccurate predictions, facilitating model refinement and dataset curation. Identifying expressive internal states is itself a challenge. Through rigorous computer-based evaluations and the presentation of a use case, we demonstrate the effectiveness of our analysis tool and model state visualization.
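
To make the pipeline described above concrete for readers new to graph-based VQA, here is a minimal, illustrative sketch. It is not the paper's code: all names, the toy numbers, and the scalar "states" are invented for exposition (real models use learned vector states and nonlinear updates). It shows a GQA-style scene graph (objects with attributes, connected by directed labeled relations) and one toy language-conditioned message-passing step of the kind GraphVQA-style graph neural networks perform over such graphs.

```python
# Illustrative sketch only (not the paper's code): a GQA-style scene graph
# and one toy, question-conditioned message-passing step. All names and
# numbers are made up for exposition.
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    name: str                                             # e.g., "dog"
    attributes: list[str] = field(default_factory=list)   # e.g., ["brown"]
    state: float = 0.0   # toy 1-D hidden state; real models use vectors


@dataclass
class SceneGraph:
    objects: dict[int, SceneObject]
    # directed, labeled edges: (subject_id, relation, object_id)
    relations: list[tuple[int, str, int]]


def message_passing_step(graph: SceneGraph,
                         relation_weights: dict[str, float]) -> None:
    """One toy round of message passing: each object's state is updated
    from its neighbors, weighted by how relevant each relation is to the
    question (in GraphVQA, this conditioning comes from the encoded
    question; here the weights are given by hand)."""
    messages = {i: 0.0 for i in graph.objects}
    for src, rel, dst in graph.relations:
        messages[dst] += relation_weights.get(rel, 0.0) * graph.objects[src].state
    for i, obj in graph.objects.items():
        obj.state += messages[i]  # real models apply a learned, nonlinear update


# A two-object scene: "a brown dog to the left of a red ball"
g = SceneGraph(
    objects={0: SceneObject("dog", ["brown"], state=1.0),
             1: SceneObject("ball", ["red"], state=0.5)},
    relations=[(0, "left of", 1), (1, "right of", 0)],
)
# Suppose the question makes "left of" highly relevant:
message_passing_step(g, {"left of": 0.9, "right of": 0.1})
print({i: round(o.state, 2) for i, o in g.objects.items()})  # {0: 1.05, 1: 1.4}
```

The per-step states and relation weights in this sketch are exactly the kind of internal information the paper's analysis tool visualizes to explain a prediction.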

Supplementary Material

MP4 File (VQA_Paper_Vinci.mp4)
Presentation video

References

[1] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. 2015. VQA: Visual question answering. In Proceedings of the IEEE International Conference on Computer Vision. 2425–2433.
[2] X. Chang, P. Ren, P. Xu, Z. Li, X. Chen, and A. Hauptmann. 2023. A comprehensive survey of scene graphs: Generation and application. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 1 (2023), 1–26.
[3] J. Choo and S. Liu. 2018. Visual analytics for explainable deep learning. IEEE Computer Graphics and Applications 38, 4 (2018), 84–92.
[4] K. A. Cook and J. J. Thomas. 2005. Illuminating the path: The research and development agenda for visual analytics. Technical Report. Pacific Northwest National Laboratory.
[5] V. Damodaran, S. Chakravarthy, A. Kumar, A. Umapathy, T. Mitamura, Y. Nakashima, N. Garcia, and C. Chu. 2021. Understanding the role of scene graphs in visual question answering. arXiv:2101.05479 (2021).
[6] M. Danilevsky, K. Qian, R. Aharonov, Y. Katsis, B. Kawas, and P. Sen. 2020. A survey of the state of explainable AI for natural language processing. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing. Association for Computational Linguistics, 447–459.
[7] F. K. Došilović, M. Brčić, and N. Hlupić. 2018. Explainable artificial intelligence: A survey. In 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics. IEEE, 0210–0215.
[8] R. Garcia, T. Munz, and D. Weiskopf. 2021. Visual analytics tool for the interpretation of hidden states in recurrent neural networks. Visual Computing for Industry, Biomedicine, and Art 4, 1 (2021), 24.
[9] R. Garcia, A. C. Telea, B. C. da Silva, J. Tørresen, and J. L. D. Comba. 2018. A task-and-technique centered survey on visual analytics for deep learning model engineering. Computers & Graphics 77 (2018), 30–49.
[10] M. Gervautz and W. Purgathofer. 1988. A simple method for color quantization: Octree quantization. In New Trends in Computer Graphics: Proceedings of CG International ’88. Springer, 219–231.
[11] S. Ghosh, G. Burachas, A. Ray, and A. Ziskind. 2019. Generating natural language explanations for visual question answering using scene graphs and visual attention. arXiv:1902.05715 (2019).
[12] Y. Goyal, A. Mohapatra, D. Parikh, and D. Batra. 2016. Towards transparent AI systems: Interpreting visual question answering models. arXiv:1608.08974 (2016).
[13] F. Hohman, M. Kahng, R. Pienta, and D. H. Chau. 2018. Visual analytics in deep learning: An interrogative survey for the next frontiers. IEEE Transactions on Visualization and Computer Graphics 25, 8 (2018), 2674–2693.
[14] Z. Huang, Z. Zeng, B. Liu, D. Fu, and J. Fu. 2020. Pixel-BERT: Aligning image pixels with text by deep multi-modal transformers. arXiv:2004.00849 (2020).
[15] D. A. Hudson and C. D. Manning. 2019. GQA: A new dataset for real-world visual reasoning and compositional question answering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 6693–6702.
[16] J. Johnson, R. Krishna, M. Stark, L. Li, D. A. Shamma, M. S. Bernstein, and L. Fei-Fei. 2015. Image retrieval using scene graphs. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 3668–3678.
[17] M. Kahng, P. Y. Andrews, A. Kalro, and D. H. Chau. 2018. ActiVis: Visual exploration of industry-scale deep neural network models. IEEE Transactions on Visualization and Computer Graphics 24, 1 (2018), 88–97.
[18] D. Keim, F. Mansmann, J. Schneidewind, J. Thomas, and H. Ziegler. 2008. Visual analytics: Scope and challenges. In Visual Data Mining: Theory, Techniques and Tools for Visual Analytics, S. J. Simoff, M. H. Böhlen, and A. Mazeika (Eds.). Springer, Berlin, Heidelberg, 76–90.
[19] Q. Li, X. Tang, and Y. Jian. 2021. Adversarial learning with bidirectional attention for visual question answering. Sensors 21, 21 (2021), 7164.
[20] W. Liang, Y. Jiang, and Z. Liu. 2021. GraphVQA: Language-guided graph neural networks for graph-based visual question answering. In Proceedings of the Third Workshop on Multimodal Artificial Intelligence. Association for Computational Linguistics, 79–86.
[21] S. Liu, X. Wang, M. Liu, and J. Zhu. 2017. Towards better analysis of machine learning models: A visual analytics perspective. Visual Informatics 1, 1 (2017), 48–56.
[22] Y. Ming, S. Cao, R. Zhang, Z. Li, Y. Chen, Y. Song, and H. Qu. 2017. Understanding hidden memories of recurrent neural networks. In Proceedings of the 2017 IEEE Conference on Visual Analytics Science and Technology. 13–24.
[23] T. Munz, D. Väth, P. Kuznecov, N. T. Vu, and D. Weiskopf. 2022. Visualization-based improvement of neural machine translation. Computers & Graphics 103 (2022), 45–60.
[24] W. Norcliffe-Brown, S. Vafeias, and S. Parisot. 2018. Learning conditioned graph structures for interpretable visual question answering. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31. Curran Associates, Inc.
[25] J. Pennington, R. Socher, and C. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1532–1543.
[26] N. F. Rajani and R. J. Mooney. 2017. Ensembling visual explanations for VQA. In Proceedings of the NIPS 2017 Workshop on Visually-Grounded Interaction and Language.
[27] A. Ray, M. Cogswell, X. Lin, K. Alipour, A. Divakaran, Y. Yao, and G. Burachas. 2021. Generating and evaluating explanations of attended and error-inducing input regions for VQA models. Applied AI Letters 2, 4 (2021), e51.
[28] N. Schäfer, P. Tilli, T. Munz-Körner, S. Künzel, S. Vidyapu, N. T. Vu, and D. Weiskopf. 2023. Visual analysis system for scene-graph-based visual question answering. https://doi.org/10.18419/darus-3589
[29] H. Strobelt, S. Gehrmann, H. Pfister, and A. M. Rush. 2018. LSTMVis: A tool for visual analysis of hidden state dynamics in recurrent neural networks. IEEE Transactions on Visualization and Computer Graphics 24, 1 (2018), 667–676.
[30] M. H. Vu, T. Löfstedt, T. Nyholm, and R. Sznitman. 2020. A question-centric model for visual question answering in medical imaging. IEEE Transactions on Medical Imaging 39, 9 (2020), 2856–2868.
[31] F. Xu, H. Uszkoreit, Y. Du, W. Fan, D. Zhao, and J. Zhu. 2019. Explainable AI: A brief survey on history, research areas, approaches and challenges. In Proceedings of Natural Language Processing and Chinese Computing: 8th CCF International Conference, NLPCC 2019. Springer, 563–574.
[32] J. Yuan, C. Chen, W. Yang, M. Liu, J. Xia, and S. Liu. 2020. A survey of visual analytics techniques for machine learning. Computational Visual Media 7 (2020), 3–36.
[33] R. Yusuf, J. Owusu, H. Wang, K. Qin, Z. Lawal, and Y. Dong. 2022. VQA and visual reasoning: An overview of recent datasets, methods and challenges. arXiv:2212.13296 (2022).

Published In

VINCI '23: Proceedings of the 16th International Symposium on Visual Information Communication and Interaction
September 2023, 308 pages
ISBN: 9798400707513
DOI: 10.1145/3615522
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. VQA
  2. explainable AI
  3. scene graphs
  4. visual analytics

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)

Conference

VINCI 2023

Acceptance Rates

Overall Acceptance Rate 71 of 193 submissions, 37%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 333
  • Downloads (last 12 months): 333
  • Downloads (last 6 weeks): 24

Reflects downloads up to 12 Sep 2024
