Towards video text visual question answering: benchmark and baseline
Published In
- Editors: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh
Publisher
Curran Associates Inc.
Red Hook, NY, United States
Qualifiers
- Research-article
- Research
- Refereed limited