DOI: 10.1145/3677182.3677287
Reducing Hallucinations: Enhancing VQA for Flood Disaster Damage Assessment with Visual Contexts

Published: 03 August 2024

Abstract

The zero-shot performance of visual question answering (VQA) models depends heavily on prompt design. For example, a zero-shot VQA model for disaster scenarios can leverage well-designed Chain-of-Thought (CoT) prompts to unlock its latent reasoning ability. However, CoT prompting has drawbacks: hallucinations in the generated thought process can lead to an incorrect final answer. In this paper, we propose a zero-shot VQA method named Flood Disaster VQA with Two-Stage Prompt (VQA-TSP). The model generates a thought process in the first stage and then uses that thought process to produce the final answer in the second stage. In particular, visual context is added in the second stage to mitigate the hallucinations that arise in the thought process. Experimental results show that our method exceeds the overall performance of state-of-the-art zero-shot VQA models for flood disaster scenarios. Our study provides a basis for further improving CoT-based zero-shot VQA.
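The two-stage pipeline described above can be sketched as plain prompt composition. Everything below is an illustrative assumption, not the authors' implementation: the prompt wording, the function names, and the `query_model` backend (stubbed here) are all hypothetical.

```python
# Hypothetical sketch of two-stage prompting with visual context (VQA-TSP-style).
# `query_model` stands in for any VQA/LLM backend; a stub is used below.

def build_stage1_prompt(question: str) -> str:
    """Stage 1: elicit a Chain-of-Thought rationale for the question."""
    return (
        f"Question: {question}\n"
        "Let's think step by step about the flood scene."
    )

def build_stage2_prompt(question: str, rationale: str, visual_context: str) -> str:
    """Stage 2: re-ask the question, grounding the stage-1 rationale in
    visual context (e.g. a caption or detected objects) to curb hallucinations."""
    return (
        f"Visual context: {visual_context}\n"
        f"Reasoning: {rationale}\n"
        f"Question: {question}\n"
        "Answer with a single word or phrase."
    )

def answer(question: str, visual_context: str, query_model) -> str:
    """Run both stages: generate a rationale, then a grounded final answer."""
    rationale = query_model(build_stage1_prompt(question))
    return query_model(build_stage2_prompt(question, rationale, visual_context))

# Stub backend for illustration only.
def stub_model(prompt: str) -> str:
    return "yes" if "Answer" in prompt else "The road is covered by water."

print(answer("Is the road flooded?",
             "Aerial image: submerged road and houses.",
             stub_model))  # → prints "yes"
```

The key design point is that the final answer is conditioned on both the rationale and an independent visual description, so an ungrounded claim in the rationale can be overridden at the second stage.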


      Published In

      ASENS '24: Proceedings of the International Conference on Algorithms, Software Engineering, and Network Security
      April 2024
      759 pages
      ISBN:9798400709784
      DOI:10.1145/3677182

Publisher

Association for Computing Machinery, New York, NY, United States


      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ASENS 2024

