DOI: 10.1145/3677182.3677287
Reducing Hallucinations: Enhancing VQA for Flood Disaster Damage Assessment with Visual Contexts

Published: 03 August 2024

Abstract

The zero-shot performance of visual question answering (VQA) models depends heavily on prompt design. For example, a zero-shot VQA model for disaster scenarios can leverage well-designed Chain-of-Thought (CoT) prompts to unlock its latent reasoning ability. However, CoT prompting has drawbacks: hallucinations in the generated thought process can lead to an incorrect final answer. In this paper, we propose a zero-shot VQA method named Flood Disaster VQA with Two-Stage Prompt (VQA-TSP). The model generates a thought process in the first stage and then uses that thought process to produce the final answer in the second stage. In particular, visual context is added in the second stage to mitigate the hallucinations that arise in the thought process. Experimental results show that our method exceeds the overall performance of state-of-the-art zero-shot VQA models for flood disaster scenarios. Our study provides a basis for further improving CoT-based zero-shot VQA.
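The two-stage pipeline described above can be sketched as plain prompt composition. Everything below is an illustrative assumption, not the authors' implementation: the prompt wording, the function names, and the `query_model` backend (stubbed here) are all hypothetical.

```python
# Hypothetical sketch of two-stage prompting with visual context (VQA-TSP-style).
# `query_model` stands in for any VQA/LLM backend; a stub is used below.

def build_stage1_prompt(question: str) -> str:
    """Stage 1: elicit a Chain-of-Thought rationale for the question."""
    return (
        f"Question: {question}\n"
        "Let's think step by step about the flood scene."
    )

def build_stage2_prompt(question: str, rationale: str, visual_context: str) -> str:
    """Stage 2: re-ask the question, grounding the stage-1 rationale in
    visual context (e.g. a caption or detected objects) to curb hallucinations."""
    return (
        f"Visual context: {visual_context}\n"
        f"Reasoning: {rationale}\n"
        f"Question: {question}\n"
        "Answer with a single word or phrase."
    )

def answer(question: str, visual_context: str, query_model) -> str:
    """Run both stages: generate a rationale, then a grounded final answer."""
    rationale = query_model(build_stage1_prompt(question))
    return query_model(build_stage2_prompt(question, rationale, visual_context))

# Stub backend for illustration only.
def stub_model(prompt: str) -> str:
    return "yes" if "Answer" in prompt else "The road is covered by water."

print(answer("Is the road flooded?",
             "Aerial image: submerged road and houses.",
             stub_model))  # → prints "yes"
```

The key design point is that the final answer is conditioned on both the rationale and an independent visual description, so an ungrounded claim in the rationale can be overridden at the second stage.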


      Published In

      ASENS '24: Proceedings of the International Conference on Algorithms, Software Engineering, and Network Security
      April 2024
      759 pages
      ISBN:9798400709784
      DOI:10.1145/3677182

Publisher

Association for Computing Machinery, New York, NY, United States


      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ASENS 2024

