DOI: 10.1145/3632410.3633297
Tutorial · Open access

Towards understanding and mitigating the hallucinations in NLP and Speech

Published: 04 January 2024

Abstract

With recent advances in natural language processing, driven by deep learning architectures such as the transformer, performance on many challenging NLP tasks, such as question answering, machine translation, and abstractive summarization, has improved dramatically. However, it is observed that even state-of-the-art models, while generating natural and fluent-looking text, are often unfaithful: their output may contain facts or information that is irrelevant or not supported by the input. This phenomenon is referred to in the literature as hallucination. A similar phenomenon is observed in end-to-end speech recognition systems, where portions of the output text do not match the acoustics of the corresponding speech signal.
In this tutorial, we introduce the problem of hallucination in various Speech and NLP tasks such as machine translation, summarization, and speech recognition. We categorize the hallucinations observed in these models and describe techniques to quantify them. Next, we describe recent techniques to mitigate hallucinations for many of these tasks. We draw the attention of the AI community to the potential problems of hallucination in NLP and speech.
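As a minimal illustration of what "quantifying" unsupported content can look like (a generic sketch, not a method presented in this tutorial), one crude proxy measures the fraction of n-grams in the generated text that never appear in the input; a high novel-n-gram rate flags output that the source cannot support:

```python
def novel_ngram_rate(source: str, generated: str, n: int = 2) -> float:
    """Fraction of n-grams in `generated` that do not occur in `source`.

    A simple lexical proxy for unsupported (potentially hallucinated)
    content; it cannot detect paraphrased or abstractive hallucinations.
    """
    def ngrams(text: str, n: int) -> set:
        toks = text.lower().split()
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

    src, gen = ngrams(source, n), ngrams(generated, n)
    if not gen:  # generated text shorter than n tokens
        return 0.0
    return len(gen - src) / len(gen)


# One bigram of five ("the rug") is absent from the source:
print(novel_ngram_rate("the cat sat on the mat",
                       "the cat sat on the rug"))  # → 0.2
```

Surface-overlap metrics like this are only a first screen; entailment- or attribution-based faithfulness scoring is needed to catch hallucinations that reword the source.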


Cited By

  • (2024) "MAI – A Proactive Speech Agent for Metacognitive Mediation in Collaborative Learning." Proceedings of the 6th ACM Conference on Conversational User Interfaces, 1–5. https://doi.org/10.1145/3640794.3665585
  • (2024) "Estimating the Completeness of Discrete Speech Units." 2024 IEEE Spoken Language Technology Workshop (SLT), 415–422. https://doi.org/10.1109/SLT61566.2024.10832198
  • (2024) "The AI Act in a law enforcement context: The case of automatic speech recognition for transcribing investigative interviews." Forensic Science International: Synergy, 9, 100563. https://doi.org/10.1016/j.fsisyn.2024.100563
  • (2024) "Can Large Language Models (LLMs) Compete with Human Requirements Reviewers? – Replication of an Inspection Experiment on Requirements Documents." Product-Focused Software Process Improvement, 27–42. https://doi.org/10.1007/978-3-031-78386-9_3
  • (2024) "Assessing Generative Language Models in Classification Tasks: Performance and Self-evaluation Capabilities in the Environmental and Climate Change Domain." Natural Language Processing and Information Systems, 302–313. https://doi.org/10.1007/978-3-031-70242-6_29


Published In

CODS-COMAD '24: Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD), January 2024, 627 pages.
Publisher: Association for Computing Machinery, New York, NY, United States.
This work is licensed under a Creative Commons Attribution 4.0 International License.

      Qualifiers

      • Tutorial
      • Research
      • Refereed limited

      Conference

      CODS-COMAD 2024
