This project presents the background of LLM hallucination 👻 as well as mitigation methods centered on uncertainty 🤔 & knowledge 📓. Research works are collected and systematically clustered into directions and methods for reliable AI development. The project also provides a framework for improving LLMs' factuality perception and eliciting factual expressions to address the hallucination issue.
You are welcome to join this project, share valuable papers, and exchange great ideas!
- Reliable LLM: Hallucination & Knowledge & Uncertainty (From Factuality Perception to Expression)
- Related Works of Hallucination & Knowledge & Uncertainty
- 🔭 Future Directions
The definitions of hallucination vary and depend on the specific task. This project focuses on hallucination issues in knowledge-intensive tasks (closed-book QA, dialogue, RAG, commonsense reasoning, translation, etc.), where hallucinations refer to non-factual, incorrect knowledge in generations that is unfaithful to world knowledge.
The causes of hallucinations vary, including unfiltered incorrect statements in pretraining data, the limited input length of the model architecture, the maximum likelihood training objective, and diverse decoding strategies.
The architectures, input lengths, pretraining data, and training strategies of released LLMs are fixed, and tracing incorrect texts in the massive pretraining corpus is challenging. This project therefore mainly focuses on detecting hallucinations by tracing what LLMs learn during pretraining, and on mitigating hallucinations during fine-tuning and decoding.
Compared with open-ended generation tasks, knowledge-intensive tasks have a specific ground-truth reference: world knowledge. Therefore, we can estimate the knowledge boundary map of an LLM to specify what it knows. Ensuring the certainty level, or honesty, of LLMs about a piece of factual knowledge is crucial for hallucination detection (moving from the grey area to the green area).
The diagram above is a rough, simplified representation of the knowledge boundary. In reality, as with humans, much knowledge sits in a state of uncertainty rather than a clear-cut state of knowing or not knowing. Moreover, maximum likelihood prediction during pretraining makes LLMs prone to generating over-confident responses. Even when an LLM knows a fact, making it accurately tell what it knows is equally important.
This adds complexity to determining the knowledge boundary, which leads to two challenging questions:
- How to accurately perceive (Perception) the knowledge boundary?
  (Example: Given a question such as "What is the capital of France?", the model is required to provide its confidence level for this question.)
- How to accurately express (Expression) knowledge whose boundary is somewhat vague? (Previous work U2Align is a method to enhance expressions; current interest in this second "expression" stage also lies in "alignment" methods. See the sketch after this list.)
  (Example: If the confidence level for answering "Paris" to the above question is 40%, should the model refuse to answer or provide a response?)
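To make the two questions concrete, here is a minimal sketch of the perception/expression split using the "capital of France" example above. The `estimate_confidence` stub, the 40% value, and the 0.5 refusal threshold are illustrative assumptions, not a method from a specific paper.

```python
# Minimal sketch of the perception/expression split described above.
# `estimate_confidence` is a hypothetical stand-in for any of the estimation
# methods surveyed later (logits, prompting, sampling, training).

def estimate_confidence(question: str, answer: str) -> float:
    """Perception step: return an estimate of P(answer is correct | question)."""
    return 0.4  # e.g. the model is 40% confident that "Paris" is correct


def express(question: str, answer: str, threshold: float = 0.5) -> str:
    """Expression step: answer only when confidence clears a chosen threshold."""
    confidence = estimate_confidence(question, answer)
    if confidence >= threshold:
        return f"{answer} (confidence: {confidence:.0%})"
    return "I am not sure."  # abstain instead of guessing


print(express("What is the capital of France?", "Paris"))  # -> I am not sure.
```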
- Models are prone to be over-confident in their predictions when trained with maximum likelihood estimation (MLE), so identifying confidence scores or estimating uncertainty is crucial for reliable AI applications.
- A model is considered well-calibrated if the confidence scores of its predictions (softmax probabilities) are well-aligned with the actual probability of the answers being correct.
- Expected Calibration Error (ECE) and the Reliability Diagram are used to measure calibration performance (a minimal ECE computation is sketched below the figure).
Uncalibrated (left), over-confident (mid) and well-calibrated (right) models.
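Below is a minimal sketch of ECE with equal-width confidence bins; the toy `confidences`/`corrects` values are illustrative only.

```python
# ECE = sum_b (|B_b| / N) * |acc(B_b) - conf(B_b)| over confidence bins B_b.
import numpy as np

def expected_calibration_error(confidences, corrects, n_bins: int = 10) -> float:
    """Gap between average confidence and accuracy, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    corrects = np.asarray(corrects, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(corrects[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Toy example: an over-confident model (high confidence, mediocre accuracy).
print(expected_calibration_error([0.9, 0.95, 0.8, 0.85], [1, 0, 1, 0]))  # -> 0.375
```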
- To calibrate generative LLMs, we need to quantify confidence & uncertainty over generated sentences.
- Uncertainty: Categorized into aleatoric (data) and epistemic (model) uncertainty; frequently measured by the entropy of the prediction, which indicates the dispersion of the model's predictions (see the sketch after this list).
- Confidence: Generally associated with both the input and the prediction.
- The terms uncertainty and confidence are often used interchangeably.
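As a minimal sketch (an assumed formulation, not tied to a specific paper), two common quantities for a generated answer are a length-normalized sequence confidence computed from token log-probabilities, and a predictive entropy computed over multiple sampled answers:

```python
import math
from collections import Counter

def sequence_confidence(token_logprobs: list[float]) -> float:
    """Length-normalized probability of the generated sequence."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def predictive_entropy(sampled_answers: list[str]) -> float:
    """Entropy of the empirical answer distribution over multiple samples;
    higher entropy means more dispersed predictions, i.e. higher uncertainty."""
    counts = Counter(sampled_answers)
    total = len(sampled_answers)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


print(sequence_confidence([-0.1, -0.05, -0.2]))        # ~0.89
print(predictive_entropy(["Paris", "Paris", "Lyon"]))  # ~0.64 nats
```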
Although the knowledge boundary is important for knowledge-intensive tasks, previous works do not give it a specific definition or concept. Current methods for estimating knowledge boundaries build on confidence/uncertainty estimation, including ① logit-based methods using token-level probabilities; ② prompt-based methods that make LLMs verbalize their confidence; ③ sampling-based methods that measure consistency across samples; and ④ training-based methods that teach LLMs to express uncertainty.
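Below is a hedged sketch of methods ② and ③ (method ① corresponds to the sequence-probability confidence in the previous sketch, and method ④ is the topic of the next paragraph). It assumes a hypothetical `llm` object with a `generate(prompt) -> (text, token_logprobs)` interface; the prompts and aggregation choices are illustrative.

```python
from collections import Counter

def prompt_based_confidence(llm, question: str, answer: str) -> float:
    """② Ask the model to verbalize its confidence as a number in [0, 1]."""
    text, _ = llm.generate(
        f"Q: {question}\nA: {answer}\n"
        "How confident are you that this answer is correct? "
        "Reply with a single number between 0 and 1."
    )
    return float(text.strip())


def sampling_based_confidence(llm, question: str, n_samples: int = 10) -> float:
    """③ Sample several answers and use agreement on the majority answer."""
    answers = [llm.generate(f"Q: {question}\nA:")[0].strip() for _ in range(n_samples)]
    _, majority_count = Counter(answers).most_common(1)[0]
    return majority_count / n_samples
```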
This line of work focuses on improving the confidence expression of LLMs in a two-stage form: 1) self-prompting the LLM to generate responses to queries and collecting the samples into a dataset with specific features, and 2) fine-tuning the LLM on the collected dataset to strengthen the targeted capability.
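A minimal sketch of this two-stage recipe is given below, assuming a hypothetical `llm.sample(prompt, n)` API and a QA set with reference answers; the agreement threshold and the "I am sure / I am unsure" target format are illustrative choices rather than the setup of any single paper.

```python
def build_expression_dataset(llm, qa_pairs, n_samples: int = 10):
    """Stage 1: self-prompt the LLM, then label each example by how often
    its sampled answers agree with the reference answer."""
    dataset = []
    for question, reference in qa_pairs:
        prompt = f"Q: {question}\nA:"
        samples = [a.strip() for a in llm.sample(prompt, n_samples)]
        agreement = sum(a == reference for a in samples) / n_samples
        tag = "I am sure." if agreement >= 0.8 else "I am unsure."
        dataset.append({"prompt": prompt, "target": f" {reference} {tag}"})
    return dataset

# Stage 2 (not shown): fine-tune the LLM on `dataset` with ordinary supervised
# training so that it learns to attach a calibrated certainty expression.
```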
- More advanced methods to assist LLM hallucination detection and human decision-making. (A new paradigm)
- Confidence estimation for long-form generations such as code, novels, etc. (Benchmark)
- Learning to explain and clarify the model's own confidence estimation and calibration. (Natural language)
- Calibration on human variation (Misalignment between LM measures and human disagreement).
- Confidence estimation and calibration for multi-modal LLMs.