Analysing the Residual Stream of Language Models Under Knowledge Conflicts

Zhao, Yu; Du, Xiaotang; Hong, Giwon; Gema, Aryo Pradipta; Devoto, Alessio; Wang, Hongru; He, Xuanli; Wong, Kam-Fai; Minervini, Pasquale

arXiv:2410.16090 (cs)

[Submitted on 21 Oct 2024 (v1), last revised 9 Feb 2025 (this version, v2)]

Abstract: Large language models (LLMs) can store a significant amount of factual knowledge in their parameters. However, their parametric knowledge may conflict with the information provided in the context. Such conflicts can lead to undesirable model behaviour, such as reliance on outdated or incorrect information. In this work, we investigate whether LLMs can identify knowledge conflicts and whether it is possible to know which source of knowledge the model will rely on by analysing the residual stream of the LLM. Through probing tasks, we find that LLMs can internally register the signal of knowledge conflict in the residual stream, which can be accurately detected by probing the intermediate model activations. This allows us to detect conflicts within the residual stream before generating the answers without modifying the input or model parameters. Moreover, we find that the residual stream shows significantly different patterns when the model relies on contextual knowledge versus parametric knowledge to resolve conflicts. This pattern can be employed to estimate the behaviour of LLMs when conflict happens and prevent unexpected answers before producing the answers. Our analysis offers insights into how LLMs internally manage knowledge conflicts and provides a foundation for developing methods to control the knowledge selection processes.
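
The abstract describes detecting knowledge conflicts by probing intermediate activations. As a rough illustration only, the sketch below trains a logistic-regression probe on synthetic vectors that stand in for residual-stream activations at a single layer. It is not the authors' probe or data: the function names, the dimensionality, the class shift, and the training settings are all assumptions made for this example, and real activations would be read from a model's hidden states instead.

```python
import math
import random


def make_synthetic_activations(n_per_class=200, dim=8, shift=4.0, seed=0):
    """Hypothetical stand-in for residual-stream activations at one layer.

    Label 1 ("conflict") vectors are shifted along a fixed unit direction;
    label 0 ("no conflict") vectors are pure Gaussian noise.
    """
    rng = random.Random(seed)
    norm = math.sqrt(dim)
    direction = [(1.0 if i % 2 == 0 else -1.0) / norm for i in range(dim)]
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            x = [rng.gauss(0.0, 1.0) + label * shift * direction[i]
                 for i in range(dim)]
            data.append((x, label))
    rng.shuffle(data)
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, w, b):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_linear_probe(data, epochs=500, lr=0.3):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim = len(data[0][0])
    w, b, n = [0.0] * dim, 0.0, len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = predict_proba(x, w, b) - y  # gradient of the log loss
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(data, w, b):
    correct = sum(int((predict_proba(x, w, b) >= 0.5) == bool(y))
                  for x, y in data)
    return correct / len(data)


data = make_synthetic_activations()
train, test = data[:300], data[300:]   # simple held-out split
w, b = train_linear_probe(train)
test_accuracy = accuracy(test, w, b)
print(f"held-out probe accuracy: {test_accuracy:.2f}")
```

A linear probe is used here because it is the simplest way to test whether a signal is linearly readable from a layer's activations. High held-out accuracy on real activations would suggest the layer encodes the conflict signal, while the synthetic data above only demonstrates the mechanics.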

Comments: Foundation Model Interventions Workshop @ NeurIPS 2024
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2410.16090 [cs.CL] (or arXiv:2410.16090v2 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2410.16090

Submission history

From: Yu Zhao
[v1] Mon, 21 Oct 2024 15:12:51 UTC (10,653 KB)
[v2] Sun, 9 Feb 2025 17:47:52 UTC (10,653 KB)
