Revealing Fine-Grained Values and Opinions in Large Language Models

Wright, Dustin; Arora, Arnav; Borenstein, Nadav; Yadav, Srishti; Belongie, Serge; Augenstein, Isabelle

arXiv:2406.19238 (cs)

[Submitted on 27 Jun 2024 (v1), last revised 31 Oct 2024 (this version, v2)]

Abstract: Uncovering latent values and opinions embedded in large language models (LLMs) can help identify biases and mitigate potential harm. Recently, this has been approached by prompting LLMs with survey questions and quantifying the stances in the outputs towards morally and politically charged statements. However, the stances generated by LLMs can vary greatly depending on how they are prompted, and there are many ways to argue for or against a given position. In this work, we propose to address this by analysing a large and robust dataset of 156k LLM responses to the 62 propositions of the Political Compass Test (PCT) generated by 6 LLMs using 420 prompt variations. We perform coarse-grained analysis of their generated stances and fine-grained analysis of the plain text justifications for those stances. For fine-grained analysis, we propose to identify tropes in the responses: semantically similar phrases that are recurrent and consistent across different prompts, revealing natural patterns in the text that a given LLM is prone to produce. We find that demographic features added to prompts significantly affect outcomes on the PCT, reflecting bias, as well as disparities between the results of tests when eliciting closed-form vs. open-domain responses. Additionally, patterns in the plain text rationales via tropes show that similar justifications are repeatedly generated across models and prompts even with disparate stances.
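
The dataset size quoted in the abstract can be sanity-checked with a short calculation. The sketch below assumes that every proposition was paired with every model and every prompt variation (a full cross product); the abstract does not state this explicitly, and the function and constant names are illustrative, not taken from the authors' code.

```python
from itertools import product

# Figures quoted in the abstract.
N_PROPOSITIONS = 62        # Political Compass Test propositions
N_MODELS = 6               # LLMs queried
N_PROMPT_VARIATIONS = 420  # prompt variations per proposition


def grid_size(n_propositions: int, n_models: int, n_prompt_variations: int) -> int:
    """Number of responses under the ASSUMPTION of a full cross product."""
    return n_propositions * n_models * n_prompt_variations


def enumerate_grid(n_propositions: int, n_models: int, n_prompt_variations: int):
    """Yield (proposition, model, prompt) index triples for the assumed grid."""
    return product(
        range(n_propositions), range(n_models), range(n_prompt_variations)
    )


if __name__ == "__main__":
    total = grid_size(N_PROPOSITIONS, N_MODELS, N_PROMPT_VARIATIONS)
    print(total)  # 156240, consistent with the "156k" in the abstract
```

Under that assumption the product is 156,240 responses, which rounds to the 156k figure reported.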

Comments:	Findings of EMNLP 2024; 28 pages, 20 figures, 7 tables
Subjects:	Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)
Cite as:	arXiv:2406.19238 [cs.CL]
	(or arXiv:2406.19238v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.19238

Submission history

From: Dustin Wright
[v1] Thu, 27 Jun 2024 15:01:53 UTC (23,403 KB)
[v2] Thu, 31 Oct 2024 16:06:22 UTC (21,899 KB)
