Separating form and meaning: Using self-consistency to quantify task understanding across multiple senses

Ohmer, Xenia; Bruni, Elia; Hupkes, Dieuwke


arXiv:2305.11662 (cs)

[Submitted on 19 May 2023 (v1), last revised 20 Dec 2023 (this version, v3)]

Abstract: At the staggering pace with which the capabilities of large language models (LLMs) are increasing, creating future-proof evaluation sets to assess their understanding becomes more and more challenging. In this paper, we propose a novel paradigm for evaluating LLMs which leverages the idea that correct world understanding should be consistent across different (Fregean) senses of the same meaning. Accordingly, we measure understanding not in terms of correctness but by evaluating consistency across multiple senses that are generated by the model itself. We showcase our approach by instantiating a test where the different senses are different languages, hence using multilingual self-consistency as a litmus test for the model's understanding and simultaneously addressing the important topic of multilinguality. Taking one of the latest versions of ChatGPT as our object of study, we evaluate multilingual consistency for two different tasks across three different languages. We show that its multilingual consistency is still lacking, and that its task and world understanding are thus not language-independent. As our approach does not require any static evaluation corpora in languages other than English, it can easily and cheaply be extended to different languages and tasks and could become an integral part of future benchmarking efforts.
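
To make the idea of measuring consistency rather than correctness concrete, here is a minimal sketch of a pairwise agreement score. It is an illustration only and not the paper's metric or code: the function names, the language codes, and the exact-match comparison are all assumptions, and it presumes that the model's answers in each language have already been mapped to a shared label space so they can be compared directly.

```python
from typing import Dict, Mapping, Sequence, Tuple


def consistency(answers_a: Sequence[str], answers_b: Sequence[str]) -> float:
    """Fraction of items on which two answer lists agree.

    Assumption: answers are already in a shared label space, so a
    whitespace- and case-insensitive exact match is a sensible comparison.
    """
    if len(answers_a) != len(answers_b):
        raise ValueError("answer lists must have the same length")
    if not answers_a:
        raise ValueError("answer lists must be non-empty")
    matches = sum(
        a.strip().lower() == b.strip().lower()
        for a, b in zip(answers_a, answers_b)
    )
    return matches / len(answers_a)


def pairwise_consistency(
    answers_by_language: Mapping[str, Sequence[str]],
) -> Dict[Tuple[str, str], float]:
    """Agreement score for every unordered pair of languages."""
    languages = sorted(answers_by_language)
    return {
        (first, second): consistency(
            answers_by_language[first], answers_by_language[second]
        )
        for i, first in enumerate(languages)
        for second in languages[i + 1:]
    }


if __name__ == "__main__":
    # Hypothetical answers to the same four items, asked in three languages.
    # The language codes are placeholders, not the languages used in the paper.
    answers = {
        "en": ["yes", "no", "yes", "yes"],
        "de": ["yes", "no", "no", "yes"],
        "zh": ["yes", "yes", "no", "yes"],
    }
    for pair, score in pairwise_consistency(answers).items():
        print(pair, score)
```

Note that no gold labels appear anywhere in this sketch: a model can score high on consistency while being wrong in every language, which is why consistency and correctness are separate quantities.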

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2305.11662 [cs.CL]
	(or arXiv:2305.11662v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.11662

Submission history

From: Xenia Ohmer
[v1] Fri, 19 May 2023 13:23:51 UTC (272 KB)
[v2] Tue, 23 May 2023 15:12:45 UTC (274 KB)
[v3] Wed, 20 Dec 2023 12:57:34 UTC (266 KB)
