ANALOGICAL -- A Novel Benchmark for Long Text Analogy Evaluation in Large Language Models

Wijesiriwardene, Thilini; Wickramarachchi, Ruwan; Gajera, Bimal G.; Gowaikar, Shreeyash Mukul; Gupta, Chandan; Chadha, Aman; Reganti, Aishwarya Naresh; Sheth, Amit; Das, Amitava

arXiv:2305.05050 (cs)

[Submitted on 8 May 2023 (v1), last revised 25 May 2023 (this version, v3)]

Abstract: Over the past decade, analogies, in the form of word-level analogies, have played a significant role as an intrinsic measure for evaluating the quality of word embedding methods such as word2vec. Modern large language models (LLMs), however, are primarily evaluated on extrinsic measures based on benchmarks such as GLUE and SuperGLUE, and only a few studies have investigated whether LLMs can draw analogies between long texts. In this paper, we present ANALOGICAL, a new benchmark to intrinsically evaluate LLMs across a taxonomy of analogies of long text with six levels of complexity: (i) word, (ii) word vs. sentence, (iii) syntactic, (iv) negation, (v) entailment, and (vi) metaphor. Using thirteen datasets and three different distance measures, we evaluate the abilities of eight LLMs to identify analogical pairs in the semantic vector space. Our evaluation finds that identifying analogies becomes increasingly challenging for LLMs at higher levels of the analogy taxonomy.
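
Illustrative sketch (not from the paper): the abstract describes identifying analogical pairs by measuring distances in a semantic vector space, but does not name the three distance measures or the embedding procedure. The snippet below assumes cosine and Euclidean distance as two plausible measures and uses hand-written toy vectors in place of real LLM embeddings; the function names and the example vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Return the straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical toy embeddings standing in for LLM sentence vectors.
analogous_pair = ([1.0, 0.0, 1.0], [0.9, 0.1, 1.1])
unrelated_pair = ([1.0, 0.0, 1.0], [-1.0, 1.0, 0.0])

for name, (u, v) in [("analogous", analogous_pair), ("unrelated", unrelated_pair)]:
    print(name, cosine_distance(u, v), euclidean_distance(u, v))
```

Under this reading, a model handles an analogy well when the two texts of an analogical pair lie closer together than the texts of a non-analogical pair, under whichever distance measure is chosen.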

Comments:	Accepted as a long paper at Findings of ACL 2023
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2305.05050 [cs.CL]
	(or arXiv:2305.05050v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.05050

Submission history

From: Thilini Wijesiriwardene
[v1] Mon, 8 May 2023 21:12:20 UTC (9,325 KB)
[v2] Sun, 14 May 2023 16:40:09 UTC (9,325 KB)
[v3] Thu, 25 May 2023 20:38:17 UTC (9,325 KB)
