modeLing: A Novel Dataset for Testing Linguistic Reasoning in Language Models

Chi, Nathan A.; Malchev, Teodor; Kong, Riley; Chi, Ryan A.; Huang, Lucas; Chi, Ethan A.; McCoy, R. Thomas; Radev, Dragomir

arXiv:2406.17038 (cs)

[Submitted on 24 Jun 2024]

Abstract: We introduce modeLing, a novel benchmark of Linguistics Olympiad-style puzzles that tests few-shot reasoning in AI systems. Solving these puzzles requires inferring aspects of a language's grammatical structure from a small number of examples. Such puzzles provide a natural testbed for language models, as they require compositional generalization and few-shot inductive reasoning. Consisting solely of new puzzles written specifically for this work, modeLing has no risk of appearing in the training data of existing AI systems: this ameliorates the risk of data leakage, a potential confounder for many prior evaluations of reasoning. Evaluating several large open-source language models and GPT on our benchmark, we observe non-negligible accuracy, demonstrating few-shot emergent reasoning ability that cannot merely be attributed to shallow memorization. However, imperfect model performance suggests that modeLing can be used to measure further progress in linguistic reasoning.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2406.17038 [cs.CL]
	(or arXiv:2406.17038v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.17038

Submission history

From: Nathan Chi
[v1] Mon, 24 Jun 2024 18:00:59 UTC (9,365 KB)
