TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish

Yüksel, Arda; Köksal, Abdullatif; Şenel, Lütfi Kerem; Korhonen, Anna; Schütze, Hinrich

arXiv:2407.12402 (cs)

[Submitted on 17 Jul 2024 (v1), last revised 3 Oct 2024 (this version, v2)]

Abstract: Multiple-choice question answering tasks evaluate the reasoning, comprehension, and mathematical abilities of Large Language Models (LLMs). While existing benchmarks rely on automatic translation for multilingual evaluation, this approach is error-prone and can introduce culturally biased questions, especially in the social sciences. We introduce TurkishMMLU, the first multitask, multiple-choice Turkish QA benchmark, to evaluate LLMs' understanding of the Turkish language. TurkishMMLU includes over 10,000 questions covering 9 subjects from Turkish high-school education curricula. The questions are written by curriculum experts to suit the high-school curricula in Turkey, and they range from natural sciences and mathematics to more culturally representative topics such as Turkish Literature and the history of the Turkish Republic. We evaluate over 20 LLMs, including multilingual open-source (e.g., Gemma, Llama, mT5), closed-source (GPT-4o, Claude, Gemini), and Turkish-adapted (e.g., Trendyol) models. Our extensive evaluation covers zero-shot and few-shot settings, chain-of-thought reasoning, and an analysis of question difficulty alongside model performance. We provide an in-depth analysis of the Turkish capabilities and limitations of current LLMs to offer insights for future LLMs for the Turkish language. We publicly release our code for the dataset and evaluation: this https URL.
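The abstract refers to zero-shot and few-shot evaluation of LLMs on multiple-choice questions. A minimal sketch of how such an evaluation loop can be structured is given below. It assumes a hypothetical `MCQuestion` record, a plain-text prompt layout, up to five options labeled A to E, and a `predict` callable standing in for an LLM. None of these names or formats come from the paper or its released code, and the authors' actual prompts and answer extraction may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence

# Assumption: up to five answer options, labeled A to E.
LETTERS = "ABCDE"


@dataclass
class MCQuestion:
    # Hypothetical schema for illustration only.
    question: str
    choices: list[str]
    answer: str  # gold option letter, e.g. "B"


def format_question(q: MCQuestion, with_answer: bool) -> str:
    """Render one question; few-shot exemplars include their gold answer."""
    lines = [f"Question: {q.question}"]
    lines += [f"{LETTERS[i]}) {c}" for i, c in enumerate(q.choices)]
    lines.append(f"Answer: {q.answer}" if with_answer else "Answer:")
    return "\n".join(lines)


def build_prompt(target: MCQuestion, exemplars: Sequence[MCQuestion] = ()) -> str:
    """Zero-shot when `exemplars` is empty, k-shot with k = len(exemplars)."""
    blocks = [format_question(ex, with_answer=True) for ex in exemplars]
    blocks.append(format_question(target, with_answer=False))
    return "\n\n".join(blocks)


def extract_letter(output: str, n_choices: int) -> str | None:
    """Take the first non-space character as the predicted option letter.

    This assumes the model was asked to reply with a letter only; free-form
    or chain-of-thought outputs would need a more robust parser.
    """
    text = output.strip()
    if not text:
        return None
    letter = text[0].upper()
    return letter if letter in LETTERS[:n_choices] else None


def accuracy(
    questions: Sequence[MCQuestion],
    predict: Callable[[str], str],
    exemplars: Sequence[MCQuestion] = (),
) -> float:
    """Fraction of questions whose extracted letter matches the gold answer."""
    correct = sum(
        extract_letter(predict(build_prompt(q, exemplars)), len(q.choices)) == q.answer
        for q in questions
    )
    return correct / len(questions)
```

For example, with a toy item `MCQuestion("2 + 3 = ?", ["4", "5", "6", "7", "8"], "B")` and a stub `predict=lambda prompt: "B"`, `accuracy` returns `1.0`. Keeping answer extraction separate from prompt construction makes it easy to swap in a different parser for chain-of-thought outputs without touching the scoring logic.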

Comments:	EMNLP 2024 - Findings
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2407.12402 [cs.CL]
	(or arXiv:2407.12402v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.12402

Submission history

From: Abdullatif Köksal
[v1] Wed, 17 Jul 2024 08:28:55 UTC (405 KB)
[v2] Thu, 3 Oct 2024 15:45:52 UTC (447 KB)
