The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models

Li, Jiajia; Yang, Lu; Tang, Mingni; Chen, Cong; Li, Zuchao; Wang, Ping; Zhao, Hai

arXiv:2406.15885 (cs)

[Submitted on 22 Jun 2024]

Abstract: Benchmarks play a pivotal role in assessing the advancement of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs' capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive and large-scale music benchmark specifically designed to evaluate the music-related capabilities of LLMs. ZIQI-Eval encompasses a wide range of questions, covering 10 major categories and 56 subcategories, resulting in over 14,000 meticulously curated data entries. Using ZIQI-Eval, we conduct a comprehensive evaluation of 16 LLMs and analyze their performance in the domain of music. Results indicate that all LLMs perform poorly on the ZIQI-Eval benchmark, suggesting significant room for improvement in their musical capabilities. With ZIQI-Eval, we aim to provide a standardized and robust evaluation framework that facilitates a comprehensive assessment of LLMs' music-related abilities. The dataset is available on GitHub and HuggingFace.

Comments:	Accepted to ACL-Findings 2024
Subjects:	Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2406.15885 [cs.SD]
	(or arXiv:2406.15885v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2406.15885

Submission history

From: Lu Yang
[v1] Sat, 22 Jun 2024 16:24:42 UTC (8,968 KB)
