SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension

Li, Bohao; Wang, Rui; Wang, Guangzhi; Ge, Yuying; Ge, Yixiao; Shan, Ying

arXiv:2307.16125 (cs)

[Submitted on 30 Jul 2023 (v1), last revised 2 Aug 2023 (this version, v2)]

Abstract: Based on powerful Large Language Models (LLMs), recent generative Multimodal Large Language Models (MLLMs) have gained prominence as a pivotal research area, exhibiting remarkable capability for both comprehension and generation. In this work, we address the evaluation of generative comprehension in MLLMs as a preliminary step towards a comprehensive assessment of generative models, by introducing a benchmark named SEED-Bench. SEED-Bench consists of 19K multiple-choice questions with accurate human annotations (6× larger than existing benchmarks), spanning 12 evaluation dimensions that cover comprehension of both the image and video modalities. We develop an advanced pipeline for generating multiple-choice questions that target specific evaluation dimensions, integrating both automatic filtering and manual verification processes. Multiple-choice questions with ground-truth options derived from human annotation enable an objective and efficient assessment of model performance, eliminating the need for human or GPT intervention during evaluation. We further evaluate the performance of 18 models across all 12 dimensions, covering both spatial and temporal understanding. By revealing the limitations of existing MLLMs through evaluation results, we aim for SEED-Bench to provide insights that motivate future research. We will launch and consistently maintain a leaderboard to provide a platform for the community to assess and investigate model capability.
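Because every question has a single ground-truth option, scoring reduces to comparing a model's chosen option with the annotated one and averaging within each evaluation dimension. The sketch below illustrates that idea only. The record keys (`dimension`, `answer`, `prediction`), the dimension names, and the toy records are all hypothetical and are not taken from the SEED-Bench data or code; how a model's option is obtained is outside the scope of this sketch.

```python
from collections import defaultdict


def accuracy_by_dimension(records):
    """Return {dimension: accuracy} for multiple-choice records.

    Each record is a dict with hypothetical keys:
      'dimension'  - name of the evaluation dimension
      'answer'     - ground-truth option label, e.g. "A"
      'prediction' - option label chosen by the model
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        dim = record["dimension"]
        total[dim] += 1
        if record["prediction"] == record["answer"]:
            correct[dim] += 1
    return {dim: correct[dim] / total[dim] for dim in total}


# Toy, made-up records purely for illustration (not SEED-Bench data).
records = [
    {"dimension": "dim_a", "answer": "A", "prediction": "A"},
    {"dimension": "dim_a", "answer": "C", "prediction": "B"},
    {"dimension": "dim_b", "answer": "D", "prediction": "D"},
]
scores = accuracy_by_dimension(records)
print(scores)
```

Since the comparison is an exact match on option labels, no human or LLM judge is needed at scoring time, which is the property the abstract highlights.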

Comments:	Technical Report; Project released at: this https URL
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2307.16125 [cs.CL]
	(or arXiv:2307.16125v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2307.16125

Submission history

From: Bohao Li
[v1] Sun, 30 Jul 2023 04:25:16 UTC (6,535 KB)
[v2] Wed, 2 Aug 2023 08:02:35 UTC (6,539 KB)
