Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs

Zhang, Jie; Wang, Zhongqi; Lei, Mengqi; Yuan, Zheng; Yan, Bei; Shan, Shiguang; Chen, Xilin


arXiv:2406.18849 (cs)

[Submitted on 27 Jun 2024 (v1), last revised 26 Jul 2024 (this version, v2)]

Abstract: Many benchmarks have been proposed to evaluate the perception ability of Large Vision-Language Models (LVLMs). However, most of them construct questions by selecting images from existing datasets, which risks data leakage. Moreover, these benchmarks focus on realistic-style images and clean scenarios, leaving multi-stylized images and noisy scenarios unexplored. In response to these challenges, we propose Dysca, a dynamic and scalable benchmark that evaluates LVLMs using synthesized images. Specifically, we leverage Stable Diffusion and design a rule-based method to dynamically generate novel images, questions, and the corresponding answers. We consider 51 image styles and evaluate perception capability on 20 subtasks. We also conduct evaluations under 4 scenarios (Clean, Corruption, Print Attacking, and Adversarial Attacking) and with 3 question types (Multi-choice, True-or-false, and Free-form). Thanks to the generative paradigm, Dysca is scalable: new subtasks and scenarios can be added easily. A total of 8 advanced open-source LVLMs with 10 checkpoints are evaluated on Dysca, revealing the drawbacks of current LVLMs. The benchmark is released at this https URL.
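To make the idea of rule-based generation concrete, the following is a minimal sketch of how a text prompt, a question, and its ground-truth answer might be derived from the same sampled metadata, so that the answer is known before any image is generated. It is not the authors' implementation: the vocabularies, the `Sample` class, and the functions `build_prompt`, `multi_choice`, and `true_or_false` are all hypothetical, and the image generation step itself is omitted.

```python
import random
from dataclasses import dataclass

# Hypothetical vocabularies for illustration only; the abstract does not
# list Dysca's actual styles or subtask labels.
STYLES = ["oil painting", "pixel art", "watercolor"]
ANIMALS = ["cat", "dog", "horse", "rabbit", "fox"]


@dataclass
class Sample:
    prompt: str    # text prompt that would be passed to the image generator
    question: str
    options: list  # candidate answers shown to the model
    answer: str    # ground truth, known from the sampled metadata


def build_prompt(animal, style):
    """Compose a generation prompt from the sampled metadata."""
    return f"one {animal}, {style} style"


def multi_choice(rng, n_options=4):
    """Sample metadata, then derive a multiple-choice question from it."""
    animal, style = rng.choice(ANIMALS), rng.choice(STYLES)
    distractors = rng.sample([a for a in ANIMALS if a != animal], n_options - 1)
    options = distractors + [animal]
    rng.shuffle(options)
    return Sample(build_prompt(animal, style),
                  "Which animal is in the image?", options, animal)


def true_or_false(rng):
    """Ask about the true label or a wrong one with equal probability."""
    animal, style = rng.choice(ANIMALS), rng.choice(STYLES)
    wrong = rng.choice([a for a in ANIMALS if a != animal])
    asked = animal if rng.random() < 0.5 else wrong
    answer = "yes" if asked == animal else "no"
    return Sample(build_prompt(animal, style),
                  f"Is there a {asked} in the image?", ["yes", "no"], answer)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run is reproducible
    print(multi_choice(rng))
    print(true_or_false(rng))
```

Because every sample is drawn fresh from the metadata, calling these functions with a new seed yields a new set of prompts and questions, which is one plausible reading of what "dynamic" means here.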

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2406.18849 [cs.CV]
	(or arXiv:2406.18849v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.18849

Submission history

From: Zhongqi Wang [view email]
[v1] Thu, 27 Jun 2024 02:40:35 UTC (19,928 KB)
[v2] Fri, 26 Jul 2024 03:18:35 UTC (19,939 KB)
