Benchmarking Large Multimodal Models against Common Corruptions

Zhang, Jiawei; Pang, Tianyu; Du, Chao; Ren, Yi; Li, Bo; Lin, Min

arXiv:2401.11943 (cs)

[Submitted on 22 Jan 2024]

Abstract: This technical report addresses a gap in the assessment of large multimodal models (LMMs) by examining the self-consistency of their outputs when their inputs are subjected to common corruptions. We investigate the cross-modal interactions between text, image, and speech, covering four essential generation tasks: text-to-image, image-to-text, text-to-speech, and speech-to-text. We create a comprehensive benchmark, named MMCBench, that covers more than 100 popular LMMs (over 150 model checkpoints in total). A thorough evaluation under common corruptions is critical for practical deployment and facilitates a better understanding of the reliability of cutting-edge LMMs. The benchmarking code is available at this https URL
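
As a rough illustration of what a self-consistency check under corruption could look like, the sketch below compares a model's output on a clean input with its output on a corrupted copy of the same input. It is not the MMCBench code or its metric: the character-drop corruption, the bag-of-words cosine score, and the names `corrupt_text`, `bow_cosine`, `self_consistency`, and `toy_model` are all assumptions made for this example, and a real evaluation would use modality-appropriate corruptions and similarity measures.

```python
import random
from collections import Counter
from math import sqrt


def corrupt_text(text, rate, seed=0):
    """Hypothetical corruption: drop each non-space character with probability `rate`."""
    rng = random.Random(seed)
    return "".join(ch for ch in text if ch == " " or rng.random() >= rate)


def bow_cosine(a, b):
    """Cosine similarity between bag-of-words counts of two strings (assumed metric)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def self_consistency(model, clean_input, corrupted_input):
    """Score how similar the model's outputs are for clean and corrupted inputs."""
    return bow_cosine(model(clean_input), model(corrupted_input))


def toy_model(text):
    """Stand-in for a real model: simply echoes its input in upper case."""
    return text.upper()


if __name__ == "__main__":
    prompt = "a photo of a cat sitting on a red sofa"
    noisy = corrupt_text(prompt, rate=0.2, seed=0)
    print("corrupted input:", noisy)
    print("self-consistency:", self_consistency(toy_model, prompt, noisy))
```

A score near 1 means the output barely changed under corruption, while a score near 0 means it changed substantially. Replacing `toy_model` with a call to an actual model, and `bow_cosine` with a stronger similarity measure, keeps the same overall structure.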

Comments:	Technical report
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Cite as:	arXiv:2401.11943 [cs.LG]
	(or arXiv:2401.11943v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2401.11943

Submission history

From: Tianyu Pang
[v1] Mon, 22 Jan 2024 13:33:53 UTC (7,628 KB)
