Mass-Producing Failures of Multimodal Systems with Language Models

Tong, Shengbang; Jones, Erik; Steinhardt, Jacob

Computer Science > Machine Learning

arXiv:2306.12105v1 (cs)

[Submitted on 21 Jun 2023 (this version), latest version 1 Mar 2024 (v2)]

Abstract: Deployed multimodal systems can fail in ways that evaluators did not anticipate. To find these failures before deployment, we introduce MultiMon, a system that automatically identifies systematic failures -- generalizable, natural-language descriptions of patterns of model failures. To uncover systematic failures, MultiMon scrapes a corpus for examples of erroneous agreement: inputs that produce the same output, but should not. It then prompts a language model (e.g., GPT-4) to find systematic patterns of failure and describe them in natural language. We use MultiMon to find 14 systematic failures (e.g., "ignores quantifiers") of the CLIP text-encoder, each comprising hundreds of distinct inputs (e.g., "a shelf with a few/many books"). Because CLIP is the backbone for most state-of-the-art multimodal systems, these inputs produce failures in Midjourney 5.1, DALL-E, VideoFusion, and others. MultiMon can also steer towards failures relevant to specific use cases, such as self-driving cars. We see MultiMon as a step towards evaluation that autonomously explores the long tail of potential system failures. Code for MultiMon is available at this https URL.
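
The "erroneous agreement" step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how agreement is measured, so everything below is an assumption made for illustration: the function names, the cosine-similarity thresholds, the use of a second "reference" encoder as a proxy for "should not be the same", and the hand-written toy vectors standing in for real text embeddings. It is not the authors' implementation.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def erroneous_agreements(texts, target_emb, reference_emb, high=0.95, low=0.8):
    """Return text pairs that the target encoder embeds almost identically
    while the reference encoder keeps them apart.

    The thresholds `high` and `low` are hypothetical values chosen for
    this toy example, not values taken from the paper.
    """
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        same_for_target = cosine(target_emb[i], target_emb[j]) >= high
        different_for_reference = cosine(reference_emb[i], reference_emb[j]) <= low
        if same_for_target and different_for_reference:
            pairs.append((texts[i], texts[j]))
    return pairs


# Toy data: the vectors are invented stand-ins for encoder outputs.
texts = [
    "a shelf with a few books",
    "a shelf with many books",
    "a dog on a beach",
]
target_emb = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]    # target encoder (assumed)
reference_emb = [[1.0, 0.0], [0.5, 0.8], [0.0, 1.0]]   # reference encoder (assumed)

found = erroneous_agreements(texts, target_emb, reference_emb)
print(found)
```

In this toy setup only the "few/many books" pair is flagged: the assumed target encoder maps both sentences to nearly the same vector, while the assumed reference encoder separates them. Pairs collected this way are the kind of input the abstract says is then passed to a language model to be summarized into natural-language failure descriptions.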

Comments:	Under Review
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Software Engineering (cs.SE)
Cite as:	arXiv:2306.12105 [cs.LG]
	(or arXiv:2306.12105v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2306.12105

Submission history

From: Shengbang Tong
[v1] Wed, 21 Jun 2023 08:43:29 UTC (2,162 KB)
[v2] Fri, 1 Mar 2024 21:28:00 UTC (19,527 KB)
