Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey

Liu, Xuannan; Cui, Xing; Li, Peipei; Li, Zekun; Huang, Huaibo; Xia, Shuhan; Zhang, Miaoxuan; Zou, Yueying; He, Ran

arXiv:2411.09259 (cs)

[Submitted on 14 Nov 2024 (v1), last revised 9 Dec 2024 (this version, v2)]

Abstract: The rapid evolution of multimodal foundation models has led to significant advances in cross-modal understanding and generation across diverse modalities, including text, images, audio, and video. However, these models remain susceptible to jailbreak attacks, which can bypass built-in safety mechanisms and induce the production of potentially harmful content. Consequently, understanding jailbreak attack methods and existing defense mechanisms is essential to the safe deployment of multimodal generative models in real-world scenarios, particularly in security-sensitive applications. To provide comprehensive insight into this topic, this survey reviews jailbreak attacks and defenses in multimodal generative models. First, following the generalized lifecycle of a multimodal jailbreak, we systematically explore attacks and corresponding defense strategies across four levels: input, encoder, generator, and output. Based on this analysis, we present a detailed taxonomy of attack methods, defense mechanisms, and evaluation frameworks specific to multimodal generative models. Additionally, we cover a wide range of input-output configurations within generative systems, including Any-to-Text, Any-to-Vision, and Any-to-Any. Finally, we highlight current research challenges and propose potential directions for future research. The open-source repository corresponding to this work can be found at this https URL.

Comments:	ongoing work
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:2411.09259 [cs.CV]
	(or arXiv:2411.09259v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.09259

Submission history

From: Xuannan Liu
[v1] Thu, 14 Nov 2024 07:51:51 UTC (1,427 KB)
[v2] Mon, 9 Dec 2024 14:22:14 UTC (1,404 KB)
