Efficient LLM-Jailbreaking by Introducing Visual Modality

Niu, Zhenxing; Sun, Yuyao; Ren, Haodong; Ji, Haoxuan; Wang, Quan; Ma, Xiaoke; Hua, Gang; Jin, Rong

arXiv:2405.20015 (cs)

[Submitted on 30 May 2024]

Abstract: This paper focuses on jailbreaking attacks against large language models (LLMs), which elicit objectionable content in response to harmful user queries. Unlike previous jailbreaks that target LLMs directly, our approach begins by constructing a multimodal large language model (MLLM) through the incorporation of a visual module into the target LLM. Subsequently, we conduct an efficient MLLM-jailbreak to generate jailbreaking embeddings (embJS). Finally, we convert the embJS into text space to facilitate the jailbreaking of the target LLM. Compared to direct LLM-jailbreaking, our approach is more efficient, as MLLMs are more vulnerable to jailbreaking than pure LLMs. Additionally, to improve the attack success rate (ASR) of jailbreaking, we propose an image-text semantic matching scheme to identify a suitable initial input. Extensive experiments demonstrate that our approach surpasses current state-of-the-art methods in terms of both efficiency and effectiveness. Moreover, our approach exhibits superior cross-class jailbreaking capabilities.
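
The abstract reports results as an attack success rate (ASR) but does not say how individual attempts are judged. As a minimal sketch, assuming each attempt has already been labeled as successful or not by some external judge (the function name and the boolean input format are illustrative, not taken from the paper), ASR is the fraction of attempts labeled successful:

```python
from typing import Iterable


def attack_success_rate(outcomes: Iterable[bool]) -> float:
    """Return the fraction of attempts labeled successful.

    `outcomes` is a hypothetical input format: one boolean per attempt,
    produced by whatever judging procedure an evaluation uses.
    """
    results = list(outcomes)
    if not results:
        # An empty evaluation set has no meaningful rate.
        raise ValueError("at least one outcome is required")
    return sum(1 for ok in results if ok) / len(results)


if __name__ == "__main__":
    # Three of four attempts labeled successful.
    print(attack_success_rate([True, False, True, True]))
```

Because the rate depends entirely on the judging procedure, ASR figures are only comparable across methods when the same judge and the same query set are used.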

Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2405.20015 [cs.AI]
	(or arXiv:2405.20015v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2405.20015

Submission history

From: Zhenxing Niu
[v1] Thu, 30 May 2024 12:50:32 UTC (1,787 KB)
