Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks

Yang, Wenhan; Gao, Jingdong; Mirzasoleiman, Baharan


arXiv:2310.05862 (cs)

[Submitted on 5 Oct 2023 (v1), last revised 10 Jun 2024 (this version, v2)]


Abstract: Contrastive Language-Image Pre-training (CLIP) on large image-caption datasets has achieved remarkable success in zero-shot classification and enabled transferability to new domains. However, CLIP is far more vulnerable to targeted data poisoning and backdoor attacks than supervised learning. Perhaps surprisingly, poisoning 0.0001% of CLIP pre-training data is enough to make targeted data poisoning attacks successful. This is four orders of magnitude smaller than what is required to poison supervised models. Despite this vulnerability, existing methods are very limited in defending CLIP models during pre-training. In this work, we propose a strong defense, SAFECLIP, to safely pre-train CLIP against targeted data poisoning and backdoor attacks. SAFECLIP warms up the model by applying unimodal contrastive learning (CL) to the image and text modalities separately. Then, it divides the data into safe and risky sets by applying a Gaussian Mixture Model to the cosine similarity of image-caption pair representations. SAFECLIP pre-trains the model by applying the CLIP loss to the safe set and applying unimodal CL to the image and text modalities of the risky set separately. By gradually increasing the size of the safe set during pre-training, SAFECLIP effectively breaks targeted data poisoning and backdoor attacks without harming CLIP's performance. Our extensive experiments on CC3M, Visual Genome, and MSCOCO demonstrate that SAFECLIP reduces the success rate of targeted data poisoning attacks from 93.75% to 0% and that of various backdoor attacks from up to 100% to 0%, without harming CLIP's performance.
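The safe/risky split described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names (`cosine`, `fit_gmm_1d`, `split_safe_risky`), the choice of two mixture components, the rule that the higher-mean component is treated as "safe", and the 0.9 posterior threshold are all assumptions made for illustration. The abstract states only that a Gaussian Mixture Model is applied to the cosine similarity of image-caption pair representations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, iters=100, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances), each a list of length 2.
    """
    overall_mean = sum(xs) / len(xs)
    var0 = max(sum((x - overall_mean) ** 2 for x in xs) / len(xs), min_var)
    means = [min(xs), max(xs)]  # start one component at each extreme
    variances = [var0, var0]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component has no support; leave it unchanged
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_k, min_var)
    return weights, means, variances


def split_safe_risky(sims, threshold=0.9):
    """Split pair indices by posterior under the higher-mean component.

    Assumption: pairs with high image-caption similarity are the "safe" ones.
    """
    weights, means, variances = fit_gmm_1d(sims)
    high = 0 if means[0] > means[1] else 1
    safe, risky = [], []
    for i, s in enumerate(sims):
        p = [weights[k] * normal_pdf(s, means[k], variances[k]) for k in range(2)]
        total = sum(p) or 1e-300
        posterior_high = p[high] / total
        (safe if posterior_high > threshold else risky).append(i)
    return safe, risky


# Hypothetical similarities: five well-matched pairs, five poorly matched ones.
sims = [0.78, 0.82, 0.80, 0.79, 0.81, 0.21, 0.19, 0.20, 0.22, 0.18]
safe, risky = split_safe_risky(sims)
print("safe:", safe)
print("risky:", risky)
```

In a real pipeline the similarities would come from `cosine` applied to each pair's image and text embeddings. Per the abstract, the safe set would then receive the CLIP loss while the risky set receives unimodal CL only, with the safe set growing over training; this sketch does not model that schedule.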

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2310.05862 [cs.LG]
	(or arXiv:2310.05862v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2310.05862

Submission history

From: Wenhan Yang
[v1] Thu, 5 Oct 2023 19:42:03 UTC (654 KB)
[v2] Mon, 10 Jun 2024 21:01:11 UTC (1,226 KB)
