Zero-Shot Distillation for Image Encoders: How to Make Effective Use of Synthetic Data

Popp, Niclas; Metzen, Jan Hendrik; Hein, Matthias


arXiv:2404.16637 (cs)

[Submitted on 25 Apr 2024]


Abstract: Multi-modal foundation models such as CLIP have showcased impressive zero-shot capabilities. However, their applicability in resource-constrained environments is limited by their large number of parameters and high inference time. While existing approaches have scaled down the entire CLIP architecture, we focus on training smaller variants of the image encoder, which suffices for efficient zero-shot classification. The use of synthetic data has shown promise in distilling representations from larger teachers, resulting in strong few-shot and linear probe performance. We find, however, that this approach surprisingly fails in true zero-shot settings when using contrastive losses. We identify the exploitation of spurious features as responsible for the poor generalization between synthetic and real data. By using an image feature-based L2 distillation loss instead, we mitigate these problems and train students whose zero-shot performance on four domain-specific datasets is on par with a ViT-B/32 teacher model trained on DataCompXL, while featuring up to 92% fewer parameters.
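To make the idea of an image feature-based L2 distillation loss concrete, the following is a minimal, dependency-free sketch and not the authors' implementation. It assumes that student and teacher image features share the same dimension, that both are normalized to unit length before comparison, and that zero-shot prediction picks the class whose text embedding is most similar to the image feature. The function names `l2_normalize`, `l2_distillation_loss`, and `zero_shot_predict` are hypothetical and chosen for illustration only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def l2_distillation_loss(student_feats, teacher_feats):
    """Mean squared Euclidean distance between student and teacher features.

    Assumption: features are unit-normalized first; the paper may differ.
    """
    total = 0.0
    for s, t in zip(student_feats, teacher_feats):
        s, t = l2_normalize(s), l2_normalize(t)
        total += sum((a - b) ** 2 for a, b in zip(s, t))
    return total / len(student_feats)


def zero_shot_predict(image_feat, class_text_feats):
    """Return the index of the class text embedding with the highest
    cosine similarity to the image feature."""
    img = l2_normalize(image_feat)
    sims = [
        sum(a * b for a, b in zip(img, l2_normalize(txt)))
        for txt in class_text_feats
    ]
    return max(range(len(sims)), key=sims.__getitem__)


# Toy usage with made-up 2-D features (illustrative values only).
teacher = [[1.0, 0.0], [0.0, 1.0]]
student = [[0.9, 0.1], [0.2, 0.8]]
loss = l2_distillation_loss(student, teacher)
predicted_class = zero_shot_predict(student[0], teacher)
```

Because the loss only compares image features, the student needs no text encoder during training. Under the assumptions above, the teacher's text embeddings can be reused unchanged at inference time for zero-shot classification.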

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2404.16637 [cs.CV]
	(or arXiv:2404.16637v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.16637

Submission history

From: Niclas Popp
[v1] Thu, 25 Apr 2024 14:24:41 UTC (599 KB)
