Substance or Style: What Does Your Image Embedding Know?

Rashtchian, Cyrus; Herrmann, Charles; Ferng, Chun-Sung; Chakrabarti, Ayan; Krishnan, Dilip; Sun, Deqing; Juan, Da-Cheng; Tomkins, Andrew

arXiv:2307.05610 (cs)

[Submitted on 10 Jul 2023]

Abstract: Probes are small networks that predict properties of underlying data from embeddings, and they provide a targeted, effective way to illuminate the information contained in embeddings. While analysis through the use of probes has become standard in NLP, there has been much less exploration in vision. Image foundation models have primarily been evaluated for semantic content. Better understanding the non-semantic information in popular embeddings (e.g., MAE, SimCLR, or CLIP) will shed new light both on the training algorithms and on the uses for these foundation models. We design a systematic transformation prediction task and measure the visual content of embeddings along many axes, including image style, quality, and a range of natural and artificial transformations. Surprisingly, six embeddings (including SimCLR) encode enough non-semantic information to identify dozens of transformations. We also consider a generalization task, where we group similar transformations and hold out several for testing. We find that image-text models (CLIP and ALIGN) are better at recognizing new examples of style transfer than masking-based models (CAN and MAE). Overall, our results suggest that the choice of pre-training algorithm impacts the types of information in the embedding, and certain models are better than others for non-semantic downstream tasks.
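The probing setup described in the abstract can be illustrated with a minimal sketch: a linear probe (softmax regression) trained on frozen embeddings to predict which transformation was applied to each image. This is not the authors' implementation. The embeddings here are synthetic Gaussian clusters standing in for the output of a frozen image model, the integer labels are placeholder transformation IDs, and the function names (`train_linear_probe`, `probe_accuracy`, `run_demo`) and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np


def train_linear_probe(embeddings, labels, num_classes, lr=0.1, steps=300, seed=0):
    """Fit a softmax-regression probe on frozen embeddings by gradient descent."""
    rng = np.random.default_rng(seed)
    n, dim = embeddings.shape
    W = rng.normal(scale=0.01, size=(dim, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = embeddings @ W + b
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (embeddings.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def probe_accuracy(W, b, embeddings, labels):
    """Fraction of examples whose transformation ID the probe predicts correctly."""
    preds = (embeddings @ W + b).argmax(axis=1)
    return float((preds == labels).mean())


def run_demo(num_classes=5, per_class=200, dim=16, seed=0):
    """Train and evaluate a probe on synthetic stand-in embeddings (an assumption)."""
    rng = np.random.default_rng(seed)
    # Hypothetical data: one Gaussian cluster per placeholder transformation ID.
    centers = rng.normal(scale=3.0, size=(num_classes, dim))
    labels = np.repeat(np.arange(num_classes), per_class)
    x = centers[labels] + rng.normal(size=(labels.size, dim))
    # Random 80/20 train/test split.
    order = rng.permutation(labels.size)
    split = int(0.8 * labels.size)
    train_idx, test_idx = order[:split], order[split:]
    # Standardize with training statistics only; the "embedding model" stays frozen.
    mean, std = x[train_idx].mean(axis=0), x[train_idx].std(axis=0)
    x = (x - mean) / std
    W, b = train_linear_probe(x[train_idx], labels[train_idx], num_classes)
    return probe_accuracy(W, b, x[test_idx], labels[test_idx])


print(f"held-out probe accuracy: {run_demo():.3f}")
```

In a real experiment the synthetic array would be replaced by embeddings computed from transformed images with a fixed pre-trained model, and only the probe's weights would be trained. High held-out accuracy would then suggest that the embedding retains information about the transformation.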

Comments:	27 pages, 9 figures
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2307.05610 [cs.LG]
	(or arXiv:2307.05610v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2307.05610

Submission history

From: Cyrus Rashtchian
[v1] Mon, 10 Jul 2023 22:40:10 UTC (37,708 KB)
