No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models

Angéline Pouget, Lucas Beyer, Emanuele Bugliarello, Xiao Wang, Andreas Peter Steiner, Xiaohua Zhai, Ibrahim Alabdulmohsin

Computer Science > Computer Vision and Pattern Recognition

arXiv:2405.13777 (cs)

[Submitted on 22 May 2024 (v1), last revised 23 Oct 2024 (this version, v3)]

Abstract: We study cultural and socioeconomic diversity in contrastive vision-language models (VLMs). Using a broad range of benchmark datasets and evaluation metrics, we bring to attention several important findings. First, the common filtering of training data to English image-text pairs disadvantages communities of lower socioeconomic status and negatively impacts cultural understanding. Notably, this performance gap is not captured by, and is even at odds with, the currently popular evaluation metrics derived from the Western-centric ImageNet and COCO datasets. Second, pretraining with global, unfiltered data before fine-tuning on English content can improve cultural understanding without sacrificing performance on those popular benchmarks. Third, we introduce the task of geo-localization as a novel evaluation metric to assess cultural diversity in VLMs. Our work underscores the value of using diverse data to create more inclusive multimodal systems and lays the groundwork for developing VLMs that better represent global perspectives.
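
The abstract proposes geo-localization as an evaluation task for contrastive VLMs. One generic way to pose such a task, sketched here under stated assumptions and not necessarily the authors' evaluation protocol, is zero-shot country classification: embed one text prompt per country, embed each image, and predict the country whose text embedding has the highest cosine similarity with the image. The function names, the prompt template, and the toy two-dimensional embeddings below are hypothetical; in practice the embeddings would come from the image and text towers of a contrastive model.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm so that dot products equal cosine similarity.

    Assumes the vector is non-zero.
    """
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def zero_shot_geolocalization_accuracy(image_embs, labels, country_text_embs):
    """Top-1 accuracy of predicting each image's country from its nearest text embedding.

    image_embs: image embedding vectors (hypothetical, precomputed).
    labels: index of the true country for each image.
    country_text_embs: one text embedding per country, e.g. for a prompt such as
        "a photo from {country}" (the prompt wording is an assumption).
    """
    texts = [normalize(t) for t in country_text_embs]
    correct = 0
    for emb, label in zip(image_embs, labels):
        img = normalize(emb)
        # Cosine similarity between this image and every country prompt.
        scores = [sum(a * b for a, b in zip(img, t)) for t in texts]
        pred = max(range(len(scores)), key=scores.__getitem__)
        correct += int(pred == label)
    return correct / len(labels)


# Toy usage with two "countries" and three images; the third image is misclassified.
toy_texts = [[1.0, 0.0], [0.0, 1.0]]
toy_images = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
toy_labels = [0, 1, 1]
print(zero_shot_geolocalization_accuracy(toy_images, toy_labels, toy_texts))
```

Framing the task as retrieval over country prompts keeps it zero-shot, so it probes what the pretrained embedding space already encodes about place rather than what a trained classifier head can learn.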

Comments:	17 pages, 5 figures, 4 tables. 38th Conference on Neural Information Processing Systems (NeurIPS 2024)
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2405.13777 [cs.CV]
	(or arXiv:2405.13777v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.13777

Submission history

From: Angéline Pouget
[v1] Wed, 22 May 2024 16:04:22 UTC (5,259 KB)
[v2] Fri, 24 May 2024 14:39:24 UTC (5,259 KB)
[v3] Wed, 23 Oct 2024 21:25:39 UTC (1,575 KB)
