Contrastive Corpus Attribution for Explaining Representations

Lin, Chris; Chen, Hugh; Kim, Chanwoo; Lee, Su-In

arXiv:2210.00107 (cs)

[Submitted on 30 Sep 2022 (v1), last revised 12 Jun 2023 (this version, v2)]

Abstract: Despite the widespread use of unsupervised models, very few methods are designed to explain them. Most explanation methods explain a scalar model output. However, unsupervised models output representation vectors, the elements of which are not good candidates to explain because they lack semantic meaning. To bridge this gap, recent works defined a scalar explanation output: a dot product-based similarity in the representation space to the sample being explained (i.e., an explicand). Although this enabled explanations of unsupervised models, the interpretation of this approach can still be opaque because similarity to the explicand's representation may not be meaningful to humans. To address this, we propose contrastive corpus similarity, a novel and semantically meaningful scalar explanation output based on a reference corpus and a contrasting foil set of samples. We demonstrate that contrastive corpus similarity is compatible with many post-hoc feature attribution methods to generate COntrastive COrpus Attributions (COCOA) and quantitatively verify that features important to the corpus are identified. We showcase the utility of COCOA in two ways: (i) we draw insights by explaining augmentations of the same image in a contrastive learning setting (SimCLR); and (ii) we perform zero-shot object localization by explaining the similarity of image representations to jointly learned text representations (CLIP).
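The abstract does not spell out a formula, so the following is only a minimal sketch of how a contrastive corpus similarity could be computed and then explained. It assumes the scalar is the explicand's mean cosine similarity to the corpus representations minus its mean cosine similarity to the foil representations, and it uses plain feature occlusion as a stand-in for "a post-hoc feature attribution method". The names `toy_encode`, `contrastive_corpus_similarity`, and `occlusion_attribution`, the identity encoder, and the toy data are all hypothetical and are not taken from the paper or its code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def contrastive_corpus_similarity(encode, x, corpus, foil):
    """Assumed form: mean similarity to the corpus minus mean similarity to the foil."""
    z = encode(x)
    to_corpus = sum(cosine(z, encode(c)) for c in corpus) / len(corpus)
    to_foil = sum(cosine(z, encode(f)) for f in foil) / len(foil)
    return to_corpus - to_foil


def occlusion_attribution(encode, x, corpus, foil, baseline=0.0):
    """Score each feature by how much the scalar drops when it is set to baseline."""
    full = contrastive_corpus_similarity(encode, x, corpus, foil)
    scores = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline
        scores.append(full - contrastive_corpus_similarity(encode, occluded, corpus, foil))
    return scores


def toy_encode(x):
    """Hypothetical stand-in for a trained encoder: the identity map."""
    return list(x)


# Toy data: the corpus lies along feature 0, the foil along feature 1.
corpus = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
foil = [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
explicand = [1.0, 0.2, 0.0]

score = contrastive_corpus_similarity(toy_encode, explicand, corpus, foil)
attributions = occlusion_attribution(toy_encode, explicand, corpus, foil)
print(score, attributions)
```

In this toy setup the explicand resembles the corpus more than the foil, so the scalar is positive; feature 0 (shared with the corpus) receives a positive attribution, while feature 1 (shared with the foil) receives a negative one. A real use would replace `toy_encode` with a trained representation model and could swap occlusion for any attribution method that explains a scalar output.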

Comments:	Updated for the final camera-ready version of ICLR 2023
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2210.00107 [cs.LG]
	(or arXiv:2210.00107v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2210.00107

Submission history

From: Chris Lin
[v1] Fri, 30 Sep 2022 21:59:10 UTC (8,486 KB)
[v2] Mon, 12 Jun 2023 22:23:56 UTC (8,486 KB)
