Learning Multimodal VAEs through Mutual Supervision

Joy, Tom; Shi, Yuge; Torr, Philip H. S.; Rainforth, Tom; Schmon, Sebastian M.; Siddharth, N.

arXiv:2106.12570 (cs)

[Submitted on 23 Jun 2021 (v1), last revised 16 Dec 2022 (this version, v3)]

Abstract: Multimodal VAEs seek to model the joint distribution over heterogeneous data (e.g. vision, language), whilst also capturing a shared representation across such modalities. Prior work has typically combined information from the modalities by reconciling idiosyncratic representations directly in the recognition model through explicit products, mixtures, or other such factorisations. Here we introduce a novel alternative, the MEME, that avoids such explicit combinations by repurposing semi-supervised VAEs to combine information between modalities implicitly through mutual supervision. This formulation naturally allows learning from partially-observed data where some modalities can be entirely missing, something that most existing approaches either cannot handle or can handle only to a limited extent. We demonstrate that MEME outperforms baselines on standard metrics across both partial and complete observation schemes on the MNIST-SVHN (image-image) and CUB (image-text) datasets. We also contrast the quality of the representations learnt by mutual supervision against standard approaches and observe interesting trends in its ability to capture relatedness between data.

Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2106.12570 [cs.LG]
	(or arXiv:2106.12570v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2106.12570

Submission history

From: Thomas Joy
[v1] Wed, 23 Jun 2021 17:54:35 UTC (22,306 KB)
[v2] Thu, 1 Jul 2021 11:15:52 UTC (11,145 KB)
[v3] Fri, 16 Dec 2022 09:29:56 UTC (19,381 KB)
