Self-supervised learning by cross-modal audio-video clustering
Recommendations
SCLAV: Supervised Cross-modal Contrastive Learning for Audio-Visual Coding
MM '23: Proceedings of the 31st ACM International Conference on Multimedia
Audio and vision are important senses for high-level cognition, and their special strong correlation makes audio-visual coding a crucial factor in many multimodal tasks. However, there are two challenges in audio-visual coding. First, the heterogeneity ...
Self-Supervised Correlation Learning for Cross-Modal Retrieval
Cross-modal retrieval aims to retrieve relevant data from another modality when given a query of one modality. Although most existing methods that rely on the label information of multimedia data have achieved promising results, the performance benefiting ...
Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing
Computer Vision – ECCV 2022
This paper focuses on the weakly-supervised audio-visual video parsing task, which aims to recognize all events belonging to each modality and localize their temporal boundaries. This task is challenging because only overall labels indicating the ...
Information & Contributors
Information
Published In
- Editors:
- H. Larochelle,
- M. Ranzato,
- R. Hadsell,
- M.F. Balcan,
- H. Lin
Publisher
Curran Associates Inc.
Red Hook, NY, United States
Qualifiers
- Research-article
- Research
- Refereed limited
Bibliometrics & Citations
Article Metrics
- Total Citations: 0
- Total Downloads: 36
- Downloads (last 12 months): 15
- Downloads (last 6 weeks): 1