Leveraging Category Information for Single-Frame Visual Sound Source Separation

Zhu, Lingyu; Rahtu, Esa

Computer Science > Computer Vision and Pattern Recognition

arXiv:2007.07984 (cs)

[Submitted on 15 Jul 2020 (v1), last revised 16 Apr 2021 (this version, v2)]


Abstract: Visual sound source separation aims to identify the sound components of a given sound mixture in the presence of visual cues. Prior works have demonstrated impressive results, but at the expense of large multi-stage architectures and complex data representations (e.g. optical flow trajectories). In contrast, we study simple yet efficient models for visual sound separation using only a single video frame. Furthermore, our models are able to exploit information about the sound source category in the separation process. To this end, we propose two models where we assume that i) the category labels are available at training time, or ii) we know whether the training sample pairs are from the same or different categories. Experiments on the MUSIC dataset show that our model obtains comparable or better performance than several recent baseline methods. The code is available at this https URL
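The two supervision settings named in the abstract differ in how much label information is assumed. The minimal sketch below is an illustration only, not the authors' code: it shows how the weaker "same or different category" signal of setting (ii) could be derived from the per-sample category labels of setting (i). The function name `pairwise_same_category` and the example labels are hypothetical.

```python
from itertools import combinations


def pairwise_same_category(labels):
    """Map each unordered pair of sample indices (i, j), i < j, to True
    when both samples share a category and False otherwise.

    Hypothetical helper: it only illustrates the pair-level signal
    described in setting (ii), not the training procedure of the paper.
    """
    return {
        (i, j): labels[i] == labels[j]
        for i, j in combinations(range(len(labels)), 2)
    }


# Hypothetical category labels for three training samples (setting i).
labels = ["guitar", "violin", "guitar"]

# Pair-level same/different signal (setting ii), without category names.
pairs = pairwise_same_category(labels)
for (i, j), same in sorted(pairs.items()):
    print(f"samples {i} and {j}: {'same' if same else 'different'} category")
```

The pair-level dictionary keeps no category names, which reflects why setting (ii) is a weaker assumption than setting (i).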

Comments:	6 pages. The code is available at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2007.07984 [cs.CV]
	(or arXiv:2007.07984v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2007.07984

Submission history

From: Lingyu Zhu
[v1] Wed, 15 Jul 2020 20:35:29 UTC (3,138 KB)
[v2] Fri, 16 Apr 2021 14:30:19 UTC (3,242 KB)
