Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition

Hu, Di; Li, Xuhong; Mou, Lichao; Jin, Pu; Chen, Dong; Jing, Liping; Zhu, Xiaoxiang; Dou, Dejing

arXiv:2005.08449 (cs)

[Submitted on 18 May 2020 (v1), last revised 16 Jul 2020 (this version, v2)]

Abstract: Aerial scene recognition is a fundamental task in remote sensing and has recently received increased interest. Although powerful models and efficient algorithms achieve considerable scene recognition performance from the visual information in overhead images, that performance still suffers from variation in ground objects, lighting conditions, etc. Inspired by the multi-channel perception theory in cognitive science, in this paper we explore a novel audiovisual aerial scene recognition task that uses both images and sounds as input. Based on the observation that some specific sound events are more likely to be heard at a given geographic location, we propose to exploit knowledge from sound events to improve performance on aerial scene recognition. For this purpose, we have constructed a new dataset named AuDio Visual Aerial sceNe reCognition datasEt (ADVANCE). With the help of this dataset, we evaluate three proposed approaches for transferring sound event knowledge to the aerial scene recognition task in a multimodal learning framework, and show the benefit of exploiting audio information for aerial scene recognition. The source code is publicly available for reproducibility purposes.

Comments:	ECCV 2020
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
Cite as:	arXiv:2005.08449 [cs.CV]
	(or arXiv:2005.08449v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2005.08449

Submission history

From: Di Hu
[v1] Mon, 18 May 2020 04:14:16 UTC (2,476 KB)
[v2] Thu, 16 Jul 2020 03:33:17 UTC (1,970 KB)
