Correlation Net : spatio temporal multimodal deep learning

Yudistira, Novanto; Kurita, Takio

Computer Science > Computer Vision and Pattern Recognition

arXiv:1807.08291v4 (cs)

[Submitted on 22 Jul 2018 (v1), revised 20 Mar 2019 (this version, v4), latest version 16 Dec 2019 (v6)]

Title: Correlation Net : spatio temporal multimodal deep learning

Authors: Novanto Yudistira, Takio Kurita


Abstract: This letter describes a network that captures spatiotemporal correlations over arbitrary timestamps. The proposed scheme operates as a complementary, extended network over spatiotemporal regions. Recently, multimodal fusion has been extensively researched in deep learning. For action recognition, the spatial and temporal streams are vital components of deep Convolutional Neural Networks (CNNs), but reducing overfitting and fusing these two streams remain open problems. The existing fusion approach is to average the two streams. To address these problems, we propose a correlation network with a Shannon fusion to learn a CNN that has already been trained. Long-range video may contain spatiotemporal correlations over arbitrary times. These correlations can be captured using simple fully connected layers that form the correlation network. This is found to be complementary to the existing network fusion methods. We evaluate our approach on the UCF-101 and HMDB-51 datasets, and the resulting improvement in accuracy demonstrates the importance of multimodal correlation.
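The two ideas the abstract contrasts can be illustrated with a minimal sketch. It is not the authors' implementation: `average_fusion` shows the averaging baseline the abstract mentions, and `correlation_head` is a hypothetical stand-in for "simple fully connected layers" applied to the concatenated outputs of both streams over several timestamps. The function names, the single-layer design, and the weight layout are assumptions made for illustration; the Shannon fusion is not described in the abstract and is not modelled here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_fusion(spatial_logits, temporal_logits):
    """Baseline late fusion: average the class probabilities of the two streams."""
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    return [(a + b) / 2.0 for a, b in zip(p_spatial, p_temporal)]


def fully_connected(features, weights, bias):
    """One dense layer; `weights` holds one row of coefficients per output unit."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def correlation_head(spatial_feats, temporal_feats, weights, bias):
    """Hypothetical head (an assumption, not the paper's architecture).

    Flattens the per-timestamp outputs of both streams into one vector so a
    fully connected layer can weigh any spatial/temporal pair at any timestamps.
    """
    joint = [x for step in spatial_feats for x in step]
    joint += [x for step in temporal_feats for x in step]
    return softmax(fully_connected(joint, weights, bias))
```

For example, `average_fusion([2.0, 0.0], [0.0, 2.0])` weighs the two disagreeing streams equally, whereas a trained `correlation_head` could in principle learn which stream and which timestamps to trust for each class.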

Comments:	Under review at Pattern Recognition Letters
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1807.08291 [cs.CV]
	(or arXiv:1807.08291v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1807.08291

Submission history

From: Novanto Yudistira
[v1] Sun, 22 Jul 2018 14:48:32 UTC (718 KB)
[v2] Sat, 6 Oct 2018 06:59:46 UTC (354 KB)
[v3] Thu, 27 Dec 2018 05:01:31 UTC (910 KB)
[v4] Wed, 20 Mar 2019 10:28:43 UTC (313 KB)
[v5] Thu, 9 May 2019 01:40:45 UTC (830 KB)
[v6] Mon, 16 Dec 2019 06:57:10 UTC (1,424 KB)
