Recurrent Mixture Density Network for Spatiotemporal Visual Attention

Bazzani, Loris; Larochelle, Hugo; Torresani, Lorenzo

arXiv:1603.08199 (cs)

[Submitted on 27 Mar 2016 (v1), last revised 11 Feb 2017 (this version, v4)]

Abstract: In many computer vision tasks, the information relevant to solving the problem at hand is mixed with irrelevant, distracting information. This has motivated researchers to design attentional models that can dynamically focus on parts of images or videos that are salient, e.g., by down-weighting irrelevant pixels. In this work, we propose a spatiotemporal attentional model that learns where to look in a video directly from human fixation data. We model visual attention with a mixture of Gaussians at each frame. This distribution is used to express the probability of saliency for each pixel. Time consistency in videos is modeled hierarchically by: 1) deep 3D convolutional features to represent spatial and short-term time relations and 2) a long short-term memory network on top that aggregates the clip-level representation of sequential clips and therefore expands the temporal domain from a few frames to seconds. The parameters of the proposed model are optimized via maximum likelihood estimation using human fixations as training data, without knowledge of the action in each video. Our experiments on Hollywood2 show state-of-the-art performance on saliency prediction for video. We also show that our attentional model trained on Hollywood2 generalizes well to UCF101 and can be leveraged to improve action classification accuracy on both datasets.
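
The abstract's core idea, a per-frame mixture of Gaussians scored by the likelihood of human fixations, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the diagonal-covariance parameterization, and the normalized (x, y) coordinates are all assumptions made for illustration, and in the paper's model the mixture parameters would be produced by the network rather than passed in by hand.

```python
import math

def gaussian_2d(x, y, mean, var):
    """Density of a 2D Gaussian with diagonal covariance (an assumption here)."""
    mx, my = mean
    vx, vy = var
    norm = 1.0 / (2.0 * math.pi * math.sqrt(vx * vy))
    expo = -0.5 * ((x - mx) ** 2 / vx + (y - my) ** 2 / vy)
    return norm * math.exp(expo)

def mixture_density(x, y, weights, means, variances):
    """Saliency density at (x, y): weighted sum of the component densities."""
    return sum(
        w * gaussian_2d(x, y, m, v)
        for w, m, v in zip(weights, means, variances)
    )

def fixation_nll(fixations, weights, means, variances):
    """Negative log-likelihood of fixation points under the mixture.

    Minimizing this quantity over the mixture parameters corresponds to
    maximum likelihood estimation on the fixation data.
    """
    eps = 1e-12  # guards against log(0) for fixations far from every component
    return -sum(
        math.log(mixture_density(x, y, weights, means, variances) + eps)
        for x, y in fixations
    )

# Hypothetical two-component mixture for a single frame.
weights = [0.7, 0.3]
means = [(0.5, 0.5), (0.2, 0.8)]
variances = [(0.01, 0.01), (0.02, 0.02)]
frame_fixations = [(0.5, 0.5), (0.21, 0.79)]
loss = fixation_nll(frame_fixations, weights, means, variances)
```

Fixations that fall close to the component means yield a lower loss than fixations far from them, which is the signal a maximum likelihood objective of this kind would use to move the predicted saliency toward where people actually look.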

Comments:	ICLR 2017
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1603.08199 [cs.CV]
	(or arXiv:1603.08199v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1603.08199

Submission history

From: Loris Bazzani
[v1] Sun, 27 Mar 2016 10:34:22 UTC (609 KB)
[v2] Sun, 3 Apr 2016 14:17:51 UTC (620 KB)
[v3] Sun, 15 May 2016 11:55:35 UTC (626 KB)
[v4] Sat, 11 Feb 2017 10:05:06 UTC (776 KB)
