Revisiting Deep Architectures for Head Motion Prediction in 360° Videos

Rondon, Miguel Fabian Romero; Sassatelli, Lucile; Pardo, Ramon Aparicio; Precioso, Frederic

arXiv:1911.11702 (cs)

[Submitted on 26 Nov 2019 (v1), last revised 14 Apr 2021 (this version, v3)]

Title:Revisiting Deep Architectures for Head Motion Prediction in 360° Videos

Authors:Miguel Fabian Romero Rondon, Lucile Sassatelli, Ramon Aparicio Pardo, Frederic Precioso

Abstract:We consider the problem of predicting a user's head motion in 360-degree videos using only two modalities: the user's past positions and the video content (without knowledge of other users' traces). We make two main contributions. First, we re-examine existing deep-learning approaches to this problem and identify hidden flaws through a thorough root-cause analysis. Second, building on the results of this analysis, we design a new architecture that establishes state-of-the-art performance. For the first contribution, re-assessing the existing methods that use both modalities, we obtain the surprising result that they all perform worse than baselines using the user's trajectory only. A root-cause analysis of the metrics, datasets and neural architectures shows in particular that (i) the content can inform the prediction for horizons longer than 2 to 3 seconds (existing methods consider shorter horizons), and that (ii) to compete with the baselines, it is necessary, but not sufficient, to have a recurrent unit dedicated to processing the positions. For the second contribution, from a re-examination of the problem supported by the concept of Structural-RNN, we design a new deep neural architecture named TRACK. TRACK achieves state-of-the-art performance on all considered datasets and prediction horizons, outperforming competitors by up to 20 percent on focus-type videos and horizons of 2 to 5 seconds. The entire framework (code and datasets) is available online and received an ACM reproducibility badge.
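
To make the notion of a trajectory-only baseline concrete, the sketch below shows one very simple possibility: repeat the last observed head position for every future step and score it with the great-circle angle on the unit sphere. This is an illustration only. The abstract does not specify which baselines or metrics the paper uses, so the function names (`to_unit_vector`, `great_circle_distance`, `static_baseline`), the choice of metric, and the toy trace are all assumptions introduced here.

```python
import math


def to_unit_vector(yaw, pitch):
    """Convert yaw (longitude) and pitch (latitude), in radians, to a 3D unit vector."""
    return (
        math.cos(pitch) * math.cos(yaw),
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
    )


def great_circle_distance(a, b):
    """Angle in radians between two unit vectors (assumed metric, not taken from the paper)."""
    dot = sum(x * y for x, y in zip(a, b))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, dot)))


def static_baseline(past_positions, horizon_steps):
    """Hypothetical trajectory-only baseline: repeat the last observed position."""
    return [past_positions[-1]] * horizon_steps


def errors_per_step(predicted, actual):
    """Per-step angular error between predicted and actual positions."""
    return [great_circle_distance(p, a) for p, a in zip(predicted, actual)]


# Toy trace (made up): the head rotates 10 degrees of yaw per step along the equator.
trace = [to_unit_vector(math.radians(10 * t), 0.0) for t in range(8)]
past, future = trace[:3], trace[3:]

predicted = static_baseline(past, len(future))
errors_deg = [math.degrees(e) for e in errors_per_step(predicted, future)]
```

On this toy trace the error of the static prediction grows linearly with the prediction step, which gives a feel for why longer horizons are harder for a predictor that ignores everything but the last position.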

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
Cite as:	arXiv:1911.11702 [cs.CV]
	(or arXiv:1911.11702v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1911.11702

Submission history

From: Miguel Fabian Romero Rondon
[v1] Tue, 26 Nov 2019 17:13:00 UTC (749 KB)
[v2] Wed, 12 Feb 2020 14:07:32 UTC (726 KB)
[v3] Wed, 14 Apr 2021 16:13:35 UTC (8,372 KB)
