Field-of-view prediction in 360-degree videos with attention-based neural encoder-decoder networks

Authors:

Yong Liu

MMVE '19: Proceedings of the 11th ACM Workshop on Immersive Mixed and Virtual Environment Systems

Pages 37 - 42

https://doi.org/10.1145/3304113.3326118

Published: 18 June 2019

Abstract

In this paper, we propose attention-based neural encoder-decoder networks for predicting user Field-of-View (FoV) in 360-degree videos. Our prediction methods are based on the attention mechanism, which learns the weighted prediction power of historical FoV time series through end-to-end training. Because attention-based neural encoder-decoder networks do not involve recursion, they can be highly parallelized during training. Using publicly available 360-degree head movement datasets, we demonstrate that our FoV prediction models outperform state-of-the-art FoV prediction models, achieving lower prediction error, higher training throughput, and faster convergence. Better FoV prediction leads to reduced bandwidth consumption, better video quality, and improved user quality of experience.
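
The idea of weighting historical FoV samples by attention can be illustrated with a minimal sketch. This is not the paper's model: it assumes a single scaled dot-product attention step in which the keys and values are the raw (yaw, pitch) samples in radians, the query is the most recent sample, and yaw wraparound is ignored. The function names `softmax` and `attention_predict` are hypothetical. A trained network would use learned projections and stacked layers instead.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_predict(history, query):
    """Return an attention-weighted average of past FoV centres.

    history: list of (yaw, pitch) samples in radians, oldest first.
    query:   (yaw, pitch) vector used to score each historical sample.
    Assumption: keys and values are the raw samples themselves.
    """
    dim = len(query)
    # Each score depends on one sample only, so there is no recursion
    # across time steps and the scores could be computed in parallel.
    scores = [
        sum(q * k for q, k in zip(query, sample)) / math.sqrt(dim)
        for sample in history
    ]
    weights = softmax(scores)
    prediction = tuple(
        sum(w * sample[i] for w, sample in zip(weights, history))
        for i in range(dim)
    )
    return prediction, weights

# Illustrative head-movement trace (made-up values, not from any dataset).
history = [(0.10, 0.00), (0.20, 0.05), (0.35, 0.10), (0.50, 0.12)]
prediction, weights = attention_predict(history, query=history[-1])
```

In this sketch the weights sum to one and the prediction is a convex combination of past samples, so it can only interpolate within the observed range. A real predictor would need learned output layers to extrapolate future head movement.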

Index Terms

Field-of-view prediction in 360-degree videos with attention-based neural encoder-decoder networks
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Information systems
  1. Information systems applications
    1. Multimedia information systems
      1. Multimedia streaming

Published In

MMVE '19: Proceedings of the 11th ACM Workshop on Immersive Mixed and Virtual Environment Systems

June 2019

53 pages

ISBN:9781450362993

DOI:10.1145/3304113

Conference Chairs: Mario Montagud (i2CAT Foundation & University of Valencia, Spain), Francesca De Simone (CWI, Netherlands)

General Chair: Niall Murray (Athlone Institute of Technology, Ireland)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

MMSys '19

Sponsor:

SIGMM

MMSys '19: 10th ACM Multimedia Systems Conference

June 18, 2019

Amherst, Massachusetts

Acceptance Rates

MMVE '19 Paper Acceptance Rate 9 of 18 submissions, 50%;

Overall Acceptance Rate 26 of 44 submissions, 59%
