DOI: 10.1145/3304113.3326118
research-article
Public Access

Field-of-view prediction in 360-degree videos with attention-based neural encoder-decoder networks

Published: 18 June 2019 Publication History
    Abstract

    In this paper, we propose attention-based neural encoder-decoder networks for predicting user Field-of-View (FoV) in 360-degree videos. Our prediction methods are based on the attention mechanism, which learns the weighted predictive power of the historical FoV time series through end-to-end training. Because attention-based neural encoder-decoder networks do not involve recurrence, they can be highly parallelized during training. Using publicly available 360-degree head movement datasets, we demonstrate that our FoV prediction models outperform state-of-the-art FoV prediction models, achieving lower prediction error, higher training throughput, and faster convergence. Better FoV prediction leads to reduced bandwidth consumption, better video quality, and improved user quality of experience.
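The idea of weighting historical FoV samples by attention can be illustrated with a minimal sketch. This is not the authors' model: in the paper the weights are learned end-to-end, whereas the sketch below assumes plain scaled dot-product scores with no trained parameters. The `history` values, the two-dimensional feature layout, and all function names are hypothetical, and angle wrap-around is ignored.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product scores between one query and every key."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    return softmax(scores)

def attend(query, keys, values):
    """Attention-weighted average of the value vectors."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# Hypothetical FoV history: one (yaw, pitch)-like feature pair per time step.
history = [(0.10, 0.00), (0.15, 0.02), (0.22, 0.03), (0.30, 0.05)]

# Assumption: the most recent sample serves as the query, and the raw
# history serves as both keys and values (no learned projections).
query = history[-1]
weights = attention_weights(query, history)
prediction = attend(query, history, history)

print("weights:", weights)
print("prediction:", prediction)
```

Each score depends only on the query and one key, so all time steps can be scored independently of one another. This is the property that lets attention-based models be parallelized during training, in contrast to recurrent models that process the sequence step by step.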

    Published In

    MMVE '19: Proceedings of the 11th ACM Workshop on Immersive Mixed and Virtual Environment Systems
    June 2019
    53 pages
    ISBN:9781450362993
    DOI:10.1145/3304113
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. 360 degree videos
    2. attention
    3. encoder decoder networks
    4. field of view prediction
    5. neural networks

    Qualifiers

    • Research-article

    Conference

    MMSys '19: 10th ACM Multimedia Systems Conference
    June 18, 2019
    Amherst, Massachusetts

    Acceptance Rates

    MMVE '19 Paper Acceptance Rate 9 of 18 submissions, 50%;
    Overall Acceptance Rate 26 of 44 submissions, 59%

    Cited By

    • (2024) Compass: A Prefetching Framework with Viewport Patching for 360° Video Streaming. Proceedings of the 2024 SIGCOMM Workshop on Emerging Multimedia Systems, 45-51. DOI: 10.1145/3672196.3673396. Online publication date: 4-Aug-2024
    • (2023) Enhancing 360 Video Streaming through Salient Content in Head-Mounted Displays. Sensors 23:8 (4016). DOI: 10.3390/s23084016. Online publication date: 15-Apr-2023
    • (2023) Seer: Learning-Based 360° Video Streaming for MEC-Equipped Cellular Networks. IEEE Transactions on Network Science and Engineering, 1-14. DOI: 10.1109/TNSE.2023.3257403. Online publication date: 2023
    • (2023) A self-attention model for viewport prediction based on distance constraint. The Visual Computer 40:9 (5997-6014). DOI: 10.1007/s00371-023-03149-6. Online publication date: 28-Nov-2023
    • (2022) Deep variational learning for multiple trajectory prediction of 360° head movements. Proceedings of the 13th ACM Multimedia Systems Conference, 12-26. DOI: 10.1145/3524273.3528176. Online publication date: 14-Jun-2022
    • (2022) Dynamic Viewport Selection-Based Prioritized Bitrate Adaptation for Tile-Based 360° Video Streaming. IEEE Access 10 (29377-29392). DOI: 10.1109/ACCESS.2022.3157339. Online publication date: 2022
    • (2021) LiveROI. Proceedings of the 12th ACM Multimedia Systems Conference, 132-145. DOI: 10.1145/3458305.3463378. Online publication date: 24-Jun-2021
    • (2020) Viewing the 360° Future: Trade-Off Between User Field-of-View Prediction, Network Bandwidth, and Delay. 2020 29th International Conference on Computer Communications and Networks (ICCCN), 1-11. DOI: 10.1109/ICCCN49398.2020.9209659. Online publication date: Aug-2020
    • (2020) A Survey on Adaptive 360° Video Streaming: Solutions, Challenges and Opportunities. IEEE Communications Surveys & Tutorials 22:4 (2801-2838). DOI: 10.1109/COMST.2020.3006999. Online publication date: Dec-2021
