DOI: 10.1145/3609395.3610593

Mobile Volumetric Video Streaming System through Implicit Neural Representation

Published: 26 September 2023

Abstract

Volumetric video (VV) is emerging as a new video paradigm that offers an immersive viewing experience with six degrees of freedom (6DoF). Most existing VV systems build on a point cloud (PtCl)-based architecture, which remains far from effective due to huge video sizes, unrealistic color variations, and the need for specialized player platforms. Recent advances in implicit neural representations (INR) such as NeRF bring great opportunities to VV, given their potential for photorealistic 3D appearance and lighting consistency. However, arduous challenges remain in model training, display rendering, streaming optimization, and system implementation. To address these challenges, we develop NeRVo, an INR-based VV representation for mobile VV. Compared to NeRF, NeRVo improves training and rendering speed by over 300× and 1000×, respectively, while retaining photorealism, mobile compatibility, and desirable data rates. Adopting NeRVo as a building block, we design and implement VoINR, a holistic INR-enhanced VV streaming system.
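To make the INR idea concrete: a NeRF-style representation is a small network that maps a 3D point to a volume density and a color, and an image is formed by alpha-compositing those values along each camera ray. The sketch below illustrates only this generic data flow, not the paper's NeRVo design; the names (`TinyRadianceField`, `render_ray`), the random untrained weights, and all dimensions are illustrative assumptions.

```python
import numpy as np

def positional_encoding(x, n_freqs=4):
    # NeRF-style encoding: augment each coordinate with sin/cos
    # features at exponentially increasing frequencies.
    feats = [x]
    for i in range(n_freqs):
        feats.append(np.sin((2.0 ** i) * np.pi * x))
        feats.append(np.cos((2.0 ** i) * np.pi * x))
    return np.concatenate(feats, axis=-1)

class TinyRadianceField:
    """Toy MLP mapping an encoded 3D point to (density, rgb).
    Weights are random and untrained: this only shows the data flow."""
    def __init__(self, n_freqs=4, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        d_in = 3 * (1 + 2 * n_freqs)          # raw coords + sin/cos bands
        self.w1 = rng.normal(0.0, 0.1, (d_in, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, 4))  # 1 density + 3 rgb
        self.n_freqs = n_freqs

    def query(self, pts):
        h = np.maximum(positional_encoding(pts, self.n_freqs) @ self.w1, 0.0)
        out = h @ self.w2
        sigma = np.maximum(out[..., 0], 0.0)        # non-negative density
        rgb = 1.0 / (1.0 + np.exp(-out[..., 1:]))   # colors in [0, 1]
        return sigma, rgb

def render_ray(field, origin, direction, near=0.0, far=2.0, n_samples=32):
    # Classic volume rendering: sample the field along the ray and
    # alpha-composite colors front to back using the densities.
    t = np.linspace(near, far, n_samples)
    pts = origin + t[:, None] * direction
    sigma, rgb = field.query(pts)
    delta = np.diff(t, append=far)                  # inter-sample spacing
    alpha = 1.0 - np.exp(-sigma * delta)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(axis=0)

field = TinyRadianceField()
color = render_ray(field, np.zeros(3), np.array([0.0, 0.0, 1.0]))
print(color.shape)  # (3,)
```

Rendering one pixel this way requires many network queries per ray, which is why naive NeRF is slow and why the paper's 300×/1000× training and rendering speedups matter for mobile streaming.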


Published In

EMS '23: Proceedings of the 2023 Workshop on Emerging Multimedia Systems
September 2023
65 pages
ISBN:9798400703034
DOI:10.1145/3609395

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. volumetric video streaming
  2. implicit neural representation
  3. mobile mixed reality

Acceptance Rates

Overall Acceptance Rate 9 of 15 submissions, 60%
