DOI: 10.1145/3609395.3610593

Mobile Volumetric Video Streaming System through Implicit Neural Representation

Published: 26 September 2023

Abstract

Volumetric video (VV) is emerging as a new video paradigm that offers an immersive viewing experience with six degrees of freedom (6DoF). Most existing VV systems build on a point cloud (PtCl)-based architecture, which remains far from effective due to huge video sizes, unrealistic color variations, and the need for specialized player platforms. Recent advances in implicit neural representations (INR) such as NeRF bring great opportunities to VV, given their potential for photorealistic 3D appearance and lighting consistency. However, arduous challenges remain in model training, display rendering, streaming optimization, and system implementation. To address these challenges, we develop NeRVo, an INR-based VV representation for mobile VV. Compared to NeRF, NeRVo improves training and rendering speed by over 300× and 1000×, respectively, while retaining photorealism, mobile compatibility, and desirable data rates. Adopting NeRVo as a building block, we design and implement VoINR, a holistic INR-enhanced VV streaming system.
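To make the INR idea concrete: a NeRF-style representation is a small network that maps a 3D point to a volume density and a color, and an image is formed by alpha-compositing those values along each camera ray. The sketch below illustrates only this generic data flow, not the paper's NeRVo design; the names (`TinyRadianceField`, `render_ray`), the random untrained weights, and all dimensions are illustrative assumptions.

```python
import numpy as np

def positional_encoding(x, n_freqs=4):
    # NeRF-style encoding: augment each coordinate with sin/cos
    # features at exponentially increasing frequencies.
    feats = [x]
    for i in range(n_freqs):
        feats.append(np.sin((2.0 ** i) * np.pi * x))
        feats.append(np.cos((2.0 ** i) * np.pi * x))
    return np.concatenate(feats, axis=-1)

class TinyRadianceField:
    """Toy MLP mapping an encoded 3D point to (density, rgb).
    Weights are random and untrained: this only shows the data flow."""
    def __init__(self, n_freqs=4, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        d_in = 3 * (1 + 2 * n_freqs)          # raw coords + sin/cos bands
        self.w1 = rng.normal(0.0, 0.1, (d_in, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, 4))  # 1 density + 3 rgb
        self.n_freqs = n_freqs

    def query(self, pts):
        h = np.maximum(positional_encoding(pts, self.n_freqs) @ self.w1, 0.0)
        out = h @ self.w2
        sigma = np.maximum(out[..., 0], 0.0)        # non-negative density
        rgb = 1.0 / (1.0 + np.exp(-out[..., 1:]))   # colors in [0, 1]
        return sigma, rgb

def render_ray(field, origin, direction, near=0.0, far=2.0, n_samples=32):
    # Classic volume rendering: sample the field along the ray and
    # alpha-composite colors front to back using the densities.
    t = np.linspace(near, far, n_samples)
    pts = origin + t[:, None] * direction
    sigma, rgb = field.query(pts)
    delta = np.diff(t, append=far)                  # inter-sample spacing
    alpha = 1.0 - np.exp(-sigma * delta)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(axis=0)

field = TinyRadianceField()
color = render_ray(field, np.zeros(3), np.array([0.0, 0.0, 1.0]))
print(color.shape)  # (3,)
```

Rendering one pixel this way requires many network queries per ray, which is why naive NeRF is slow and why the paper's 300×/1000× training and rendering speedups matter for mobile streaming.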


Published In

EMS '23: Proceedings of the 2023 Workshop on Emerging Multimedia Systems
September 2023
65 pages
ISBN:9798400703034
DOI:10.1145/3609395

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. volumetric video streaming
  2. implicit neural representation
  3. mobile mixed reality

Acceptance Rates

Overall Acceptance Rate 9 of 15 submissions, 60%
