Pyramid NeRF: Frequency Guided Fast Radiance Field Optimization

International Journal of Computer Vision

Abstract

Novel view synthesis using implicit neural functions such as the Neural Radiance Field (NeRF) has achieved significant progress recently. However, training a NeRF is computationally expensive because its frequency components are optimized in a disordered manner. In this paper, we propose Pyramid NeRF, which guides the NeRF training in a ‘low-frequency first, high-frequency second’ style using image pyramids, and improves the training and inference speed by \(15\times \) and \(805\times \), respectively. The high training efficiency stems from (i) organized frequency-guided optimization, which improves the convergence speed and efficiently reduces the number of training iterations, and (ii) progressive subdivision, which replaces a single large multi-layer perceptron (MLP) with thousands of tiny MLPs and significantly decreases the time spent executing them. Experiments on various synthetic and real scenes verify the high efficiency of Pyramid NeRF. Meanwhile, the structural and perceptual similarities are better recovered.
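To make the ‘low-frequency first, high-frequency second’ schedule concrete, the sketch below builds a Gaussian image pyramid per training view and optimizes from the coarsest level to full resolution. It is a minimal illustration under our own assumptions, not the authors' released code: `train_level`, the number of levels, and the per-level iteration budgets are hypothetical placeholders.

```python
import cv2  # OpenCV: pyrDown low-pass filters, then halves the resolution
import numpy as np


def build_image_pyramid(image: np.ndarray, num_levels: int) -> list:
    """Return pyramid levels ordered coarse to fine for one training view."""
    levels = [image]
    for _ in range(num_levels - 1):
        levels.append(cv2.pyrDown(levels[-1]))  # blur + 2x downsample
    return levels[::-1]


def train_level(views, iterations):
    """Hypothetical placeholder: sample rays from `views` and update the
    radiance field (or, after subdivision, the tiny MLPs) for `iterations`
    optimization steps."""


def frequency_guided_training(images, num_levels=4,
                              iters_per_level=(2000, 2000, 4000, 8000)):
    """Coarse-to-fine loop: low frequencies are fitted first; finer pyramid
    levels progressively introduce high-frequency detail."""
    pyramids = [build_image_pyramid(im, num_levels) for im in images]
    for level in range(num_levels):
        train_level([p[level] for p in pyramids],
                    iterations=iters_per_level[level])
```

Because each downsampling step low-pass filters the images, the early levels supervise only coarse structure; this is the organized frequency ordering that the abstract credits for the reduced iteration count.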

Data Availability Statement

The original multi-view datasets come from public datasets (Mildenhall et al., 2020; Lombardi et al., 2019; Yao et al., 2020). All code and models will be made publicly available to the research community to facilitate reproducible research once the paper is accepted. The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.

Notes

  1. Note that a global world coordinate system with the same unit is used at all scales; e.g. the ray through pixel (1, 1) at scale S coincides with the rays through pixels (2, 2) and (4, 4) at scales \(S-1\) and \(S-2\), respectively.
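As a quick sanity check of this note, the snippet below (our own illustration with hypothetical intrinsics, not code from the paper) verifies that scaling the intrinsic matrix together with the pixel coordinates leaves the ray direction unchanged, which is why corresponding pixels across pyramid scales trace coincident rays:

```python
import numpy as np


def ray_direction(K: np.ndarray, u: float, v: float) -> np.ndarray:
    """Unit ray direction in camera coordinates for pixel (u, v)."""
    d = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return d / np.linalg.norm(d)


# Hypothetical intrinsics at the finest scale S-2; each 2x downsampling
# halves the focal lengths and the principal point.
K = np.array([[800.0,   0.0, 400.0],
              [  0.0, 800.0, 300.0],
              [  0.0,   0.0,   1.0]])
K_s1 = np.diag([0.5, 0.5, 1.0]) @ K    # scale S-1
K_s = np.diag([0.25, 0.25, 1.0]) @ K   # scale S (coarsest)

# Pixel (4, 4) at scale S-2 corresponds to (2, 2) at S-1 and (1, 1) at S;
# with a shared world coordinate system the three rays coincide.
assert np.allclose(ray_direction(K, 4, 4), ray_direction(K_s1, 2, 2))
assert np.allclose(ray_direction(K, 4, 4), ray_direction(K_s, 1, 1))
```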

References

  • Attal, B., Ling, S., Gokaslan, A., Richardt, C., & Tompkin, J. (2020). Matryodshka: Real-time 6dof video view synthesis using multi-sphere images. In Proceedings of the European conference on computer vision. Springer: Berlin (pp. 441–459).

  • Bergman, A., Kellnhofer, P., & Wetzstein, G. (2021). Fast training of neural lumigraph representations using meta learning. Advances in Neural Information Processing Systems, 34, 172–186.

  • Broxton, M., Flynn, J., Overbeck, R., Erickson, D., Hedman, P., Duvall, M., Dourgarian, J., Busch, J., Whalen, M., & Debevec, P. (2020). Immersive light field video with a layered mesh representation. ACM Transactions on Graphics, 39(4), Article 86.

  • Chaurasia, G., Sorkine, O., & Drettakis, G. (2011). Silhouette-aware warping for image-based rendering. Computer Graphics Forum, 30, 1223–1232.

  • Chaurasia, G., Duchene, S., Sorkine-Hornung, O., & Drettakis, G. (2013). Depth synthesis and local warps for plausible image-based navigation. ACM Transactions on Graphics, 32(3), 1–12.

  • Chen, Z., & Zhang, H. (2019). Learning implicit fields for generative shape modeling. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 5939–5948).

  • Eisemann, M., De Decker, B., Magnor, M., Bekaert, P., De Aguiar, E., Ahmed, N., Theobalt, C., & Sellent, A. (2008). Floating textures. Computer Graphics Forum, 27, 409–418.

  • Flynn, J., Neulander, I., Philbin, J., & Snavely, N. (2016). Deepstereo: Learning to predict new views from the world’s imagery. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 5515–5524).

  • Garbin, S. J., Kowalski, M., Johnson, M., Shotton, J., & Valentin, J. (2021). Fastnerf: High-fidelity neural rendering at 200fps. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 14346–14355).

  • Goesele, M., Ackermann, J., Fuhrmann, S., Haubold, C., Klowsky, R., Steedly, D., & Szeliski, R. (2010). Ambient point clouds for view interpolation. In ACM SIGGRAPH 2010 papers (pp. 1–6).

  • Hedman, P., Philip, J., Price, T., Frahm, J. M., Drettakis, G., & Brostow, G. (2018). Deep blending for free-viewpoint image-based rendering. ACM Transactions on Graphics, 37(6), 1–15.

  • Hedman, P., Srinivasan, P. P., Mildenhall, B., Barron, J. T., & Debevec, P. (2021). Baking neural radiance fields for real-time view synthesis. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 5875–5884).

  • Kellnhofer, P., Jebe, L. C., Jones, A., Spicer, R., Pulli, K., & Wetzstein, G. (2021). Neural lumigraph rendering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4287–4297).

  • Levin, A., & Durand, F. (2010). Linear view synthesis using a dimensionality gap light field prior. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1831–1838).

  • Lindeberg, T. (1994). Scale-space theory: A basic tool for analyzing structures at different scales. Journal of Applied Statistics, 21(1–2), 225–270.

  • Lindell, D. B., Van Veen, D., Park, J. J., & Wetzstein, G. (2022). Bacon: Band-limited coordinate networks for multiscale scene representation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 16252–16262).

  • Liu, S., Zhang, X., Zhang, Z., Zhang, R., Zhu, J. Y., & Russell, B. (2021). Editing conditional radiance fields. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 5773–5783).

  • Lombardi, S., Simon, T., Saragih, J., Schwartz, G., Lehrmann, A., & Sheikh, Y. (2019). Neural volumes: Learning dynamic renderable volumes from images. ACM Transactions on Graphics, 38(4), 65.1-65.14.

  • Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.

  • Mildenhall, B., Srinivasan, P. P., Ortiz-Cayon, R., Kalantari, N. K., Ramamoorthi, R., Ng, R., & Kar, A. (2019). Local light field fusion: Practical view synthesis with prescriptive sampling guidelines. ACM Transactions on Graphics, 38(4), 1–14.

  • Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T., Ramamoorthi, R., & Ng, R. (2020). Nerf: Representing scenes as neural radiance fields for view synthesis. In Proceedings of the European conference on computer vision. Springer: Berlin (pp. 405–421).

  • Müller, T., Evans, A., Schied, C., & Keller, A. (2022). Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics, 41(4), 102:1-102:15.

  • Niemeyer, M., Mescheder, L., Oechsle, M., & Geiger, A. (2020). Differentiable volumetric rendering: Learning implicit 3D representations without 3D supervision. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 3504–3515).

  • Pan, X., Xu, X., Loy, C. C., Theobalt, C., & Dai, B. (2021). A shading-guided generative implicit model for shape-accurate 3D-aware image synthesis. Advances in Neural Information Processing Systems, 34, 20002–20013.

  • Park, J. J., Florence, P., Straub, J., Newcombe, R., & Lovegrove, S. (2019). Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 165–174).

  • Penner, E., & Zhang, L. (2017). Soft 3D reconstruction for view synthesis. ACM Transactions on Graphics, 36(6), 1–11.

  • Pujades, S., Devernay, F., & Goldluecke, B. (2014). Bayesian view synthesis and image-based rendering principles. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 3906–3913).

  • Reiser, C., Peng, S., Liao, Y., & Geiger, A. (2021). Kilonerf: Speeding up neural radiance fields with thousands of tiny MLPs. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 14335–14345).

  • Riegler, G., & Koltun, V. (2020). Free view synthesis. In Proceedings of the European conference on computer vision. Springer: Berlin (pp. 623–640).

  • Schwarz, K., Sauer, A., Niemeyer, M., Liao, Y., & Geiger, A. (2022). Voxgraf: Fast 3D-aware image synthesis with sparse voxel grids. arXiv preprint arXiv:2206.07695.

  • Shi, L., Hassanieh, H., Davis, A., Katabi, D., & Durand, F. (2014). Light field reconstruction using sparsity in the continuous Fourier domain. ACM Transactions on Graphics, 34(1), 1–13.

  • Sun, C., Sun, M., & Chen, H. (2022). Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.

  • Tancik, M., Mildenhall, B., Wang, T., Schmidt, D., Srinivasan, P. P., Barron, J. T., & Ng, R. (2021). Learned initializations for optimizing coordinate-based neural representations. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2846–2855).

  • Tang, J. (2022). Torch-NGP: A PyTorch implementation of Instant-NGP. https://github.com/ashawkey/torch-ngp

  • Tewari, A., Thies, J., Mildenhall, B., Srinivasan, P., Tretschk, E., Wang, Y., Lassner, C., Sitzmann, V., Martin-Brualla, R., Lombardi, S., Simon, T., Theobalt, C., Niessner, M., Barron, J. T., Wetzstein, G., Zollhoefer, M., & Golyanik, V. (2021). Advances in neural rendering. arXiv preprint arXiv:2111.05849.

  • Vagharshakyan, S., Bregovic, R., & Gotchev, A. (2017). Light field reconstruction using Shearlet transform. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(1), 133–147.

  • Wang, C., Chai, M., He, M., Chen, D., & Liao, J. (2021). Clip-NeRF: Text-and-image driven manipulation of neural radiance fields. arXiv preprint arXiv:2112.05139.

  • Wanner, S., & Goldluecke, B. (2013). Variational light field analysis for disparity estimation and super-resolution. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(3), 606–619.

  • Wu, G., Zhao, M., Wang, L., Dai, Q., Chai, T., & Liu, Y. (2017). Light field reconstruction using deep convolutional network on EPI. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 6319–6327).

  • Xiangli, Y., Xu, L., Pan, X., Zhao, N., Rao, A., Theobalt, C., Dai, B., & Lin, D. (2022). Bungeenerf: Progressive neural radiance field for extreme multi-scale scene rendering. In Proceedings of the European conference on computer vision.

  • Yang, B., Zhang, Y., Xu, Y., Li, Y., Zhou, H., Bao, H., Zhang, G., & Cui, Z. (2021). Learning object-compositional neural radiance field for editable scene rendering. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 13779–13788).

  • Yao, Y., Luo, Z., Li, S., Zhang, J., Ren, Y., Zhou, L., Fang, T., & Quan, L. (2020). Blendedmvs: A large-scale dataset for generalized multi-view stereo networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1790–1799).

  • Yeung, P. H., Hesse, L., Aliasi, M., Haak, M., the INTERGROWTH-21st Consortium, Xie, W., & Namburete, A. I. L. (2021). Implicitvol: Sensorless 3D ultrasound reconstruction with deep implicit representation. arXiv preprint arXiv:2109.12108.

  • Yu, A., Li, R., Tancik, M., Li, H., Ng, R., & Kanazawa, A. (2021). Plenoctrees for real-time rendering of neural radiance fields. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 5752–5761).

  • Yu, A., Fridovich-Keil, S., Tancik, M., Chen, Q., Recht, B., & Kanazawa, A. (2022). Plenoxels: Radiance fields without neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.

  • Zhang, R., Isola, P., Efros, A.A., Shechtman, E., & Wang, O. (2018). The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 586–595).

  • Zhou, T., Tucker, R., Flynn, J., Fyffe, G., & Snavely, N. (2018). Stereo magnification: Learning view synthesis using multiplane images. ACM Transactions on Graphics, 37(4), 1–12.

  • Zhu, H., Guo, M., Li, H., Wang, Q., & Robles-Kelly, A. (2019). Revisiting spatio-angular trade-off in light field cameras and extended applications in super-resolution. IEEE Transactions on Visualization and Computer Graphics, 27(6), 3019–3033.

Author information

Corresponding author

Correspondence to Xun Cao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by NSFC under Grants 62101242, 62022038 and 62025108, and by the Leading Technology of Jiangsu Basic Research Plan (BK20192003).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhu, J., Zhu, H., Zhang, Q. et al. Pyramid NeRF: Frequency Guided Fast Radiance Field Optimization. Int J Comput Vis 131, 2649–2664 (2023). https://doi.org/10.1007/s11263-023-01829-3

