Abstract
This paper proposes a novel solution for real-time estimation of the depth range and correct focus in light field videos represented by arrays of video sequences. Compared to previous approaches, the solution offers a new way to render high-quality synthetic views from light field videos on contemporary hardware in real time. Only video frames containing color information are needed, with no additional attributes such as captured depth. Drawbacks of previous proposals, such as block artifacts in the defocused parts of the scene or the manual setting of the depth range, are also addressed. The paper describes a complete solution that resolves the main memory and performance issues of light field rendering on contemporary personal computers. The integration of high-quality light field videos supersedes the approaches in previous works, and the paper provides measurements and experimental results. While reaching the same quality as slower state-of-the-art approaches, the method still runs in real time, which makes it suitable for industrial and real-life scenarios as an alternative to standard 3D rendering approaches.
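For illustration, the following is a minimal CUDA sketch of the general per-pixel shift-and-sum focusing idea summarized above: for each output pixel, a range of candidate disparities is scanned, all camera views are sampled at correspondingly shifted coordinates, and the candidate with the lowest color variance across the views is kept as the in-focus one. This is a sketch under stated assumptions, not the implementation described in the paper; all identifiers (focusKernel, sampleView, GRID_W, ...) and the packed RGB8 view layout are hypothetical.

```cuda
// Minimal sketch of per-pixel shift-and-sum light field focusing.
// Not the authors' pipeline; names and data layout are assumptions.
#include <cstdint>
#include <cfloat>
#include <cuda_runtime.h>

constexpr int GRID_W = 8, GRID_H = 8;      // camera grid size (assumed)
constexpr int IMG_W = 1024, IMG_H = 576;   // per-view resolution (assumed)

__device__ float3 sampleView(const uint8_t *lf, int cam, int x, int y)
{
    // Clamp to image borders and fetch one RGB8 texel of view `cam`.
    x = min(max(x, 0), IMG_W - 1);
    y = min(max(y, 0), IMG_H - 1);
    const uint8_t *p = lf + ((size_t)cam * IMG_W * IMG_H + (size_t)y * IMG_W + x) * 3;
    return make_float3(p[0], p[1], p[2]);
}

// One thread per output pixel: scan candidate disparities and keep the one
// with the lowest color variance across views (the best-focused one).
__global__ void focusKernel(const uint8_t *lf, uchar4 *out,
                            float minDisp, float maxDisp, int steps)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= IMG_W || y >= IMG_H) return;

    float bestVar = FLT_MAX;
    float3 bestColor = make_float3(0.f, 0.f, 0.f);

    for (int s = 0; s < steps; ++s)
    {
        float t = (steps > 1) ? (float)s / (steps - 1) : 0.f;
        float d = minDisp + (maxDisp - minDisp) * t;
        float3 sum = make_float3(0.f, 0.f, 0.f);
        float3 sumSq = make_float3(0.f, 0.f, 0.f);

        for (int cy = 0; cy < GRID_H; ++cy)
            for (int cx = 0; cx < GRID_W; ++cx)
            {
                // Shift each view proportionally to its offset from the grid centre.
                float ox = (cx - (GRID_W - 1) * 0.5f) * d;
                float oy = (cy - (GRID_H - 1) * 0.5f) * d;
                float3 c = sampleView(lf, cy * GRID_W + cx,
                                      x + (int)roundf(ox), y + (int)roundf(oy));
                sum.x += c.x; sum.y += c.y; sum.z += c.z;
                sumSq.x += c.x * c.x; sumSq.y += c.y * c.y; sumSq.z += c.z * c.z;
            }

        const float n = GRID_W * GRID_H;
        float3 mean = make_float3(sum.x / n, sum.y / n, sum.z / n);
        float var = (sumSq.x / n - mean.x * mean.x) +
                    (sumSq.y / n - mean.y * mean.y) +
                    (sumSq.z / n - mean.z * mean.z);
        if (var < bestVar) { bestVar = var; bestColor = mean; }
    }
    out[y * IMG_W + x] = make_uchar4((uint8_t)bestColor.x, (uint8_t)bestColor.y,
                                     (uint8_t)bestColor.z, 255);
}
```

A kernel of this kind would typically be launched once per video frame with a 2D thread grid covering the output resolution, e.g. dim3 block(16, 16) and a grid rounded up to cover IMG_W by IMG_H pixels.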
Code Availability
The code is publicly available at https://www.fit.vutbr.cz/~ichlubna/research and is free to use.
Change history
14 December 2023
The original version of this paper was updated to correct the Code Availability link.
Notes
Blender Demo Files - Barcelona Pavilion by Hamza Cheggour
GL_NV_vdpau_interop extension for OpenGL
Acknowledgements
This work was supported by the KDT JU project AIDOaRt, grant agreement No 101007350. The authors would like to thank the anonymous reviewers who helped to improve the quality of the manuscript.
Ethics declarations
Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chlubna, T., Milet, T., Zemčík, P. et al. Real-Time Light Field Video Focusing and GPU Accelerated Streaming. J Sign Process Syst 95, 703–719 (2023). https://doi.org/10.1007/s11265-023-01874-8
DOI: https://doi.org/10.1007/s11265-023-01874-8