Abstract
Monocular depth estimation algorithms successfully predict the relative depth order of objects in a scene. However, because of the fundamental scale ambiguity of monocular images, these algorithms fail to predict true metric depth. In this work, we demonstrate how a depth histogram of the scene, which can be readily captured with a single-pixel time-resolved detector, can be fused with the output of existing monocular depth estimation algorithms to resolve this depth ambiguity. We validate this novel sensor fusion technique experimentally and in extensive simulation, and show that it significantly improves the performance of several state-of-the-art monocular depth estimation algorithms.
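The core idea of the fusion can be illustrated with classic histogram matching: remap a relative, scale-ambiguous monocular depth map so that its distribution of depths follows the global depth histogram measured by the single-pixel transient detector. The sketch below is a hypothetical, simplified illustration of this principle (not the authors' exact algorithm); the function name and arguments are placeholders.

```python
import numpy as np

def match_depth_histogram(relative_depth, transient_hist, bin_edges):
    """Remap a relative depth map so its depth distribution follows
    a measured metric depth histogram (simple histogram matching).

    relative_depth : (H, W) array of scale-ambiguous depth predictions.
    transient_hist : (B,) photon counts per depth bin from the detector.
    bin_edges      : (B+1,) metric depths delimiting the histogram bins.
    """
    flat = relative_depth.ravel()
    # Rank each predicted depth -> empirical CDF of the prediction.
    ranks = np.argsort(np.argsort(flat))
    src_cdf = (ranks + 1) / flat.size
    # CDF of the measured (metric) depth histogram.
    tgt_cdf = np.cumsum(transient_hist) / np.sum(transient_hist)
    # Invert the target CDF: map each quantile to a metric depth.
    metric = np.interp(src_cdf, tgt_cdf, bin_edges[1:])
    return metric.reshape(relative_depth.shape)
```

Because the remapping is monotone, the relative depth ordering produced by the monocular network is preserved while absolute depths are pulled onto the metric scale implied by the transient measurement.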
Acknowledgments
D.L. was supported by a Stanford Graduate Fellowship. C.M. was supported by an ORISE Intelligence Community Postdoctoral Fellowship. G.W. was supported by an NSF CAREER Award (IIS 1553333), a Sloan Fellowship, by the KAUST Office of Sponsored Research through the Visual Computing Center CCF grant, and a PECASE by the ARL.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Nishimura, M., Lindell, D.B., Metzler, C., Wetzstein, G. (2020). Disambiguating Monocular Depth Estimation with a Single Transient. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12366. Springer, Cham. https://doi.org/10.1007/978-3-030-58589-1_9
Print ISBN: 978-3-030-58588-4
Online ISBN: 978-3-030-58589-1