Abstract
Monocular depth estimation algorithms successfully predict the relative depth order of objects in a scene. However, because of the fundamental scale ambiguity of monocular images, these algorithms fail to predict true metric depth. In this work, we demonstrate how a depth histogram of the scene, which can be readily captured with a single-pixel time-resolved detector, can be fused with the output of existing monocular depth estimation algorithms to resolve this depth ambiguity. We validate this novel sensor fusion technique experimentally and in extensive simulation, and show that it significantly improves the performance of several state-of-the-art monocular depth estimation algorithms.
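The core idea of the fusion can be illustrated with classic histogram matching: remap a relative, scale-ambiguous monocular depth map so that its distribution of depths follows the global depth histogram measured by the single-pixel transient detector. The sketch below is a hypothetical, simplified illustration of this principle (not the authors' exact algorithm); the function name and arguments are placeholders.

```python
import numpy as np

def match_depth_histogram(relative_depth, transient_hist, bin_edges):
    """Remap a relative depth map so its depth distribution follows
    a measured metric depth histogram (simple histogram matching).

    relative_depth : (H, W) array of scale-ambiguous depth predictions.
    transient_hist : (B,) photon counts per depth bin from the detector.
    bin_edges      : (B+1,) metric depths delimiting the histogram bins.
    """
    flat = relative_depth.ravel()
    # Rank each predicted depth -> empirical CDF of the prediction.
    ranks = np.argsort(np.argsort(flat))
    src_cdf = (ranks + 1) / flat.size
    # CDF of the measured (metric) depth histogram.
    tgt_cdf = np.cumsum(transient_hist) / np.sum(transient_hist)
    # Invert the target CDF: map each quantile to a metric depth.
    metric = np.interp(src_cdf, tgt_cdf, bin_edges[1:])
    return metric.reshape(relative_depth.shape)
```

Because the remapping is monotone, the relative depth ordering produced by the monocular network is preserved while absolute depths are pulled onto the metric scale implied by the transient measurement.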
Acknowledgments
D.L. was supported by a Stanford Graduate Fellowship. C.M. was supported by an ORISE Intelligence Community Postdoctoral Fellowship. G.W. was supported by an NSF CAREER Award (IIS 1553333), a Sloan Fellowship, by the KAUST Office of Sponsored Research through the Visual Computing Center CCF grant, and a PECASE by the ARL.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Nishimura, M., Lindell, D.B., Metzler, C., Wetzstein, G. (2020). Disambiguating Monocular Depth Estimation with a Single Transient. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12366. Springer, Cham. https://doi.org/10.1007/978-3-030-58589-1_9
Print ISBN: 978-3-030-58588-4
Online ISBN: 978-3-030-58589-1