Transferring knowledge from monocular completion for self-supervised monocular depth estimation

Sun, Lin; Li, Yi; Liu, Bingzheng; Xu, Liying; Zhang, Zhe; Zhu, Jie

doi:10.1007/s11042-021-11212-4

1221: Deep Learning for Image/Video Compression and Visual Quality Assessment
Published: 24 July 2021

Volume 81, pages 42485–42495, (2022)
Multimedia Tools and Applications

Abstract

Monocular depth estimation is a challenging task in computer vision whose goal is to predict per-pixel depth from a single RGB image. Supervised learning methods require large amounts of depth measurement data, which are time-consuming and expensive to obtain. Self-supervised methods show great promise, exploiting geometry to provide supervision signals through image warping. Moreover, several works leverage other vision tasks (e.g., stereo matching and semantic segmentation) to further advance self-supervised monocular depth estimation. In this paper, we propose a novel framework that uses monocular depth completion as an auxiliary task to assist monocular depth estimation. In particular, a knowledge transfer strategy enables monocular depth estimation to benefit from the effective feature representations learned by the monocular depth completion task, so that the correlation between the two tasks is exploited within a single framework. Only unlabeled stereo images are used in the proposed framework, which makes the learning paradigm self-supervised. Experimental results on a publicly available dataset show that the proposed approach outperforms state-of-the-art self-supervised methods and achieves performance comparable to supervised methods.
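
The abstract notes that self-supervised methods obtain their supervision signal through image warping. The following is a minimal, generic sketch of that idea for a rectified stereo pair, not the framework proposed in the paper: the right image is resampled with a predicted disparity map to reconstruct the left image, and the reconstruction error serves as the loss. The function names `warp_right_to_left` and `photometric_l1`, the grayscale inputs, the linear interpolation, and the border clamping are all assumptions made for illustration.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at x - d(y, x).

    Assumes rectified grayscale images of shape (H, W) and a left-view
    disparity map of the same shape, measured in pixels.
    """
    h, w = disparity.shape
    # Horizontal sampling positions in the right image, clamped at the border.
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    # Linear interpolation between the two nearest columns.
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]

def photometric_l1(left, right, disparity):
    """Mean absolute error between the left image and its reconstruction."""
    return float(np.mean(np.abs(left - warp_right_to_left(right, disparity))))

# Toy stereo pair (hypothetical data): every pixel has a disparity of 2 px.
rng = np.random.default_rng(0)
right = rng.random((4, 32))
left = np.roll(right, 2, axis=1)  # left[y, x] = right[y, x - 2], wrapping at the border
true_disp = np.full((4, 32), 2.0)

loss_true = photometric_l1(left, right, true_disp)
loss_zero = photometric_l1(left, right, np.zeros_like(true_disp))
print(loss_true, loss_zero)
```

In this toy setup the loss for the correct disparity should be lower than the loss for a zero disparity, which is the signal a network would be trained to minimize. Practical systems typically add terms such as structural similarity and smoothness regularization, which are omitted here.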

Author information

Authors and Affiliations

School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China
Lin Sun, Yi Li, Bingzheng Liu, Liying Xu, Zhe Zhang & Jie Zhu

Corresponding author

Correspondence to Yi Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported in part by the Natural Science Foundation of Tianjin (No.18ZXZNGX00110).

About this article

Cite this article

Sun, L., Li, Y., Liu, B. et al. Transferring knowledge from monocular completion for self-supervised monocular depth estimation. Multimed Tools Appl 81, 42485–42495 (2022). https://doi.org/10.1007/s11042-021-11212-4

Received: 28 April 2021
Revised: 15 June 2021
Accepted: 29 June 2021
Published: 24 July 2021
Issue Date: December 2022
DOI: https://doi.org/10.1007/s11042-021-11212-4
