
Eliminating Scale Ambiguity of Unsupervised Monocular Visual Odometry

Published in: Neural Processing Letters

Abstract

Scale ambiguity is a fundamental difficulty in monocular visual odometry and remains poorly solved by both traditional and learning-based methods. Moreover, the performance of unsupervised monocular visual odometry degrades in challenging environments. To tackle these issues, we propose an accurate and efficient end-to-end system that handles scale ambiguity in unsupervised monocular visual odometry, particularly on long video sequences in challenging environments. We first employ a depth network and an optical flow network to obtain the depth map and the optical flow. We then combine the optical flow with the known camera height through a ground-plane geometric model, and our scale recovery algorithm uses this relation to restore the absolute scale of the camera translation and the depth map. Finally, the recovered scale and our flow-depth loss unify the training and testing processes across the entire network. Extensive experiments and analyses show that, compared with existing unsupervised approaches, our method achieves state-of-the-art performance on the KITTI dataset not only in depth estimation and optical flow estimation but also in monocular visual odometry.
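The abstract describes recovering absolute scale by relating the known camera mounting height to a ground plane estimated from the (up-to-scale) predictions. The paper's exact algorithm is not reproduced here; the sketch below shows one common variant of this idea under stated assumptions: depth-based ground-plane fitting with NumPy, a hypothetical `recover_scale` function, a precomputed ground-pixel mask, and the 1.65 m camera height typical of the KITTI setup.

```python
import numpy as np

def recover_scale(depth, K, ground_mask, real_cam_height=1.65):
    """Recover an absolute scale factor from an up-to-scale depth map.

    Back-projects pixels flagged as road surface into camera coordinates,
    fits a ground plane to them by least squares (SVD), and compares the
    camera's predicted height above that plane with the known mounting
    height (about 1.65 m on KITTI). Multiplying the predicted depth and
    translation by the returned factor restores metric scale.
    """
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # Back-project masked ground pixels into 3D camera coordinates.
    z = depth[ground_mask]
    x = (us[ground_mask] - cx) * z / fx
    y = (vs[ground_mask] - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)
    # Fit the plane n . p + d = 0: the normal is the direction of least
    # variance of the centered point cloud.
    centroid = pts.mean(axis=0)
    _, _, vh = np.linalg.svd(pts - centroid, full_matrices=False)
    normal = vh[-1]
    # Predicted camera height = distance from the camera origin to the plane.
    pred_height = abs(normal @ centroid)
    return real_cam_height / pred_height
```

In a full pipeline the mask would come from a segmentation of the lower image region (or a flow-based ground model, as the paper suggests), and the factor would rescale both the depth map and the estimated translation before computing the flow-depth loss.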



Acknowledgements

This work is supported in part by the National Natural Science Foundation of China under Grants 61733013, 62073245, and U1713211, and in part by the Jiangsu Key Research and Development Project under Grant BE2020101.

Author information

Corresponding author

Correspondence to Qijun Chen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, Z., Shen, M. & Chen, Q. Eliminating Scale Ambiguity of Unsupervised Monocular Visual Odometry. Neural Process Lett 55, 9743–9764 (2023). https://doi.org/10.1007/s11063-023-11224-1
