
Attention U-Net Oriented Towards 3D Depth Estimation

  • Conference paper

Intelligent Computing (SAI 2024)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 1018)


Abstract

Advances in 3D reconstruction applied to depth estimation have achieved strong performance in both indoor and outdoor environments. Outdoor reconstruction has traditionally relied on approaches such as Structure from Motion (SfM) and its variants, while indoor reconstruction has shifted towards depth-sensing devices. These devices, however, are limited by environmental factors such as lighting conditions. More recent methods based on Convolutional Neural Networks (CNNs) work regardless of whether the environment is enclosed or open and can complement both traditional approaches. Building on these advances, alternatives incorporating attention layers have been proposed and refined over recent years. This paper therefore proposes a method for 3D depth estimation focused on indoor and outdoor images: a modified U-Net with a custom attention mechanism that generates detailed, precise depth maps of real-world scenes.
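The abstract does not detail the paper's custom attention mechanism. As a point of reference, the following is a minimal NumPy sketch of the standard additive attention gate from the original Attention U-Net (projections of the encoder skip connection and the decoder gating signal are summed, passed through a ReLU and a sigmoid, and used to rescale the skip features). All names, shapes, and weights here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(skip, gate, W_x, W_g, psi):
    """Additive attention gate (Attention U-Net style); hypothetical sketch.

    skip: encoder features, shape (H, W, C)
    gate: decoder gating signal, shape (H, W, Cg)
    W_x, W_g, psi: 1x1-convolution weights, represented as dense matrices.
    """
    # Project both inputs into a shared intermediate space and combine.
    q = np.maximum(skip @ W_x + gate @ W_g, 0.0)  # (H, W, Ci), ReLU
    # Per-pixel attention coefficients in (0, 1).
    alpha = sigmoid(q @ psi)                      # (H, W, 1)
    # Rescale the skip connection before it is concatenated in the decoder.
    return skip * alpha

# Tiny example with random weights (illustrative only).
rng = np.random.default_rng(0)
H, W, C, Cg, Ci = 4, 4, 8, 8, 4
skip = rng.standard_normal((H, W, C))
gate = rng.standard_normal((H, W, Cg))
W_x = rng.standard_normal((C, Ci))
W_g = rng.standard_normal((Cg, Ci))
psi = rng.standard_normal((Ci, 1))
out = attention_gate(skip, gate, W_x, W_g, psi)
print(out.shape)
```

In a full network, a gate like this sits on each U-Net skip connection, letting the decoder suppress encoder features that are irrelevant at a given pixel; the paper's mechanism may differ in how the coefficients are computed.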



Author information

Correspondence to Leonel Jaime Ocsa Sánchez.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ocsa Sánchez, L.J., Gutiérrez Cáceres, J.C. (2024). Attention U-Net Oriented Towards 3D Depth Estimation. In: Arai, K. (eds) Intelligent Computing. SAI 2024. Lecture Notes in Networks and Systems, vol 1018. Springer, Cham. https://doi.org/10.1007/978-3-031-62269-4_32
