Abstract
Advances in 3D reconstruction, applied to depth estimation in both indoor and outdoor environments, have enabled strong performance across a range of applications. Outdoor reconstruction has typically relied on traditional approaches such as Structure from Motion (SFM) and its variants, while indoor reconstruction has shifted towards depth-sensing devices. These devices, however, are limited by environmental factors such as lighting conditions. Recent work has produced methods based on Convolutional Neural Networks (CNNs) that operate regardless of whether the environment is enclosed or open, and that can complement both traditional approaches. Building on these advances, alternatives that integrate attention layers have been proposed and have evolved rapidly in recent years. This paper therefore proposes a method for 3D depth estimation of indoor and outdoor images, with the goal of generating highly detailed and precise depth maps of real-world scenes using a modified U-Net augmented with a custom attention mechanism.
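The abstract does not specify how the U-Net is modified or how its attention mechanism is built. One common way to add attention to a U-Net is an additive attention gate on each skip connection, where the coarser decoder signal weights the encoder features before concatenation. The PyTorch sketch below is a hypothetical illustration of that general idea, not the authors' architecture: AttentionGate, AttentionUNetDepth, and all channel sizes are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionGate(nn.Module):
    """Additive attention gate: the coarser decoder signal weights the
    encoder skip features before they are concatenated (assumed design)."""

    def __init__(self, skip_ch, gate_ch, inter_ch):
        super().__init__()
        self.w_skip = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)
        self.w_gate = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, skip, gate):
        # Bring the decoder gate to the skip resolution, fuse both signals,
        # and produce a per-pixel attention map in [0, 1].
        gate = F.interpolate(gate, size=skip.shape[2:], mode="bilinear",
                             align_corners=False)
        attn = self.psi(F.relu(self.w_skip(skip) + self.w_gate(gate)))
        return skip * attn


def conv_block(in_ch, out_ch):
    # Standard U-Net double convolution.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )


class AttentionUNetDepth(nn.Module):
    """U-Net that regresses a single-channel depth map from an RGB image."""

    def __init__(self):
        super().__init__()
        self.enc1, self.enc2, self.enc3 = conv_block(3, 64), conv_block(64, 128), conv_block(128, 256)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(256, 512)
        self.gate3, self.up3, self.dec3 = AttentionGate(256, 512, 128), nn.ConvTranspose2d(512, 256, 2, stride=2), conv_block(512, 256)
        self.gate2, self.up2, self.dec2 = AttentionGate(128, 256, 64), nn.ConvTranspose2d(256, 128, 2, stride=2), conv_block(256, 128)
        self.gate1, self.up1, self.dec1 = AttentionGate(64, 128, 32), nn.ConvTranspose2d(128, 64, 2, stride=2), conv_block(128, 64)
        self.head = nn.Conv2d(64, 1, kernel_size=1)  # one depth value per pixel

    def forward(self, x):
        s1 = self.enc1(x)                   # full resolution
        s2 = self.enc2(self.pool(s1))       # 1/2
        s3 = self.enc3(self.pool(s2))       # 1/4
        b = self.bottleneck(self.pool(s3))  # 1/8
        d3 = self.dec3(torch.cat([self.up3(b), self.gate3(s3, b)], dim=1))
        d2 = self.dec2(torch.cat([self.up2(d3), self.gate2(s2, d3)], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), self.gate1(s1, d2)], dim=1))
        return self.head(d1)                # (N, 1, H, W) depth map


depth = AttentionUNetDepth()(torch.randn(1, 3, 128, 128))
print(depth.shape)  # torch.Size([1, 1, 128, 128])

In this gated design, skip connections no longer pass encoder features through unchanged; each pixel is rescaled by how relevant the decoder deems it at that stage, which is one plausible reading of attending over indoor and outdoor scene structure.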
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ocsa Sánchez, L.J., Gutiérrez Cáceres, J.C. (2024). Attention U-Net Oriented Towards 3D Depth Estimation. In: Arai, K. (eds) Intelligent Computing. SAI 2024. Lecture Notes in Networks and Systems, vol 1018. Springer, Cham. https://doi.org/10.1007/978-3-031-62269-4_32
DOI: https://doi.org/10.1007/978-3-031-62269-4_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-62268-7
Online ISBN: 978-3-031-62269-4
eBook Packages: Intelligent Technologies and Robotics (R0)