Abstract
Advances in 3D reconstruction, applied to depth estimation in both indoor and outdoor environments, have enabled strong performance across a range of applications. Outdoor reconstruction has typically relied on traditional approaches such as Structure from Motion (SFM) and its variants, while indoor reconstruction has shifted towards depth-sensing devices. These devices, however, are limited by environmental factors such as lighting conditions. Recent work has produced methods based on Convolutional Neural Networks (CNNs) that operate regardless of whether the environment is enclosed or open, and that can complement both traditional approaches. Building on these advances, alternatives that integrate attention layers have been proposed and have evolved rapidly in recent years. This paper therefore proposes a method for 3D depth estimation of indoor and outdoor images, with the goal of generating highly detailed and precise depth maps of real-world scenes using a modified U-Net augmented with a custom attention mechanism.
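The abstract does not specify how the U-Net is modified or how its attention mechanism is built. One common way to add attention to a U-Net is an additive attention gate on each skip connection, where the coarser decoder signal weights the encoder features before concatenation. The PyTorch sketch below is a hypothetical illustration of that general idea, not the authors' architecture: AttentionGate, AttentionUNetDepth, and all channel sizes are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionGate(nn.Module):
    """Additive attention gate: the coarser decoder signal weights the
    encoder skip features before they are concatenated (assumed design)."""

    def __init__(self, skip_ch, gate_ch, inter_ch):
        super().__init__()
        self.w_skip = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)
        self.w_gate = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, skip, gate):
        # Bring the decoder gate to the skip resolution, fuse both signals,
        # and produce a per-pixel attention map in [0, 1].
        gate = F.interpolate(gate, size=skip.shape[2:], mode="bilinear",
                             align_corners=False)
        attn = self.psi(F.relu(self.w_skip(skip) + self.w_gate(gate)))
        return skip * attn


def conv_block(in_ch, out_ch):
    # Standard U-Net double convolution.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )


class AttentionUNetDepth(nn.Module):
    """U-Net that regresses a single-channel depth map from an RGB image."""

    def __init__(self):
        super().__init__()
        self.enc1, self.enc2, self.enc3 = conv_block(3, 64), conv_block(64, 128), conv_block(128, 256)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(256, 512)
        self.gate3, self.up3, self.dec3 = AttentionGate(256, 512, 128), nn.ConvTranspose2d(512, 256, 2, stride=2), conv_block(512, 256)
        self.gate2, self.up2, self.dec2 = AttentionGate(128, 256, 64), nn.ConvTranspose2d(256, 128, 2, stride=2), conv_block(256, 128)
        self.gate1, self.up1, self.dec1 = AttentionGate(64, 128, 32), nn.ConvTranspose2d(128, 64, 2, stride=2), conv_block(128, 64)
        self.head = nn.Conv2d(64, 1, kernel_size=1)  # one depth value per pixel

    def forward(self, x):
        s1 = self.enc1(x)                   # full resolution
        s2 = self.enc2(self.pool(s1))       # 1/2
        s3 = self.enc3(self.pool(s2))       # 1/4
        b = self.bottleneck(self.pool(s3))  # 1/8
        d3 = self.dec3(torch.cat([self.up3(b), self.gate3(s3, b)], dim=1))
        d2 = self.dec2(torch.cat([self.up2(d3), self.gate2(s2, d3)], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), self.gate1(s1, d2)], dim=1))
        return self.head(d1)                # (N, 1, H, W) depth map


depth = AttentionUNetDepth()(torch.randn(1, 3, 128, 128))
print(depth.shape)  # torch.Size([1, 1, 128, 128])

In this gated design, skip connections no longer pass encoder features through unchanged; each pixel is rescaled by how relevant the decoder deems it at that stage, which is one plausible reading of attending over indoor and outdoor scene structure.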
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ocsa Sánchez, L.J., Gutiérrez Cáceres, J.C. (2024). Attention U-Net Oriented Towards 3D Depth Estimation. In: Arai, K. (eds) Intelligent Computing. SAI 2024. Lecture Notes in Networks and Systems, vol 1018. Springer, Cham. https://doi.org/10.1007/978-3-031-62269-4_32
DOI: https://doi.org/10.1007/978-3-031-62269-4_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-62268-7
Online ISBN: 978-3-031-62269-4
eBook Packages: Intelligent Technologies and Robotics (R0)