DOI: 10.1145/3626641.3626669

Investigating ResNet Variants in U-Net to Obtain High-Quality Depth Maps in Indoor Scenes

Published: 27 December 2023

Abstract

Depth estimation plays an important role in a variety of applications. It is commonly performed with active sensors such as Light Detection and Ranging (LiDAR) or with stereo cameras, but these approaches have several drawbacks, most notably their computational cost. Monocular cameras offer a lower-cost alternative; however, images captured by a monocular camera contain no explicit depth information, so depth values cannot be measured directly and must instead be estimated. Deep learning has proven successful across a wide range of computer vision tasks, one of which is monocular depth estimation. Monocular depth estimation is applied to various scene types, both indoor and outdoor. In this research, we focus on indoor scenes because they contain more varied information than outdoor scenes, which makes them more challenging. We use a U-Net architecture with a pre-trained ResNet as the encoder; ResNet was chosen for its proven ability to mitigate vanishing and exploding gradients. We examine ResNet's ability to perform depth estimation and investigate, both quantitatively and qualitatively, the impact of weighting the loss function on the resulting depth maps. We use DIODE: A Dense Indoor and Outdoor DEpth Dataset as the benchmark. We achieve a quantitative improvement with an RMSE of 1.7673, REL of 0.3281, δ<1.25 of 57.38, δ<1.25² of 77.18, and δ<1.25³ of 86.71.
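
The encoder-decoder design described above (a U-Net with a pre-trained ResNet encoder) can be illustrated with a short PyTorch sketch. The paper investigates several ResNet variants; the layout below assumes a torchvision ResNet-50 backbone and a simple bilinear-upsampling decoder, so the class names, channel widths, and decoder design are illustrative assumptions rather than the authors' actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class UpBlock(nn.Module):
    """Decoder stage: upsample, concatenate the encoder skip feature, then convolve."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear", align_corners=False)
        return self.conv(torch.cat([x, skip], dim=1))

class ResNetUNet(nn.Module):
    """U-Net-style depth network with an ImageNet-pre-trained ResNet-50 encoder (illustrative)."""
    def __init__(self):
        super().__init__()
        r = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu)   # 64 ch  @ 1/2 resolution
        self.pool = r.maxpool                                # down to 1/4 resolution
        self.enc1, self.enc2 = r.layer1, r.layer2            # 256 ch @ 1/4, 512 ch @ 1/8
        self.enc3, self.enc4 = r.layer3, r.layer4            # 1024 ch @ 1/16, 2048 ch @ 1/32
        self.up1 = UpBlock(2048, 1024, 512)
        self.up2 = UpBlock(512, 512, 256)
        self.up3 = UpBlock(256, 256, 128)
        self.up4 = UpBlock(128, 64, 64)
        self.head = nn.Conv2d(64, 1, 3, padding=1)           # single-channel depth map

    def forward(self, x):
        s0 = self.stem(x)
        s1 = self.enc1(self.pool(s0))
        s2 = self.enc2(s1)
        s3 = self.enc3(s2)
        s4 = self.enc4(s3)
        d = self.up1(s4, s3)
        d = self.up2(d, s2)
        d = self.up3(d, s1)
        d = self.up4(d, s0)
        return self.head(d)   # depth predicted at half the input resolution
```

The RMSE, REL, and δ-threshold figures quoted at the end of the abstract follow the standard monocular depth evaluation metrics. A minimal NumPy version is sketched below; the validity mask (excluding pixels with non-positive ground truth) and the function name are assumptions for illustration, not taken from the paper's evaluation code.

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-6):
    """RMSE, REL, and delta-threshold accuracies over pixels with valid ground truth."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    mask = gt > eps                                   # drop pixels without valid ground-truth depth
    pred, gt = pred[mask], gt[mask]
    rmse = np.sqrt(np.mean((pred - gt) ** 2))         # root-mean-square error
    rel = np.mean(np.abs(pred - gt) / gt)             # mean absolute relative error
    ratio = np.maximum(pred / gt, gt / pred)          # per-pixel max ratio
    deltas = [np.mean(ratio < 1.25 ** k) for k in (1, 2, 3)]   # delta < 1.25, 1.25^2, 1.25^3
    return rmse, rel, deltas
```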


    Published In

    SIET '23: Proceedings of the 8th International Conference on Sustainable Information Engineering and Technology
    October 2023
    722 pages
    ISBN: 9798400708503
    DOI: 10.1145/3626641
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 December 2023


    Author Tags

    1. Computer vision
    2. Monocular depth estimation
    3. Pre-trained
    4. ResNet
    5. U-Net

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • International Cooperation Research Grant/Hibah Penelitian Kerjasama Internasional (HAPKI)

    Conference

    SIET 2023

    Acceptance Rates

    Overall Acceptance Rate 45 of 57 submissions, 79%
