DOI: 10.1145/3626641.3626669

Investigating ResNet Variants in U-Net to Obtain High-Quality Depth Maps in Indoor Scenes

Published: 27 December 2023

Abstract

Depth estimation plays an important role in a variety of applications. It is commonly performed with active sensors such as Light Detection and Ranging (LiDAR) or with stereo cameras, but these approaches have several drawbacks, most notably their computational cost. Monocular cameras offer a lower-cost alternative; however, images captured by a monocular camera contain no explicit depth information, so depth values cannot be measured directly and must instead be estimated. Deep learning has proven successful across a wide range of computer vision tasks, one of which is monocular depth estimation. Monocular depth estimation is applied to various scene types, both indoor and outdoor. In this research, we focus on indoor scenes because they contain more varied information than outdoor scenes, which makes them more challenging. We use a U-Net architecture with a pre-trained ResNet as the encoder; ResNet was chosen for its proven ability to mitigate vanishing and exploding gradients. We examine ResNet's ability to perform depth estimation and investigate, both quantitatively and qualitatively, the impact of weighting the loss function on the resulting depth maps. We use DIODE: A Dense Indoor and Outdoor DEpth Dataset as the benchmark. We achieve a quantitative improvement with an RMSE of 1.7673, REL of 0.3281, δ<1.25 of 57.38, δ<1.25² of 77.18, and δ<1.25³ of 86.71.
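
The encoder-decoder design described above (a U-Net with a pre-trained ResNet encoder) can be illustrated with a short PyTorch sketch. The paper investigates several ResNet variants; the layout below assumes a torchvision ResNet-50 backbone and a simple bilinear-upsampling decoder, so the class names, channel widths, and decoder design are illustrative assumptions rather than the authors' actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class UpBlock(nn.Module):
    """Decoder stage: upsample, concatenate the encoder skip feature, then convolve."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear", align_corners=False)
        return self.conv(torch.cat([x, skip], dim=1))

class ResNetUNet(nn.Module):
    """U-Net-style depth network with an ImageNet-pre-trained ResNet-50 encoder (illustrative)."""
    def __init__(self):
        super().__init__()
        r = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu)   # 64 ch  @ 1/2 resolution
        self.pool = r.maxpool                                # down to 1/4 resolution
        self.enc1, self.enc2 = r.layer1, r.layer2            # 256 ch @ 1/4, 512 ch @ 1/8
        self.enc3, self.enc4 = r.layer3, r.layer4            # 1024 ch @ 1/16, 2048 ch @ 1/32
        self.up1 = UpBlock(2048, 1024, 512)
        self.up2 = UpBlock(512, 512, 256)
        self.up3 = UpBlock(256, 256, 128)
        self.up4 = UpBlock(128, 64, 64)
        self.head = nn.Conv2d(64, 1, 3, padding=1)           # single-channel depth map

    def forward(self, x):
        s0 = self.stem(x)
        s1 = self.enc1(self.pool(s0))
        s2 = self.enc2(s1)
        s3 = self.enc3(s2)
        s4 = self.enc4(s3)
        d = self.up1(s4, s3)
        d = self.up2(d, s2)
        d = self.up3(d, s1)
        d = self.up4(d, s0)
        return self.head(d)   # depth predicted at half the input resolution
```

The RMSE, REL, and δ-threshold figures quoted at the end of the abstract follow the standard monocular depth evaluation metrics. A minimal NumPy version is sketched below; the validity mask (excluding pixels with non-positive ground truth) and the function name are assumptions for illustration, not taken from the paper's evaluation code.

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-6):
    """RMSE, REL, and delta-threshold accuracies over pixels with valid ground truth."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    mask = gt > eps                                   # drop pixels without valid ground-truth depth
    pred, gt = pred[mask], gt[mask]
    rmse = np.sqrt(np.mean((pred - gt) ** 2))         # root-mean-square error
    rel = np.mean(np.abs(pred - gt) / gt)             # mean absolute relative error
    ratio = np.maximum(pred / gt, gt / pred)          # per-pixel max ratio
    deltas = [np.mean(ratio < 1.25 ** k) for k in (1, 2, 3)]   # delta < 1.25, 1.25^2, 1.25^3
    return rmse, rel, deltas
```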


    Published In

    SIET '23: Proceedings of the 8th International Conference on Sustainable Information Engineering and Technology
    October 2023
    722 pages
    ISBN: 9798400708503
    DOI: 10.1145/3626641
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 December 2023


    Author Tags

    1. Computer vision
    2. Monocular depth estimation
    3. Pre-trained
    4. ResNet
    5. U-Net

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • International Cooperation Research Grant/Hibah Penelitian Kerjasama Internasional (HAPKI)

    Conference

    SIET 2023

    Acceptance Rates

    Overall Acceptance Rate 45 of 57 submissions, 79%
