DOI: 10.1145/3638550.3641122
Research Article

Mobile AR Depth Estimation: Challenges & Prospects

Published: 28 February 2024

Abstract

Accurate metric depth can help achieve more realistic user interactions, such as object placement and occlusion detection, in mobile augmented reality (AR). However, obtaining metrically accurate depth estimates is challenging in practice. We tested four state-of-the-art (SOTA) monocular depth estimation models on a recently introduced dataset (ARKitScenes) and observed clear performance gaps on this real-world mobile dataset. We categorize these challenges as hardware-, data-, and model-related, and propose promising future directions, including (i) using more hardware-related information from the mobile device's camera and other available sensors, (ii) capturing high-quality data that reflects real-world AR scenarios, and (iii) designing model architectures that utilize the new information.
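The distinction the abstract draws between metrically accurate and merely relative depth can be made concrete with the standard evaluation protocol used for monocular depth models. Below is a minimal, illustrative sketch (not the paper's actual evaluation code): it computes the conventional AbsRel, RMSE, and δ<1.25 metrics, and applies the median scaling commonly used to align relative-depth predictions to metric ground truth. All function names and the toy data are invented for illustration.

```python
import numpy as np

def depth_metrics(pred, gt, min_depth=0.1, max_depth=10.0):
    """Standard monocular depth metrics (AbsRel, RMSE, delta < 1.25).

    pred, gt: per-pixel depth arrays in meters, same shape.
    Ground-truth pixels outside [min_depth, max_depth] are masked out.
    """
    mask = (gt > min_depth) & (gt < max_depth)
    pred, gt = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)
    return abs_rel, rmse, delta1

def median_scale(pred, gt):
    """Align a scale-ambiguous (relative) prediction to metric ground truth.

    Relative-depth models (MiDaS-style) need this alignment before the
    metrics above are meaningful; metric models (ZoeDepth-style) do not,
    which is exactly what makes metric accuracy the harder target.
    """
    return pred * (np.median(gt) / np.median(pred))

# Toy example: a "prediction" that is ground truth at half scale.
rng = np.random.default_rng(0)
gt = rng.uniform(0.5, 5.0, size=(192, 256))
pred = 0.5 * gt
abs_rel, rmse, d1 = depth_metrics(pred, gt)
print(f"unscaled:      AbsRel={abs_rel:.2f}")  # AbsRel=0.50 (wrong scale)
abs_rel, rmse, d1 = depth_metrics(median_scale(pred, gt), gt)
print(f"median-scaled: AbsRel={abs_rel:.2f}")  # AbsRel=0.00 (scale recovered)
```

The toy example shows why zero-shot relative models can look strong on benchmarks yet fail in AR: median scaling erases a global scale error that a real mobile AR system, with no ground truth available at run time, cannot correct.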

References

[1] Apple. https://developer.apple.com/augmented-reality/, 2017.
[2] G. Baruch, Z. Chen, A. Dehghan, T. Dimry, Y. Feigin, P. Fu, T. Gebauer, B. Joffe, D. Kurz, A. Schwartz, and E. Shulman. ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data. In NeurIPS Datasets and Benchmarks Track, 2021.
[3] S. F. Bhat, I. Alhashim, and P. Wonka. LocalBins: Improving Depth Estimation by Learning Local Distributions. In ECCV, 2022.
[4] S. F. Bhat, R. Birkl, D. Wofk, P. Wonka, and M. Müller. ZoeDepth: Zero-Shot Transfer by Combining Relative and Metric Depth. arXiv:2302.12288, 2023.
[5] R. Birkl, D. Wofk, and M. Müller. MiDaS v3.1 - A Model Zoo for Robust Monocular Relative Depth Estimation. arXiv:2307.14460, 2023.
[6] G. Brazil, A. Kumar, J. Straub, N. Ravi, J. Johnson, and G. Gkioxari. Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild. In CVPR, 2023.
[7] J. Cho, D. Min, Y. Kim, and K. Sohn. DIML/CVL RGB-D Dataset: 2M RGB-D Images of Natural Indoor and Outdoor Scenes. arXiv:2110.11590, 2021.
[8] A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner. ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes. In CVPR, 2017.
[9] S. F. Bhat, I. Alhashim, and P. Wonka. AdaBins: Depth Estimation Using Adaptive Bins. In CVPR, 2021.
[10] Y. Fujimura, M. Iiyama, T. Funatomi, and Y. Mukaigawa. Deep Depth from Focal Stack with Defocus Model for Camera-Setting Invariance. arXiv:2202.13055, 2022.
[11] A. Ganj, Y. Zhao, F. Galbiati, and T. Guo. Toward Scalable and Controllable AR Experimentation. In ImmerCom, 2023.
[12] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun. Vision Meets Robotics: The KITTI Dataset. IJRR, 2013.
[13] V. Guizilini, I. Vasiljevic, D. Chen, R. Ambrus, and A. Gaidon. Towards Zero-Shot Scale-Aware Monocular Depth Estimation. In ICCV, 2023.
[14] S. Hwang, J. Lee, W. J. Kim, S. Woo, K. Lee, and S. Lee. LiDAR Depth Completion Using Color-Embedded Information via Knowledge Distillation. IEEE Transactions on Intelligent Transportation Systems, 2022.
[15] Intel. Intel RealSense D400 Series Datasheet. https://www.intelrealsense.com/wp-content/uploads/2023/07/Intel-RealSense-D400-Series-Datasheet-July-2023.pdf, 2023.
[16] M. Maximov, K. Galim, and L. Leal-Taixé. Focus on Defocus: Bridging the Synthetic to Real Domain Gap for Depth Estimation. In CVPR, 2020.
[17] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor Segmentation and Support Inference from RGBD Images. In ECCV, 2012.
[18] M. Norman, V. Kellen, S. Smallen, B. DeMeulle, S. Strande, E. Lazowska, N. Alterman, R. Fatland, S. Stone, A. Tan, K. Yelick, E. Van Dusen, and J. Mitchell. CloudBank: Managed Services to Simplify Cloud Access for Computer Science Research and Education. In PEARC, 2021.
[19] R. Ranftl, A. Bochkovskiy, and V. Koltun. Vision Transformers for Dense Prediction. In ICCV, 2021.
[20] R. Ranftl, K. Lasinger, D. Hafner, K. Schindler, and V. Koltun. Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer. TPAMI, 2020.
[21] M. Sayed, J. Gibson, J. Watson, V. Prisacariu, M. Firman, and C. Godard. SimpleRecon: 3D Reconstruction Without 3D Convolutions. In ECCV, 2022.
[22] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers. A Benchmark for the Evaluation of RGB-D SLAM Systems. In IROS, 2012.
[23] F. Tapia Benavides, A. Ignatov, and R. Timofte. PhoneDepth: A Dataset for Monocular Depth Estimation on Mobile Devices. In CVPRW, 2022.
[24] N.-H. Wang, R. Wang, Y.-L. Liu, Y.-H. Huang, Y.-L. Chang, C.-P. Chen, and K. Jou. Bridging Unsupervised and Supervised Depth from Focus via All-in-Focus Supervision. In ICCV, 2021.
[25] C.-Y. Wu, J. Wang, M. Hall, U. Neumann, and S. Su. Toward Practical Monocular Indoor Depth Estimation. In CVPR, 2022.
[26] W. Yin, C. Zhang, H. Chen, Z. Cai, G. Yu, K. Wang, X. Chen, and C. Shen. Metric3D: Towards Zero-Shot Metric 3D Prediction from a Single Image. In ICCV, 2023.
[27] J. Zhang, H. Yang, J. Ren, D. Zhang, B. He, T. Cao, Y. Li, Y. Zhang, and Y. Liu. MobiDepth: Real-Time Depth Estimation Using On-Device Dual Cameras. In MobiCom, 2022.
[28] Y. Zhang, T. Scargill, A. Vaishnav, G. Premsankar, M. Di Francesco, and M. Gorlatova. InDepth: Real-Time Depth Inpainting for Mobile Augmented Reality. IMWUT, 2022.

Cited By

  • (2025) Boosting Depth Estimation for Self-Driving in a Self-Supervised Framework via Improved Pose Network. IEEE Open Journal of the Computer Society, 6, 109-118. DOI: 10.1109/OJCS.2024.3505876
  • (2024) Enhancing Visual Perception in Immersive VR and AR Environments: AI-Driven Color and Clarity Adjustments Under Dynamic Lighting Conditions. Technologies, 12(11), 216. DOI: 10.3390/technologies12110216
  • (2024) Towards In-context Environment Sensing for Mobile Augmented Reality. In Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 2091-2097. DOI: 10.1145/3636534.3696211
  • (2024) Toward Robust Depth Fusion for Mobile AR With Depth from Focus and Single-Image Priors. In 2024 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), 517-520. DOI: 10.1109/ISMAR-Adjunct64951.2024.00149


Information

Published In

HOTMOBILE '24: Proceedings of the 25th International Workshop on Mobile Computing Systems and Applications
February 2024, 167 pages
ISBN: 9798400704970
DOI: 10.1145/3638550

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 96 of 345 submissions, 28%


Article Metrics

  • Downloads (last 12 months): 251
  • Downloads (last 6 weeks): 20
Reflects downloads up to 27 Jan 2025.
