🚀 Sascha’s Paper Club
Depth Anything: A Foundation Model for Monocular Depth Estimation
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data by L. Yang et al.
Monocular depth estimation is the prediction of distance in 3D space from a single 2D image. This “ill-posed and inherently ambiguous” problem, as stated in literally every paper on depth estimation, is a fundamental task in computer vision and robotics. At the same time, foundation models dominate the scene in deep-learning-based NLP and computer vision. Wouldn’t it be awesome if we could leverage their success for depth estimation too?
In today’s paper walkthrough we’ll dive into Depth Anything, a foundation model for monocular depth estimation. We will explore its architecture, the tricks used to train it, and how it is adapted for metric depth estimation.
Paper: Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data, Lihe Yang et al., 19 Jan. 2024
Resources: GitHub — Project Page — Demo — Checkpoints
Conference: CVPR 2024
Category: foundation models, monocular depth estimation
Other Walkthroughs:
[BYOL] — [CLIP] — [GLIP] — [Segment Anything] — [DINO] — [DDPM]
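Before we dive in: if you want to try the model yourself, the released checkpoints can also be run through the Hugging Face transformers depth-estimation pipeline. Below is a minimal inference sketch; the model id is an assumption based on the published checkpoints, so check the linked GitHub repo for the canonical usage.

```python
# Minimal sketch: run Depth Anything on a single image via the Hugging Face
# "depth-estimation" pipeline. The model id below is an assumption -- see the
# project's GitHub page / checkpoints for the exact identifiers.
from PIL import Image
import requests
from transformers import pipeline

# Load the (assumed) small Depth Anything checkpoint.
pipe = pipeline("depth-estimation", model="LiheYang/depth-anything-small-hf")

# Any RGB image works; this URL is just a placeholder example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The pipeline returns a PIL depth map ("depth") and the raw tensor
# ("predicted_depth"); values are relative depth, not metric distances.
result = pipe(image)
result["depth"].save("depth_map.png")
print(result["predicted_depth"].shape)
```

Note that out of the box this yields relative depth; how Depth Anything is fine-tuned for metric depth estimation is exactly what we cover later in the walkthrough.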