🚀 Sascha’s Paper Club
Depth Anything: A Foundation Model for Monocular Depth Estimation
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data by L. Yang et al.
Monocular depth estimation is the prediction of distance in 3D space from a single 2D image. This “ill-posed and inherently ambiguous” problem, as stated in literally every paper on depth estimation, is a fundamental task in computer vision and robotics. At the same time, foundation models dominate the scene in deep-learning-based NLP and computer vision. Wouldn’t it be awesome if we could leverage their success for depth estimation too?
In today’s paper walkthrough we’ll dive into Depth Anything, a foundation model for monocular depth estimation. We will explore its architecture, the tricks used to train it, and how it is adapted for metric depth estimation.
Paper: Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data, Lihe Yang et al., 19 Jan. 2024
Resources: GitHub — Project Page — Demo — Checkpoints
Conference: CVPR 2024
Category: foundation models, monocular depth estimation
Other Walkthroughs:
[BYOL] — [CLIP] — [GLIP] — [Segment Anything] — [DINO] — [DDPM]
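Before we dive in: if you want to try the model yourself, the released checkpoints can also be run through the Hugging Face transformers depth-estimation pipeline. Below is a minimal inference sketch; the model id is an assumption based on the published checkpoints, so check the linked GitHub repo for the canonical usage.

```python
# Minimal sketch: run Depth Anything on a single image via the Hugging Face
# "depth-estimation" pipeline. The model id below is an assumption -- see the
# project's GitHub page / checkpoints for the exact identifiers.
from PIL import Image
import requests
from transformers import pipeline

# Load the (assumed) small Depth Anything checkpoint.
pipe = pipeline("depth-estimation", model="LiheYang/depth-anything-small-hf")

# Any RGB image works; this URL is just a placeholder example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The pipeline returns a PIL depth map ("depth") and the raw tensor
# ("predicted_depth"); values are relative depth, not metric distances.
result = pipe(image)
result["depth"].save("depth_map.png")
print(result["predicted_depth"].shape)
```

Note that out of the box this yields relative depth; how Depth Anything is fine-tuned for metric depth estimation is exactly what we cover later in the walkthrough.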