3-D Depth Reconstruction from a Single Still Image

Authors:

Ashutosh Saxena,

Andrew Y. Ng

International Journal of Computer Vision, Volume 76, Issue 1

Pages 53 - 69

https://doi.org/10.1007/s11263-007-0071-y

Published: 01 January 2008

Abstract

We consider the task of 3-D depth estimation from a single still image. We take a supervised learning approach to this problem, in which we begin by collecting a training set of monocular images (of unstructured indoor and outdoor environments which include forests, sidewalks, trees, buildings, etc.) and their corresponding ground-truth depthmaps. Then, we apply supervised learning to predict the value of the depthmap as a function of the image. Depth estimation is a challenging problem, since local features alone are insufficient to estimate depth at a point, and one needs to consider the global context of the image. Our model uses a hierarchical, multiscale Markov Random Field (MRF) that incorporates multiscale local and global image features, and models the depths and the relation between depths at different points in the image. We show that, even on unstructured scenes, our algorithm is frequently able to recover fairly accurate depthmaps. We further propose a model that incorporates both monocular cues and stereo (triangulation) cues, to obtain significantly more accurate depth estimates than is possible using either monocular or stereo cues alone.
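
To make the idea of modelling "the depths and the relation between depths" concrete, the following is a minimal sketch rather than the authors' model. It assumes a 1-D chain of image patches instead of a hierarchical multiscale grid, quadratic (Gaussian) potentials, and per-patch monocular depth predictions that have already been produced by some learned regressor. The function name `map_depths` and the parameters `lam` and `stereo_weight` are hypothetical, introduced only for illustration. Under these assumptions the most probable depths solve a tridiagonal linear system, and an optional stereo term shows how a second cue could enter the same objective.

```python
def map_depths(mono, lam, stereo=None, stereo_weight=0.0):
    """MAP depths for a 1-D chain of patches under an assumed Gaussian MRF.

    Minimizes
        sum_i (d_i - mono_i)^2
      + stereo_weight * sum_{i with stereo} (d_i - stereo_i)^2
      + lam * sum_i (d_i - d_{i+1})^2
    by solving the resulting tridiagonal system with the Thomas algorithm.
    """
    n = len(mono)
    if n == 0:
        return []
    if stereo is None:
        stereo = [None] * n

    # Setting the gradient to zero gives A d = b with a tridiagonal A.
    diag = [0.0] * n
    rhs = [0.0] * n
    for i in range(n):
        neighbours = (i > 0) + (i < n - 1)  # 1 at the ends, 2 inside
        diag[i] = 1.0 + lam * neighbours
        rhs[i] = float(mono[i])
        if stereo[i] is not None:
            diag[i] += stereo_weight
            rhs[i] += stereo_weight * stereo[i]
    off = -lam  # every sub- and super-diagonal entry

    # Forward elimination.
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom

    # Back substitution.
    depths = [0.0] * n
    depths[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        depths[i] = d[i] - c[i] * depths[i + 1]
    return depths


if __name__ == "__main__":
    # A sharp jump in the monocular predictions is softened by the pairwise term.
    print(map_depths([1.0, 1.0, 5.0, 5.0], lam=1.0))
    # A stereo measurement at the middle patch pulls its depth toward 4.0.
    print(map_depths([2.0, 2.0, 2.0], lam=0.5,
                     stereo=[None, 4.0, None], stereo_weight=1.0))
```

With `lam = 0` the result is just the monocular prediction at each patch. Increasing `lam` trades fidelity to the local predictions for agreement between neighbouring depths, which is one simple way of letting context beyond a single patch influence each estimate.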

Index Terms

3-D Depth Reconstruction from a Single Still Image
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Reconstruction
      2. Computer vision tasks
        Scene understanding

Published In

International Journal of Computer Vision, Volume 76, Issue 1

January 2008

101 pages

ISSN: 0920-5691

Copyright © 2008 Springer Science+Business Media, LLC.

Publisher

Kluwer Academic Publishers

United States
