Depth Prediction Single Image
root mean squared error. Since our dataset is in log space, we take the exponent of the base to convert the predictions back to linear depth values. Similarly, we add other loss functions, such as relative difference and squared relative difference, in both log space and absolute space. We plot these values during the validation stage and observe that after 200 epochs the network stabilizes and makes little further progress. The reported values are computed on the evaluation split of the labeled NYU Depth dataset, which contains 201 images. This behavior was observed in the model when we combine the input in the expanding part of the model with the corresponding output in the contracting path, as shown in figure 5.
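Assuming the network predicts log-depth, the metrics above can be sketched as follows (the function name, the natural-log base, and the metric naming are our illustrative choices, not fixed by the paper):

```python
import numpy as np

def depth_metrics(pred_log, gt):
    """Evaluation metrics for depth prediction.

    pred_log: network output in log space (natural log assumed here).
    gt:       ground-truth depth in linear space.
    """
    pred = np.exp(pred_log)  # convert log-space prediction back to linear depth
    diff = pred - gt
    return {
        "rmse":     np.sqrt(np.mean(diff ** 2)),                   # RMSE, linear space
        "rmse_log": np.sqrt(np.mean((pred_log - np.log(gt)) ** 2)), # RMSE, log space
        "abs_rel":  np.mean(np.abs(diff) / gt),                    # relative difference
        "sq_rel":   np.mean(diff ** 2 / gt),                       # squared relative difference
    }
```

With a perfect prediction (`pred_log == np.log(gt)`) every metric is zero, which gives a quick sanity check for the conversion out of log space.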
We propose an end-to-end trainable model for estimating a depth map from a single RGB image using the labeled NYU Depth dataset, and we show how to train it alongside the coarse-fine model for comparison. Our U-Net-based architecture converged in a mere 200 epochs and trains quickly, since the model uses no heavy fully connected layers. We also apply no post-processing of the images, such as CRFs or other additional refinement steps. The output generated by our model is comparable to that of the coarse-fine model we implemented, and could do even better with an increased dataset size.
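The U-Net skip connections referred to above concatenate each contracting-path feature map with the matching expanding-path feature map. A minimal sketch with plain arrays (the shapes, names, and centre-crop strategy are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

def skip_concat(decoder_feat, encoder_feat):
    """Join an expanding-path feature map with its contracting-path
    counterpart along the channel axis.

    Both arrays are (channels, height, width); the encoder map is
    centre-cropped when it is larger than the decoder map.
    """
    _, h, w = decoder_feat.shape
    _, eh, ew = encoder_feat.shape
    top, left = (eh - h) // 2, (ew - w) // 2
    cropped = encoder_feat[:, top:top + h, left:left + w]
    return np.concatenate([decoder_feat, cropped], axis=0)

dec = np.zeros((64, 56, 56))   # hypothetical decoder feature map
enc = np.zeros((64, 64, 64))   # hypothetical encoder feature map
print(skip_concat(dec, enc).shape)  # (128, 56, 56)
```

The channel count doubles after the join, which is why each expanding-path block in U-Net-style models halves the channels again with its convolutions.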
For future work, we would like to investigate our findings further and explore other loss functions, with better gradient optimization strategies, to better account for the loss in the output images. We would also like to investigate the effect of using the original U-Net architecture, which expects inputs of size 572 × 572, and measure the impact of the increased parameter count on the model’s performance. We also plan to make our model more robust by applying it to unlabeled datasets as well.
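For reference, the 572 × 572 input size follows from the original U-Net's unpadded 3 × 3 convolutions. A short sketch of the size arithmetic, assuming the standard four-level architecture:

```python
def unet_output_size(n: int, depth: int = 4) -> int:
    """Spatial size of the original U-Net's output for an n x n input.

    Every unpadded 3x3 convolution loses 2 pixels, each level applies
    two of them, and pooling / up-convolution halve and double the
    size respectively.
    """
    for _ in range(depth):       # contracting path
        n = (n - 4) // 2         # two 3x3 convs, then 2x2 max-pool
    n -= 4                       # two bottleneck convs
    for _ in range(depth):       # expanding path
        n = n * 2 - 4            # 2x2 up-conv, then two 3x3 convs
    return n

print(unet_output_size(572))  # 388
```

This reproduces the 388 × 388 output map of the original U-Net paper, so feeding it full-resolution inputs requires tiling or mirroring at the borders.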
9 Contributions
− Diksha Meghwal - Worked on implementing variations of the U-Net model, built a log parser to plot the gradient-descent graphs, and set up the framework for running the program in parallel on GPUs.
− Imran - Worked on implementing the fine and coarse model, extracting and transforming images from the NYU Depth dataset, and developing the framework to compute the numerous loss values for comparison.