
Novel Views of Objects from a Single Image

Published: 01 August 2017

Abstract

Taking an image of an object is at its core a lossy process: the rich three-dimensional structure of the world is flattened onto an image plane, and decisions such as viewpoint and camera parameters are final and not easily reversible. As a consequence, the viewpoint of the depicted object can no longer be freely changed. Given a single image depicting an object, novel-view synthesis is the task of generating new images that render the object from a viewpoint different from the given one. The main difficulty is synthesizing the disoccluded parts; disocclusion occurs when parts of an object are hidden by the object itself under a specific viewpoint. In this work, we show how to improve novel-view synthesis by exploiting the correlations observed in 3D models and applying them to new image instances. We propose a technique that uses the structural information extracted from a 3D model matching the image object in viewpoint and shape. To establish this match, we propose an efficient 2D-to-3D alignment method that precisely associates the image appearance with the 3D model geometry with minimal user interaction. Our technique simulates plausible viewpoint changes for a variety of object classes within seconds. Additionally, we show that our synthesized images can serve as additional training data that improves the performance of standard object detectors.
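The geometric core of this pipeline — once a 3D model has been aligned to the image object, re-rendering it under a rotated camera — can be sketched as a point re-projection. The sketch below is illustrative only (all function names and camera parameters are assumptions, not the authors' implementation), and it deliberately ignores the hard part the paper addresses: synthesizing appearance for the disoccluded regions.

```python
import numpy as np

def rotate_y(theta):
    """Rotation matrix about the vertical (y) axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def project(points, f=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of Nx3 camera-space points to Nx2 pixel coordinates."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([f * x / z + cx, f * y / z + cy], axis=1)

def novel_view(points, colors, delta_theta, depth=4.0):
    """Re-project an object-centered, colored point set as seen after
    rotating the camera by delta_theta around the object. The camera
    looks down the z-axis from the given depth."""
    rotated = points @ rotate_y(delta_theta).T
    cam = rotated + np.array([0.0, 0.0, depth])   # translate in front of camera
    pix = project(cam)
    order = np.argsort(-cam[:, 2])                # painter's order: far to near
    return pix[order], colors[order]

# toy "object": a cube of colored points standing in for an aligned 3D model
g = np.linspace(-1.0, 1.0, 5)
pts = np.array([[x, y, z] for x in g for y in g for z in g])
cols = (pts + 1.0) / 2.0
pix, pix_cols = novel_view(pts, cols, np.deg2rad(30))
```

In the actual method, the per-point colors would come from the 2D-to-3D alignment (transferring image appearance onto the model surface), and the disoccluded points — those with no observed appearance — are the ones whose synthesis the paper's 3D-model correlations make plausible.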




Published In

IEEE Transactions on Pattern Analysis and Machine Intelligence  Volume 39, Issue 8
Aug. 2017
208 pages

Publisher

IEEE Computer Society

United States


Qualifiers

  • Research-article


Cited By

  • (2024) "Pose Guided Person Image Generation Via Dual-Task Correlation and Affinity Learning," IEEE Transactions on Visualization and Computer Graphics, vol. 30, no. 8, pp. 5111–5128, Aug. 2024. doi: 10.1109/TVCG.2023.3286394
  • (2024) "Unsupervised Single-View Synthesis Network via Style Guidance and Prior Distillation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 3, pp. 1604–1614, Mar. 2024. doi: 10.1109/TCSVT.2023.3294521
  • (2023) "Novel View Synthesis from a Single Unposed Image via Unsupervised Learning," ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 19, no. 6, pp. 1–23, May 2023. doi: 10.1145/3587467
  • (2023) "An accurate volume estimation on single view object images by deep learning based depth map analysis and 3D reconstruction," Multimedia Tools and Applications, vol. 82, no. 18, pp. 28235–28258, Feb. 2023. doi: 10.1007/s11042-023-14615-7
  • (2022) "General Object Pose Transformation Network from Unpaired Data," Computer Vision – ECCV 2022, pp. 292–310, Oct. 2022. doi: 10.1007/978-3-031-20068-7_17
  • (2021) "Three-view generation based on a single front view image for car," The Visual Computer, vol. 37, no. 8, pp. 2195–2205, Aug. 2021. doi: 10.1007/s00371-020-01979-2
  • (2020) "Synthesizing light field from a single image with variable MPI and two network fusion," ACM Transactions on Graphics, vol. 39, no. 6, pp. 1–10, Nov. 2020. doi: 10.1145/3414685.3417785
  • (2020) "Novel View Synthesis on Unpaired Data by Conditional Deformable Variational Auto-Encoder," Computer Vision – ECCV 2020, pp. 87–103, Aug. 2020. doi: 10.1007/978-3-030-58604-1_6
  • (2020) "AUTO3D: Novel View Synthesis Through Unsupervisely Learned Variational Viewpoint and Global 3D Representation," Computer Vision – ECCV 2020, pp. 52–71, Aug. 2020. doi: 10.1007/978-3-030-58545-7_4
  • (2019) "3D Ken Burns effect from a single image," ACM Transactions on Graphics, vol. 38, no. 6, pp. 1–15, Nov. 2019. doi: 10.1145/3355089.3356528
