
Novel Views of Objects from a Single Image

Published: 01 August 2017

Abstract

Taking an image of an object is at its core a lossy process: the rich three-dimensional structure of the world is flattened onto an image plane, and decisions such as viewpoint and camera parameters are final and not easily reversible. As a consequence, the viewpoint of the depicted object can no longer be freely changed. Given a single image depicting an object, novel-view synthesis is the task of generating new images that render the object from a viewpoint different from the given one. The main difficulty is synthesizing the disoccluded parts; disocclusion occurs when parts of an object are hidden by the object itself under a specific viewpoint. In this work, we show how to improve novel-view synthesis by exploiting the correlations observed in 3D models and applying them to new image instances. We propose a technique that uses the structural information extracted from a 3D model matching the image object in viewpoint and shape. To establish this match, we propose an efficient 2D-to-3D alignment method that precisely associates the image appearance with the 3D model geometry with minimal user interaction. Our technique simulates plausible viewpoint changes for a variety of object classes within seconds. Additionally, we show that our synthesized images can serve as additional training data that improves the performance of standard object detectors.
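The geometric core of this pipeline — once a 3D model has been aligned to the image object, re-rendering it under a rotated camera — can be sketched as a point re-projection. The sketch below is illustrative only (all function names and camera parameters are assumptions, not the authors' implementation), and it deliberately ignores the hard part the paper addresses: synthesizing appearance for the disoccluded regions.

```python
import numpy as np

def rotate_y(theta):
    """Rotation matrix about the vertical (y) axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def project(points, f=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of Nx3 camera-space points to Nx2 pixel coordinates."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([f * x / z + cx, f * y / z + cy], axis=1)

def novel_view(points, colors, delta_theta, depth=4.0):
    """Re-project an object-centered, colored point set as seen after
    rotating the camera by delta_theta around the object. The camera
    looks down the z-axis from the given depth."""
    rotated = points @ rotate_y(delta_theta).T
    cam = rotated + np.array([0.0, 0.0, depth])   # translate in front of camera
    pix = project(cam)
    order = np.argsort(-cam[:, 2])                # painter's order: far to near
    return pix[order], colors[order]

# toy "object": a cube of colored points standing in for an aligned 3D model
g = np.linspace(-1.0, 1.0, 5)
pts = np.array([[x, y, z] for x in g for y in g for z in g])
cols = (pts + 1.0) / 2.0
pix, pix_cols = novel_view(pts, cols, np.deg2rad(30))
```

In the actual method, the per-point colors would come from the 2D-to-3D alignment (transferring image appearance onto the model surface), and the disoccluded points — those with no observed appearance — are the ones whose synthesis the paper's 3D-model correlations make plausible.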




Published In

IEEE Transactions on Pattern Analysis and Machine Intelligence  Volume 39, Issue 8
Aug. 2017
208 pages

Publisher

IEEE Computer Society

United States


Qualifiers

  • Research-article


Cited By

  • (2024) "Pose Guided Person Image Generation Via Dual-Task Correlation and Affinity Learning," IEEE Transactions on Visualization and Computer Graphics, vol. 30, no. 8, pp. 5111–5128, Aug. 2024. doi: 10.1109/TVCG.2023.3286394
  • (2024) "Unsupervised Single-View Synthesis Network via Style Guidance and Prior Distillation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 3, pp. 1604–1614, Mar. 2024. doi: 10.1109/TCSVT.2023.3294521
  • (2023) "Novel View Synthesis from a Single Unposed Image via Unsupervised Learning," ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 19, no. 6, pp. 1–23, May 2023. doi: 10.1145/3587467
  • (2023) "An accurate volume estimation on single view object images by deep learning based depth map analysis and 3D reconstruction," Multimedia Tools and Applications, vol. 82, no. 18, pp. 28235–28258, Feb. 2023. doi: 10.1007/s11042-023-14615-7
  • (2022) "General Object Pose Transformation Network from Unpaired Data," Computer Vision – ECCV 2022, pp. 292–310, Oct. 2022. doi: 10.1007/978-3-031-20068-7_17
  • (2021) "Three-view generation based on a single front view image for car," The Visual Computer, vol. 37, no. 8, pp. 2195–2205, Aug. 2021. doi: 10.1007/s00371-020-01979-2
  • (2020) "Synthesizing light field from a single image with variable MPI and two network fusion," ACM Transactions on Graphics, vol. 39, no. 6, pp. 1–10, Nov. 2020. doi: 10.1145/3414685.3417785
  • (2020) "Novel View Synthesis on Unpaired Data by Conditional Deformable Variational Auto-Encoder," Computer Vision – ECCV 2020, pp. 87–103, Aug. 2020. doi: 10.1007/978-3-030-58604-1_6
  • (2020) "AUTO3D: Novel View Synthesis Through Unsupervisely Learned Variational Viewpoint and Global 3D Representation," Computer Vision – ECCV 2020, pp. 52–71, Aug. 2020. doi: 10.1007/978-3-030-58545-7_4
  • (2019) "3D Ken Burns effect from a single image," ACM Transactions on Graphics, vol. 38, no. 6, pp. 1–15, Nov. 2019. doi: 10.1145/3355089.3356528
