
Inferring 3D Shapes from Image Collections Using Adversarial Networks

Published in: International Journal of Computer Vision

Abstract

We investigate the problem of learning a probabilistic distribution over three-dimensional shapes given two-dimensional views of multiple objects taken from unknown viewpoints. Our approach, called projective generative adversarial network (PrGAN), trains a deep generative model of 3D shapes whose projections (or renderings) match the distribution of the provided 2D views. The addition of a differentiable projection module allows us to infer the underlying 3D shape distribution without access to any explicit 3D or viewpoint annotation during the learning phase. We show that our approach produces 3D shapes of comparable quality to GANs trained directly on 3D data. Experiments also show that disentangling 2D shapes into geometry and viewpoint leads to a good generative model of 2D shapes. A key advantage of our model is that it estimates 3D shape and viewpoint, and generates novel views from an input image, in a completely unsupervised manner. We further investigate how the generative models can be improved when additional information, such as depth, viewpoint, or part segmentation, is available at training time. To this end, we present new differentiable projection operators that can be used to learn better 3D generative models. Our experiments show that PrGAN can successfully leverage extra visual cues to create more diverse and accurate shapes.
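The differentiable projection module is the core technical piece: it renders a generated voxel grid into a 2D silhouette in a way that lets gradients flow from the image discriminator back to the 3D generator. Below is a minimal sketch of such an operator; the exponential squashing follows the projection described in the authors' earlier 3DV 2017 paper, while the tensor shapes and the assumption that the grid has already been rotated into the camera frame are illustrative choices, not the paper's exact implementation.

import torch

def project_silhouette(voxels: torch.Tensor) -> torch.Tensor:
    """Render a silhouette from a voxel occupancy grid, differentiably.

    voxels: (N, D, H, W) occupancies in [0, 1], assumed already rotated
            into the camera frame (e.g. by differentiably resampling the
            grid for a sampled viewpoint).
    Returns (N, H, W) silhouette images in [0, 1].
    """
    # Sum occupancy along each ray (the depth axis), then squash with
    # 1 - exp(-x): an empty ray maps to 0 and the value saturates toward
    # 1 as occupied voxels accumulate, so the operation stays smooth and
    # gradients reach every voxel the ray passes through.
    return 1.0 - torch.exp(-voxels.sum(dim=1))

During training, the generator's voxels would be projected this way under a randomly sampled viewpoint, and the resulting image passed to an ordinary 2D GAN discriminator alongside the real views.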


Notes

  1. We later relax this assumption to incorporate extra supervision.
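As this note suggests, and as the abstract describes, extra cues such as depth maps can be folded in through additional differentiable projection operators. The sketch below shows one standard way to render an expected depth map from an occupancy grid; it is a hedged illustration of the idea, not necessarily the exact operator used in the paper.

import torch

def project_depth(voxels: torch.Tensor) -> torch.Tensor:
    """Expected depth per pixel from a voxel occupancy grid.

    voxels: (N, D, H, W) occupancies in [0, 1], with the camera looking
            down dimension 1 and index 0 nearest the camera.
    Returns (N, H, W) expected depths in voxel units.
    """
    n, d, h, w = voxels.shape
    free = 1.0 - voxels
    # Transmittance: probability a ray is unblocked before reaching slice k.
    trans = torch.cumprod(
        torch.cat([torch.ones_like(voxels[:, :1]), free[:, :-1]], dim=1),
        dim=1)
    # Probability mass that the ray terminates exactly at slice k.
    hit = trans * voxels
    depths = torch.arange(d, dtype=voxels.dtype, device=voxels.device)
    depths = depths.view(1, d, 1, 1)
    # Visibility-weighted average depth; eps guards rays that hit nothing.
    return (hit * depths).sum(dim=1) / (hit.sum(dim=1) + 1e-8)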


Acknowledgements

This research was supported in part by the NSF Grants 1617917, 1749833, 1661259 and 1908669. The experiments were performed using equipment obtained under a grant from the Collaborative R&D Fund managed by the Massachusetts Tech Collaborative.

Author information

Corresponding author

Correspondence to Matheus Gadelha.

Additional information

Communicated by Jun-Yan Zhu, Hongsheng Li, Eli Shechtman, Ming-Yu Liu, Jan Kautz, Antonio Torralba.



About this article


Cite this article

Gadelha, M., Rai, A., Maji, S. et al. Inferring 3D Shapes from Image Collections Using Adversarial Networks. Int J Comput Vis 128, 2651–2664 (2020). https://doi.org/10.1007/s11263-020-01335-w

