DOI: 10.1007/978-3-031-20086-1_32

MvDeCor: Multi-view Dense Correspondence Learning for Fine-Grained 3D Segmentation

Published: 23 October 2022

    Abstract

    We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks. This is inspired by the observation that view-based surface representations are more effective at modeling high-resolution surface details and texture than their 3D counterparts based on point clouds or voxel occupancy. Specifically, given a 3D shape, we render it from multiple views, and set up a dense correspondence learning task within the contrastive learning framework. As a result, the learned 2D representations are view-invariant and geometrically consistent, leading to better generalization when trained on a limited number of labeled shapes than alternatives based on self-supervision in 2D or 3D alone. Experiments on textured (RenderPeople) and untextured (PartNet) 3D datasets show that our method outperforms state-of-the-art alternatives in fine-grained part segmentation. The improvements over baselines are greater when only a sparse set of views is available for training or when shapes are textured, indicating that MvDeCor benefits from both 2D processing and 3D geometric reasoning. Project page: https://nv-tlabs.github.io/MvDeCor/.
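
The dense correspondence objective described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: it assumes an InfoNCE-style contrastive loss, and the function names, the temperature value of 0.07, and the choice of negatives (the other matched pixels in the second view) are all illustrative assumptions. Pixels from two rendered views that project to the same surface point are treated as positives.

```python
import math


def normalize(vec):
    """Scale a feature vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dense_info_nce(feats_a, feats_b, matches, temperature=0.07):
    """Average InfoNCE loss over matched pixel pairs from two views.

    feats_a, feats_b: per-pixel embeddings (lists of float lists) for view A and B.
    matches: (i, j) pairs meaning pixel i in A and pixel j in B see the same
             3D surface point. Hypothetical input; in practice it would be
             derived from the known rendering cameras.
    temperature: softmax temperature (assumed value, not taken from the paper).
    """
    if not matches:
        raise ValueError("at least one correspondence is required")
    a = [normalize(f) for f in feats_a]
    b = [normalize(f) for f in feats_b]
    # Assumption: negatives for each query are the other matched pixels in view B.
    candidates = [j for _, j in matches]
    total = 0.0
    for i, j_pos in matches:
        logits = [
            sum(x * y for x, y in zip(a[i], b[j])) / temperature
            for j in candidates
        ]
        positive = sum(x * y for x, y in zip(a[i], b[j_pos])) / temperature
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - positive
    return total / len(matches)


if __name__ == "__main__":
    # Toy example: three pixels with 2D embeddings, identical in both views.
    feats = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    aligned = dense_info_nce(feats, feats, [(0, 0), (1, 1), (2, 2)])
    shuffled = dense_info_nce(feats, feats, [(0, 1), (1, 2), (2, 0)])
    print("aligned loss:", aligned)
    print("shuffled loss:", shuffled)
```

When corresponding pixels already have matching embeddings, the loss is close to zero; when the correspondences point at dissimilar embeddings, it is large. Minimizing a loss of this kind is what pushes the 2D features toward being view-invariant.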

    Published In

    Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part II
    Oct 2022
    805 pages
    ISBN: 978-3-031-20085-4
    DOI: 10.1007/978-3-031-20086-1

    Publisher

    Springer-Verlag, Berlin, Heidelberg
