
Trusted 3D self-supervised representation learning with cross-modal settings

  • Research
  • Published in Machine Vision and Applications

Abstract

Cross-modal settings that pair 2D images with 3D point clouds in self-supervised representation learning have proven to be an effective way to enhance visual perception capabilities. However, different modalities have different data formats and representations, and directly using features extracted from cross-modal datasets may lead to conflicting information and feature collapse. We refer to this problem as uncertainty in network learning. Reducing this uncertainty to obtain trusted descriptions is therefore the key to improving network performance. Motivated by this, we propose a trusted cross-modal network for self-supervised learning (TCMSS). It obtains trusted descriptions through a trusted combination module and improves network performance with a well-designed loss function. In the trusted combination module, we use the Dirichlet distribution and subjective logic to parameterize the features and, at the same time, acquire probabilistic uncertainty. The Dempster-Shafer Theory (DST) is then used to obtain trusted descriptions by weighting the parameterized results with their uncertainty. We have also designed a trusted domain loss function, comprising a domain loss and a trusted loss, which effectively improves the prediction accuracy of the network by applying contrastive learning between different feature descriptions. Experimental results show that our model outperforms previous methods on linear classification on ScanObjectNN as well as on few-shot classification on both ModelNet40 and ScanObjectNN. Part segmentation on ShapeNet also reports results superior to previous methods. Further, ablation studies validate the effectiveness of our method for better point cloud understanding.
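
To make the trusted combination module concrete, the sketch below shows the standard subjective-logic construction the abstract refers to: non-negative per-class evidence is mapped to Dirichlet parameters, belief masses, and an uncertainty mass, and two modality-specific opinions are then fused with the reduced Dempster-Shafer combination rule popularized by trusted multi-view classification. This is a minimal NumPy illustration of the general recipe, not the paper's exact implementation; the function names and toy evidence values are our own assumptions.

```python
import numpy as np

def opinion_from_evidence(evidence):
    """Turn non-negative per-class evidence into a subjective-logic opinion.

    For K classes: Dirichlet parameters alpha_k = e_k + 1, Dirichlet
    strength S = sum_k alpha_k, belief masses b_k = e_k / S, and
    uncertainty mass u = K / S, so that sum_k b_k + u = 1.
    """
    evidence = np.asarray(evidence, dtype=float)
    K = evidence.shape[-1]
    S = (evidence + 1.0).sum(axis=-1, keepdims=True)  # Dirichlet strength
    belief = evidence / S
    uncertainty = float(K) / S
    return belief, uncertainty.squeeze(-1)

def ds_combine(b1, u1, b2, u2):
    """Reduced Dempster-Shafer combination of two opinions.

    Conflicting belief C = sum_{i != j} b1_i * b2_j is discounted:
        b_k = (b1_k * b2_k + b1_k * u2 + b2_k * u1) / (1 - C)
        u   = (u1 * u2) / (1 - C)
    """
    b1, b2 = np.asarray(b1, dtype=float), np.asarray(b2, dtype=float)
    conflict = b1.sum() * b2.sum() - (b1 * b2).sum()  # sum over i != j
    scale = 1.0 - conflict
    belief = (b1 * b2 + b1 * u2 + b2 * u1) / scale
    uncertainty = (u1 * u2) / scale
    return belief, uncertainty

# Toy example: a confident image branch and an ambiguous point-cloud branch.
b_img, u_img = opinion_from_evidence([9.0, 1.0, 0.0])  # strong class-0 evidence
b_pc,  u_pc  = opinion_from_evidence([1.0, 1.0, 1.0])  # weak, uniform evidence
b, u = ds_combine(b_img, u_img, b_pc, u_pc)
print(b, u)  # fused belief still favours class 0; fused uncertainty is lower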
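
The trusted domain loss is described only at a high level, so the following PyTorch sketch shows one plausible instantiation under stated assumptions: the domain loss as an NT-Xent style contrastive term between the 2D and 3D feature descriptions, and the trusted loss as the standard evidential adjusted cross-entropy on Dirichlet parameters. The names domain_loss and trusted_loss, the temperature, and the targets y are illustrative, not the paper's definitions.

```python
import torch
import torch.nn.functional as F

def domain_loss(z2d, z3d, tau=0.07):
    """Cross-modal NT-Xent contrastive loss: matching 2D/3D pairs in a batch
    are positives, every other pairing in the batch is a negative."""
    z2d = F.normalize(z2d, dim=1)
    z3d = F.normalize(z3d, dim=1)
    logits = z2d @ z3d.t() / tau                      # (B, B) similarities
    targets = torch.arange(z2d.size(0), device=z2d.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def trusted_loss(alpha, y):
    """Evidential adjusted cross-entropy under a Dirichlet(alpha) posterior:
    L = sum_k y_k * (digamma(S) - digamma(alpha_k)), with S = sum_k alpha_k."""
    S = alpha.sum(dim=1, keepdim=True)
    y_onehot = F.one_hot(y, alpha.size(1)).float()
    return (y_onehot * (torch.digamma(S) - torch.digamma(alpha))).sum(dim=1).mean()
```

A combined objective would then look like loss = domain_loss(z2d, z3d) + lam * trusted_loss(alpha, y), where lam balances the contrastive and evidential terms; in a purely self-supervised setting, y would come from instance or pseudo labels rather than ground truth.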

Acknowledgements

This work was supported by the Key Research and Development Program of Shaanxi under Grants 2021GY-025 and 2021GXLH-Z-097.

Author information

Contributions

H.X. conducted the experiments. H.X. and C.H. wrote the main manuscript text and prepared Figures 1, 2, 3, 4. Z.J. and S.P. prepared Tables 1, 2, 3, 4, 5, 6. All authors reviewed the manuscript.

Corresponding author

Correspondence to Jihua Zhu.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Han, X., Cheng, H., Shi, P. et al. Trusted 3D self-supervised representation learning with cross-modal settings. Machine Vision and Applications 35, 77 (2024). https://doi.org/10.1007/s00138-024-01556-w
