
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13643)


Abstract

As a promising scheme for self-supervised learning, contrastive learning has significantly advanced the self-supervised modeling of images and videos, as well as the understanding of 3D point clouds. Nevertheless, existing point cloud contrastive learning methods mainly concentrate on point-level correspondence matching and ignore the spatial context of the 3D scene. In this notebook paper, we modify the original point contrastive learning for better 3D modeling, namely Density-Based PointContrast (DBPC), by leveraging prior knowledge of point cloud density for self-supervised 3D feature optimization. Specifically, we exploit the classical Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to partition the input point cloud into clusters, each of which can represent one semantic object instance. An object-level contrastive loss is then imposed on point pairs sampled according to the cluster labels, regularizing the point-level contrastive learning with richer scene context. The good generalization ability of the model pre-trained on the ScanNet dataset is verified by extensive experiments on downstream tasks, e.g., point cloud classification, part segmentation and scene semantic segmentation.
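The abstract outlines the core recipe: cluster each point cloud by density, treat the clusters as pseudo object instances, and use the cluster labels to shape a contrastive loss over matched points. The snippet below is a minimal sketch of that idea in Python, not the authors' implementation; it assumes per-point features from two augmented views with known one-to-one correspondences, and the DBSCAN parameters, sampling scheme and function names (dbscan_labels, object_level_contrastive_loss) are illustrative choices.

    # A minimal sketch (not the authors' code) of density-guided point contrastive
    # learning: DBSCAN groups points into pseudo object instances, and the cluster
    # labels decide which pairs are masked out of the negatives in an InfoNCE loss.
    import numpy as np
    import torch
    import torch.nn.functional as F
    from sklearn.cluster import DBSCAN

    def dbscan_labels(xyz, eps=0.05, min_samples=10):
        # Cluster raw coordinates; eps/min_samples are illustrative values and
        # label -1 marks noise points (treated as one group in this sketch).
        return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(xyz)

    def object_level_contrastive_loss(feat_a, feat_b, labels, temperature=0.07):
        # feat_a[i] and feat_b[i] are features of the same physical point seen in
        # two augmented views. Each point is pulled towards its correspondence and
        # pushed away from points of *other* clusters; same-cluster points are
        # removed from the negatives so the loss respects object-level structure.
        feat_a = F.normalize(feat_a, dim=1)
        feat_b = F.normalize(feat_b, dim=1)
        logits = feat_a @ feat_b.t() / temperature                 # (N, N)
        same_cluster = labels[:, None] == labels[None, :]          # (N, N) bool
        off_diag = ~torch.eye(len(labels), dtype=torch.bool)
        logits = logits.masked_fill(same_cluster & off_diag, float('-inf'))
        targets = torch.arange(len(labels))
        return F.cross_entropy(logits, targets)

    # Toy usage: 4096 matched points with random 32-D features from each view.
    xyz = np.random.rand(4096, 3).astype(np.float32)
    labels = torch.as_tensor(dbscan_labels(xyz))
    loss = object_level_contrastive_loss(torch.randn(4096, 32),
                                         torch.randn(4096, 32), labels)

In this sketch the object-level constraint enters only through the negative mask; how the paper balances the object-level and point-level terms is described in the full text.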



Author information


Corresponding author

Correspondence to Fuchen Long.


Copyright information

© 2023 Springer Nature Switzerland AG

About this paper


Cite this paper

Long, F., Qiu, Z. (2023). Involving Density Prior for 3D Point Cloud Contrastive Learning. In: Rousseau, JJ., Kapralos, B. (eds) Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges. ICPR 2022. Lecture Notes in Computer Science, vol 13643. Springer, Cham. https://doi.org/10.1007/978-3-031-37660-3_21


  • DOI: https://doi.org/10.1007/978-3-031-37660-3_21

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-37659-7

  • Online ISBN: 978-3-031-37660-3

  • eBook Packages: Computer Science, Computer Science (R0)
