Involving Density Prior for 3D Point Cloud Contrastive Learning

Long, Fuchen; Qiu, Zhaofan

doi:10.1007/978-3-031-37660-3_21

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13643)

Included in the following conference series:

International Conference on Pattern Recognition

Abstract

As a promising scheme of self-supervised learning, contrastive learning has significantly advanced self-supervised modeling of images and videos, as well as the understanding of 3D point clouds. Nevertheless, conventional point cloud contrastive learning methods mainly concentrate on point-level correspondence matching and ignore the spatial context of the 3D space. In this notebook paper, we modify the original point contrastive learning for better 3D modeling by leveraging prior knowledge of point cloud density for self-supervised 3D feature optimization; we name the resulting method Density-Based PointContrast (DBPC). Specifically, we apply the traditional Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm to partition the input point cloud into clusters, each of which can represent one semantic object instance. An object-level contrastive loss is then applied to point pairs sampled according to the clustering labels, regularizing point-level contrastive learning with richer scene context. Extensive experiments on downstream tasks, e.g., point cloud classification, part segmentation and scene semantic segmentation, verify the good generalization ability of the model pre-trained on the ScanNet dataset.
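The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain-Python version of the classic density-based clustering algorithm, followed by a hypothetical helper that marks a sampled point pair as positive when both points share a cluster. The function names, the `eps` and `min_pts` values, and the toy coordinates are all assumptions made for illustration; the abstract does not specify the parameters or the exact form of the object-level loss.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels


def pair_targets(labels, pairs):
    """Hypothetical helper: 1 if both points share a non-noise cluster, else 0."""
    return [int(labels[a] == labels[b] and labels[a] != NOISE) for a, b in pairs]


# Toy scene: two compact groups of points plus one isolated outlier.
points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
    (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.1, 5.0),
    (20.0, 20.0, 20.0),
]
labels = dbscan(points, eps=0.3, min_pts=3)
targets = pair_targets(labels, [(0, 1), (0, 3), (3, 5), (2, 6)])
print(labels)   # [0, 0, 0, 1, 1, 1, -1]
print(targets)  # [1, 0, 1, 0]
```

In this sketch, pairs drawn from the same cluster would be treated as positives by an object-level loss and pairs from different clusters (or involving noise) as negatives. The brute-force neighbor search is quadratic in the number of points, so a real scene would need a spatial index.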

Author information

Authors and Affiliations

JD Explore Academy, Beijing, China
Fuchen Long & Zhaofan Qiu

Corresponding author

Correspondence to Fuchen Long.

Editor information

Editors and Affiliations

York University, Toronto, ON, Canada
Jean-Jacques Rousseau
Ontario Tech University, Oshawa, ON, Canada
Bill Kapralos

About this paper

Cite this paper

Long, F., Qiu, Z. (2023). Involving Density Prior for 3D Point Cloud Contrastive Learning. In: Rousseau, JJ., Kapralos, B. (eds) Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges. ICPR 2022. Lecture Notes in Computer Science, vol 13643. Springer, Cham. https://doi.org/10.1007/978-3-031-37660-3_21

DOI: https://doi.org/10.1007/978-3-031-37660-3_21
Published: 30 July 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37659-7
Online ISBN: 978-3-031-37660-3
eBook Packages: Computer Science, Computer Science (R0)
