Abstract
As a promising scheme for self-supervised learning, contrastive learning has significantly advanced image and video modeling, as well as the understanding of 3D point clouds. Nevertheless, conventional point cloud contrastive learning methods mainly concentrate on point-level correspondence matching and ignore the spatial context of the 3D scene. In this notebook paper, we modify the original point contrastive learning for better 3D modeling and propose Density-Based PointContrast (DBPC), which leverages prior knowledge of point cloud density for self-supervised 3D feature optimization. Specifically, we apply the classical Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm to partition the input point cloud into clusters, each of which can represent one semantic object instance. An object-level contrastive loss is then applied to point pairs sampled according to the clustering labels, regularizing point-level contrastive learning with richer scene context. Extensive experiments on downstream tasks, e.g., point cloud classification, part segmentation, and scene semantic segmentation, verify the good generalization ability of the model pre-trained on the ScanNet dataset.
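The two steps named in the abstract, density-based clustering followed by a contrastive loss driven by cluster labels, can be illustrated with a small sketch. This is not the DBPC implementation: the abstract does not give the loss formula, the sampling strategy, or any hyperparameters, so the InfoNCE-style form below, the function names `dbscan` and `object_level_info_nce`, the values of `eps`, `min_pts`, and `tau`, and the toy points and features are all assumptions chosen for illustration. For brevity the sketch uses every non-noise point as an anchor instead of sampling pairs.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal brute-force DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i (the point itself included).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(reachable)
    return labels


def object_level_info_nce(feats_a, feats_b, labels, tau=0.07):
    """Assumed InfoNCE-style loss: for each anchor in view A, every point of
    view B with the same cluster label is a positive, all others are negatives."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    losses = []
    for i, anchor in enumerate(feats_a):
        if labels[i] == -1:
            continue  # noise points carry no object label, so skip them as anchors
        logits = [dot(anchor, f) / tau for f in feats_b]
        positives = [l for l, lab in zip(logits, labels) if lab == labels[i]]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_all = m + math.log(sum(math.exp(l - m) for l in logits))
        log_pos = m + math.log(sum(math.exp(l - m) for l in positives))
        losses.append(log_all - log_pos)
    return sum(losses) / len(losses) if losses else 0.0


# Toy scene: two dense 3x3 patches of points plus one isolated outlier.
blob_a = [(0.1 * x, 0.1 * y, 0.0) for x in range(3) for y in range(3)]
blob_b = [(5.0 + 0.1 * x, 0.1 * y, 0.0) for x in range(3) for y in range(3)]
points = blob_a + blob_b + [(20.0, 20.0, 20.0)]
labels = dbscan(points, eps=0.15, min_pts=3)

# Hypothetical 2-D features: one set separates the two objects, one collapses them.
aligned = [(1.0, 0.0)] * 9 + [(0.0, 1.0)] * 9 + [(0.0, 0.0)]
collapsed = [(1.0, 0.0)] * 19
loss_aligned = object_level_info_nce(aligned, aligned, labels)
loss_collapsed = object_level_info_nce(collapsed, collapsed, labels)
print(labels)
print(loss_aligned, loss_collapsed)
```

In this toy setting the features that separate the two clusters receive a lower loss than the collapsed features, which is the behavior an object-level term is meant to encourage. The brute-force neighbor search is quadratic in the number of points; a real scene would need a spatial index or an existing library implementation.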
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Long, F., Qiu, Z. (2023). Involving Density Prior for 3D Point Cloud Contrastive Learning. In: Rousseau, JJ., Kapralos, B. (eds) Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges. ICPR 2022. Lecture Notes in Computer Science, vol 13643. Springer, Cham. https://doi.org/10.1007/978-3-031-37660-3_21
DOI: https://doi.org/10.1007/978-3-031-37660-3_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37659-7
Online ISBN: 978-3-031-37660-3
eBook Packages: Computer Science, Computer Science (R0)