Cross-Modal Self-Supervised Learning with Effective Contrastive Units for LiDAR Point Clouds

Cai, Mu; Luo, Chenxu; Lee, Yong Jae; Yang, Xiaodong

arXiv:2409.06827 (cs)

[Submitted on 10 Sep 2024]

Abstract: 3D perception in LiDAR point clouds is crucial for a self-driving vehicle to act properly in a 3D environment. However, manually labeling point clouds is hard and costly. There has been growing interest in self-supervised pre-training of 3D perception models. Following the success of contrastive learning in images, current methods mostly conduct contrastive pre-training on point clouds only. Yet an autonomous driving vehicle is typically equipped with multiple sensors, including cameras and LiDAR. In this context, we systematically study single modality, cross-modality, and multi-modality for contrastive learning of point clouds, and show that cross-modality wins over the other alternatives. In addition, considering the huge difference between the training sources in 2D images and 3D point clouds, it remains unclear how to design more effective contrastive units for LiDAR. We therefore propose instance-aware and similarity-balanced contrastive units that are tailored for self-driving point clouds. Extensive experiments reveal that our approach achieves remarkable performance gains over various point cloud models across the downstream perception tasks of LiDAR-based 3D object detection and 3D semantic segmentation on four popular benchmarks: Waymo Open Dataset, nuScenes, SemanticKITTI, and ONCE.
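
The abstract does not spell out the loss used for cross-modal contrastive learning. As a generic illustration only, and not the authors' formulation, the sketch below computes a standard InfoNCE loss between point-cloud features and image features, assuming that the i-th entries of the two lists describe the same contrastive unit (the positive pair) and that all other entries act as negatives. The function names, the temperature value of 0.07, and the toy feature vectors are hypothetical choices made for this example; the instance-aware and similarity-balanced unit construction proposed in the paper is not modeled here.

```python
import math


def l2_normalize(v):
    # Scale a non-zero feature vector to unit length (assumes norm > 0).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_modal_info_nce(point_feats, image_feats, temperature=0.07):
    """Mean InfoNCE loss; point_feats[i] and image_feats[i] are the positive pair.

    Hypothetical helper for illustration, not the paper's loss.
    """
    if len(point_feats) != len(image_feats) or not point_feats:
        raise ValueError("need the same non-zero number of units in both modalities")
    p = [l2_normalize(v) for v in point_feats]
    q = [l2_normalize(v) for v in image_feats]
    total = 0.0
    for i, p_i in enumerate(p):
        # Cosine similarity of point unit i to every image unit, scaled by temperature.
        logits = [sum(a * b for a, b in zip(p_i, q_j)) / temperature for q_j in q]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the matching image unit.
        total += log_denom - logits[i]
    return total / len(p)


# Toy check: matched features should give a lower loss than mismatched ones.
aligned = cross_modal_info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = cross_modal_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

In this sketch the choice of what counts as one unit (a point, a segment, an object instance) is left entirely to the caller; that choice is the design question the abstract raises.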

Comments:	IROS 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2409.06827 [cs.CV]
	(or arXiv:2409.06827v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2409.06827

Submission history

From: Xiaodong Yang
[v1] Tue, 10 Sep 2024 19:11:45 UTC (3,314 KB)
