Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders

Min, Chen; Xu, Xinli; Zhao, Dawei; Xiao, Liang; Nie, Yiming; Dai, Bin

arXiv:2206.09900 (cs)

[Submitted on 20 Jun 2022 (v1), last revised 9 Oct 2023 (this version, v7)]

Abstract: Current perception models in autonomous driving rely heavily on large-scale labelled 3D data, which is costly and time-consuming to annotate. This work reduces the dependence on labelled 3D training data by pre-training masked autoencoders (MAE) on large-scale unlabelled outdoor LiDAR point clouds. Whereas existing masked point autoencoding methods mainly focus on small-scale indoor point clouds or pillar-based large-scale outdoor LiDAR data, our approach introduces a new self-supervised masked occupancy pre-training method, Occupancy-MAE, designed specifically for voxel-based large-scale outdoor LiDAR point clouds. Occupancy-MAE exploits the gradually sparse voxel occupancy structure of outdoor LiDAR point clouds and combines a range-aware random masking strategy with a pretext task of occupancy prediction. By randomly masking voxels according to their distance from the LiDAR and predicting the masked occupancy structure of the entire surrounding 3D scene, Occupancy-MAE encourages the extraction of high-level semantic information, since the masked voxels must be reconstructed from only a small number of visible voxels. Extensive experiments demonstrate the effectiveness of Occupancy-MAE across several downstream tasks. For 3D object detection, Occupancy-MAE halves the labelled data required for car detection on the KITTI dataset and improves small object detection by approximately 2% in AP on the Waymo dataset. For 3D semantic segmentation, Occupancy-MAE outperforms training from scratch by around 2% in mIoU. For multi-object tracking, Occupancy-MAE improves over training from scratch by approximately 1% in terms of AMOTA and AMOTP. The code is publicly available.
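
As a rough illustration of the two ideas named in the abstract, range-aware random masking and occupancy prediction, the sketch below voxelizes a point list, masks occupied voxels with a ratio that depends on their distance from the sensor, and produces a binary occupancy label for any grid cell. It is a minimal sketch under stated assumptions, not the authors' implementation: the distance bands, the mask ratios, the per-voxel Bernoulli sampling, the sensor placed at the origin, and every function name are hypothetical choices made for illustration.

```python
import math
import random


def voxelize(points, voxel_size):
    """Map (x, y, z) points to the set of integer voxel indices they occupy."""
    return {tuple(math.floor(c / voxel_size) for c in p) for p in points}


def voxel_range(voxel, voxel_size):
    """Horizontal distance from the sensor (assumed at the origin) to the voxel centre."""
    cx = (voxel[0] + 0.5) * voxel_size
    cy = (voxel[1] + 0.5) * voxel_size
    return math.hypot(cx, cy)


def mask_ratio_for(distance, bands):
    """Return the ratio of the first (upper_bound, ratio) band with distance < upper_bound."""
    for upper, ratio in bands:
        if distance < upper:
            return ratio
    return bands[-1][1]  # fall back to the last band


def range_aware_mask(voxels, voxel_size, bands, rng):
    """Split occupied voxels into (visible, masked) with a distance-dependent mask ratio."""
    visible, masked = set(), set()
    for v in sorted(voxels):  # sorted so a seeded rng gives reproducible splits
        ratio = mask_ratio_for(voxel_range(v, voxel_size), bands)
        (masked if rng.random() < ratio else visible).add(v)
    return visible, masked


def occupancy_target(voxels, grid_voxel):
    """Binary pretext label: 1 if the grid cell contained any point, else 0."""
    return 1 if grid_voxel in voxels else 0


# Hypothetical bands: (upper distance bound in metres, mask ratio). Assumed values.
EXAMPLE_BANDS = [(30.0, 0.9), (50.0, 0.7), (math.inf, 0.5)]
```

In this sketch an encoder would see only the `visible` voxels, while the occupancy labels are computed from the full set of occupied voxels, so the prediction target covers both masked and visible regions of the scene. Calling `range_aware_mask(voxels, 0.1, EXAMPLE_BANDS, random.Random(0))` gives a reproducible split for a fixed seed.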

Comments:	Accepted by TIV
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2206.09900 [cs.CV]
	(or arXiv:2206.09900v7 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2206.09900

Submission history

From: Chen Min
[v1] Mon, 20 Jun 2022 17:15:50 UTC (2,834 KB)
[v2] Fri, 24 Jun 2022 06:46:02 UTC (3,068 KB)
[v3] Mon, 27 Jun 2022 09:01:51 UTC (2,834 KB)
[v4] Tue, 16 Aug 2022 14:16:21 UTC (1,431 KB)
[v5] Wed, 23 Nov 2022 06:15:30 UTC (1,309 KB)
[v6] Sat, 29 Apr 2023 00:54:33 UTC (1,540 KB)
[v7] Mon, 9 Oct 2023 12:34:02 UTC (5,018 KB)
