DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection

Li, Yingwei; Yu, Adams Wei; Meng, Tianjian; Caine, Ben; Ngiam, Jiquan; Peng, Daiyi; Shen, Junyang; Wu, Bo; Lu, Yifeng; Zhou, Denny; Le, Quoc V.; Yuille, Alan; Tan, Mingxing

arXiv:2203.08195 (cs)

[Submitted on 15 Mar 2022]

Abstract: Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving. While prevalent multi-modal methods simply decorate raw lidar point clouds with camera features and feed them directly to existing 3D detection models, our study shows that fusing camera features with deep lidar features instead of raw points can lead to better performance. However, because those features are often augmented and aggregated, a key challenge in fusion is how to effectively align the transformed features from the two modalities. In this paper, we propose two novel techniques: InverseAug, which inverts geometry-related augmentations (e.g., rotation) to enable accurate geometric alignment between lidar points and image pixels, and LearnableAlign, which leverages cross-attention to dynamically capture the correlations between image and lidar features during fusion. Based on InverseAug and LearnableAlign, we develop a family of generic multi-modal 3D detection models named DeepFusion, which are more accurate than previous methods. For example, DeepFusion improves the PointPillars, CenterPoint, and 3D-MAN baselines on Pedestrian detection by 6.7, 8.9, and 6.2 LEVEL_2 APH, respectively. Notably, our models achieve state-of-the-art performance on the Waymo Open Dataset and show strong robustness against input corruptions and out-of-distribution data. Code will be publicly available at this https URL.
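
The abstract describes InverseAug only at a high level. The following is a minimal sketch of the idea under simplifying assumptions: the only augmentations are a rotation about the vertical axis followed by a uniform scale, the augmentation parameters (theta, scale) are recorded with each sample, and the extrinsics (lidar_to_camera) and intrinsics (fx, fy, cx, cy) are hypothetical placeholders rather than real Waymo calibration. All function names are illustrative and are not taken from the DeepFusion code.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]
Matrix3 = List[List[float]]


def rot_z(theta: float) -> Matrix3:
    """Rotation about the vertical (z) axis, a typical lidar augmentation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def mat_vec(m: Matrix3, p: Point) -> Point:
    """Multiply a 3x3 matrix by a 3D point."""
    return tuple(sum(m[i][j] * p[j] for j in range(3)) for i in range(3))


def transpose(m: Matrix3) -> Matrix3:
    return [list(row) for row in zip(*m)]


def augment(points: List[Point], theta: float, scale: float) -> List[Point]:
    """Forward augmentation: rotate about z, then scale uniformly."""
    r = rot_z(theta)
    return [tuple(scale * v for v in mat_vec(r, p)) for p in points]


def inverse_aug(points: List[Point], theta: float, scale: float) -> List[Point]:
    """Undo augment() in reverse order: un-scale, then rotate by R^T (= R^-1)."""
    r_inv = transpose(rot_z(theta))
    return [mat_vec(r_inv, tuple(v / scale for v in p)) for p in points]


def lidar_to_camera(p: Point) -> Point:
    """Hypothetical extrinsics: lidar (x fwd, y left, z up) -> camera (x right, y down, z fwd)."""
    x, y, z = p
    return (-y, -z, x)


def project(p: Point, fx=1000.0, fy=1000.0, cx=960.0, cy=640.0) -> Tuple[float, float]:
    """Pinhole projection of a lidar point with hypothetical intrinsics."""
    x, y, z = lidar_to_camera(p)
    return (fx * x / z + cx, fy * y / z + cy)


raw = [(10.0, 2.0, 1.0)]      # one lidar point before augmentation
theta, scale = 0.3, 1.05      # augmentation parameters, recorded per sample
aug = augment(raw, theta, scale)

recovered = inverse_aug(aug, theta, scale)
naive = project(aug[0])       # misaligned: projects augmented coordinates
aligned = project(recovered[0])
print(project(raw[0]))        # (760.0, 540.0)
print(aligned)                # approximately (760.0, 540.0)
print(naive)                  # a different pixel
```

Projecting the augmented point directly lands on the wrong pixel, whereas undoing the augmentation first recovers the pixel of the raw point. Because rotation matrices are orthogonal, the inverse rotation is simply the transpose, and the inverse steps must be applied in the reverse order of the forward ones.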

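LearnableAlign is likewise described only in outline. As a rough single-head illustration, assume that one lidar voxel feature acts as the query and that the camera features of the pixels its points project to act as the keys and values; cross-attention can then be written in plain Python. The layer sizes, the random (untrained) weights, the class name CrossAttentionAlign, and the final concatenation with the lidar feature are assumptions of this sketch, not details stated in the abstract.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(w: Matrix, x: Vector) -> Vector:
    """y = W x for a weight matrix stored as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random, untrained weights (a real model would learn these)."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]


class CrossAttentionAlign:
    """Hypothetical single-head cross-attention: a lidar voxel feature queries
    the camera features of the pixels its points project to."""

    def __init__(self, d_lidar: int, d_cam: int, d_model: int, seed: int = 0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.w_q = init(d_model, d_lidar, rng)
        self.w_k = init(d_model, d_cam, rng)
        self.w_v = init(d_model, d_cam, rng)
        self.w_o = init(d_model, d_model, rng)

    def __call__(self, lidar_feat: Vector, cam_feats: List[Vector]) -> Tuple[Vector, Vector]:
        q = linear(self.w_q, lidar_feat)
        keys = [linear(self.w_k, c) for c in cam_feats]
        values = [linear(self.w_v, c) for c in cam_feats]
        # Scaled dot-product affinity between the lidar query and each camera key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(self.d_model) for k in keys]
        weights = softmax(scores)
        # Attention-weighted sum of camera values, followed by an output projection.
        pooled = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(self.d_model)]
        cam_out = linear(self.w_o, pooled)
        # Concatenate with the original lidar feature (assumed fusion rule).
        return lidar_feat + cam_out, weights


align = CrossAttentionAlign(d_lidar=4, d_cam=3, d_model=8)
voxel = [0.5, -1.0, 0.2, 0.3]
pixels = [[0.1, 0.9, -0.4], [1.2, 0.0, 0.3], [-0.5, 0.2, 0.8]]
fused, weights = align(voxel, pixels)
print(len(fused))  # 12 = 4 lidar dims + 8 camera dims

# Identical camera features yield identical scores, hence uniform weights.
_, uniform = align(voxel, [[0.3, 0.3, 0.3]] * 3)
```

In a trained model the projection matrices are learned, so the softmax weights let each voxel emphasize the pixels most correlated with it instead of averaging all of them equally; when every camera feature is identical, as in the last call, the weights reduce to a plain average.
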
Comments:	CVPR 2022. Ranked 1st among 3D detection methods on the Waymo Challenge Leaderboard: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2203.08195 [cs.CV]
	(or arXiv:2203.08195v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2203.08195

Submission history

From: Yingwei Li
[v1] Tue, 15 Mar 2022 18:46:06 UTC (1,679 KB)