SeaBird: Segmentation in Bird's View with Dice Loss Improves Monocular 3D Detection of Large Objects

KITTI-360 Demo | nuScenes Demo | Project | Talk | Slides | Poster

Abhinav Kumar¹ · Yuliang Guo² · Xinyu Huang² · Liu Ren² · Xiaoming Liu¹
¹Michigan State University, ²Bosch Research North America, Bosch Center for AI

in CVPR 2024

Monocular 3D detectors achieve remarkable performance on cars and smaller objects. However, their performance drops on larger objects, leading to fatal accidents. Some attribute the failures to training data scarcity or the receptive field requirements of large objects. In this paper, we highlight this understudied problem of generalization to large objects. We find that modern frontal detectors struggle to generalize to large objects even on nearly balanced datasets. We argue that the cause of failure is the sensitivity of depth regression losses to noise of larger objects. To bridge this gap, we comprehensively investigate regression and dice losses, examining their robustness under varying error levels and object sizes. We mathematically prove that the dice loss leads to superior noise-robustness and model convergence for large objects compared to regression losses for a simplified case. Leveraging our theoretical insights, we propose SeaBird (Segmentation in Bird's View) as the first step towards generalizing to large objects. SeaBird effectively integrates BEV segmentation on foreground objects for 3D detection, with the segmentation head trained with the dice loss. SeaBird achieves SoTA results on the KITTI-360 leaderboard and improves existing detectors on the nuScenes leaderboard, particularly for large objects.
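To make the size dependence of the dice loss concrete, here is a minimal, dependency-free sketch. It is not the code from this repo: it assumes a 1D occupancy grid with unit cells, and the helpers `segment_mask` and `shifted_dice` are hypothetical names introduced only for illustration. It places a ground-truth segment of a given length on the grid, shifts the prediction by a fixed number of cells, and evaluates a soft dice loss.

```python
def dice_loss(pred, target, eps=1e-6):
    """Soft dice loss between two equal-length sequences with values in [0, 1]."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def segment_mask(center, length, grid_size):
    """Binary occupancy of a 1D object on a grid of unit cells (illustrative helper)."""
    start = center - length // 2
    return [1.0 if start <= i < start + length else 0.0 for i in range(grid_size)]


def shifted_dice(length, shift, grid_size=200):
    """Dice loss when the predicted segment is displaced by `shift` cells."""
    center = grid_size // 2
    target = segment_mask(center, length, grid_size)
    pred = segment_mask(center + shift, length, grid_size)
    return dice_loss(pred, target)


if __name__ == "__main__":
    # Same 2-cell localization error, three object lengths (in cells).
    for length in (4, 12, 40):
        print(f"length={length:2d}  dice loss={shifted_dice(length, shift=2):.3f}")
```

In this toy setting the loss equals the shift divided by the object length (0.5, about 0.167, and 0.05 for the three lengths above), so a fixed localization error is penalized less on longer objects. An L1 penalty on the center offset would be identical for all three. The paper's formal analysis covers a more careful noise model than this sketch.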

Citation

If you find our work useful in your research, please consider starring the repo and citing:

@inproceedings{kumar2024seabird,
   title={{SeaBird: Segmentation in Bird's View with Dice Loss Improves Monocular $3$D Detection of Large Objects}},
   author={Kumar, Abhinav and Guo, Yuliang and Huang, Xinyu and Ren, Liu and Liu, Xiaoming},
   booktitle={CVPR},
   year={2024}
}

Single Camera (KITTI-360) Models

See PanopticBEV

Model Zoo

We provide logs, models, and predictions for the main experiments on the KITTI-360 Val and KITTI-360 Test data splits, available to download here.

Data Split	Method	Config (Run)	Weight / Pred	Metric	Lrg (50)	Car (50)	Mean (50)	Lrg (25)	Car (25)	Mean (25)	Lrg Seg	Car Seg	Mean Seg
KITTI-360 Val	Stage 1	seabird_val_stage1	gdrive	IoU	-	-	-	-	-	-	23.83	48.54	36.18
KITTI-360 Val	PBEV+SeaBird	seabird_val	gdrive	AP	13.22	42.46	27.84	37.15	52.53	44.84	24.30	48.04	36.17
KITTI-360 Test	PBEV+SeaBird	seabird_test	gdrive	AP	-	-	4.64	-	-	37.12	-	-	-
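As a quick sanity check on how to read the table, the sketch below recomputes the "Mean" columns of the PBEV+SeaBird KITTI-360 Val row. Reading "Mean" as the unweighted average of the "Lrg" and "Car" columns is an inference from the reported numbers, not something stated in this README, and the dictionary layout is purely illustrative.

```python
# Values copied from the PBEV+SeaBird KITTI-360 Val row of the table above.
# Keys mirror the column suffixes: "(50)", "(25)", and "Seg".
row = {
    "(50)": {"Lrg": 13.22, "Car": 42.46, "Mean": 27.84},
    "(25)": {"Lrg": 37.15, "Car": 52.53, "Mean": 44.84},
    "Seg": {"Lrg": 24.30, "Car": 48.04, "Mean": 36.17},
}


def category_mean(entry):
    """Unweighted average over the two categories (assumed definition of 'Mean')."""
    return (entry["Lrg"] + entry["Car"]) / 2.0


for name, entry in row.items():
    computed = category_mean(entry)
    print(f"{name}: computed {computed:.2f}, reported {entry['Mean']:.2f}")
```

For this row, each computed average agrees with the reported "Mean" value to two decimals.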

Multi-Camera (nuScenes) Models

See HoP

Model Zoo

nuScenes Val Results

Model	Resolution	Backbone	Pretrain	APLrg	mAP	NDS	Ckpt/Log/Pred
HoP_BEVDet4D_256	256x704	ResNet50	ImageNet-1K	0.274	0.399	0.509	ckpt / log
HoP+SeaBird_256 Stage1	256x704	ResNet50	ImageNet-1K	-	-	-	gdrive
HoP+SeaBird_256	256x704	ResNet50	ImageNet-1K	0.282	0.411	0.515	gdrive
HoP+SeaBird_512 Stage1	512x1408	ResNet101	ImageNet-1K	-	-	-	gdrive
HoP+SeaBird_512	512x1408	ResNet101	ImageNet-1K	0.329	0.462	0.547	gdrive
HoP+SeaBird_640 Stage1	640x1600	V2-99	DDAD15M	-	-	-	gdrive
HoP+SeaBird_640	640x1600	V2-99	DDAD15M	0.403	0.527	0.602	gdrive
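To read off what SeaBird adds at the 256x704 setting, the following sketch subtracts the HoP_BEVDet4D_256 row from the HoP+SeaBird_256 row. The numbers are copied from the table above. The `deltas` helper is a hypothetical name used only here.

```python
# nuScenes Val numbers copied from the table above (256x704, ResNet50).
baseline = {"APLrg": 0.274, "mAP": 0.399, "NDS": 0.509}  # HoP_BEVDet4D_256
seabird = {"APLrg": 0.282, "mAP": 0.411, "NDS": 0.515}   # HoP+SeaBird_256


def deltas(new, old):
    """Per-metric absolute difference, rounded to the table's three decimals."""
    return {metric: round(new[metric] - old[metric], 3) for metric in old}


gains = deltas(seabird, baseline)
for metric, gain in gains.items():
    print(f"{metric}: {gain:+.3f}")
```

This gives absolute gains of +0.008 APLrg, +0.012 mAP, and +0.006 NDS over the baseline row. The table lists no baseline rows for the 512 and 640 settings, so the same comparison cannot be made from this README alone.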

nuScenes Test Results

Model	Resolution	Backbone	Pretrain	APLrg	mAP	NDS	Ckpt/Log/Pred
HoP+SeaBird_512 Test	512x1408	ResNet101	ImageNet-1K	0.366	0.486	0.570	gdrive
HoP+SeaBird_640 Test	640x1600	V2-99	DDAD15M	0.384	0.511	0.597	gdrive

Acknowledgements

We thank the authors of the following awesome codebases:

Please also consider citing them.

Contributions

We welcome contributions to the SeaBird repo. Feel free to raise a pull request.

License

The SeaBird code is released under the MIT license.

Contact

For questions, feel free to post here or send an email to abhinav3663@gmail.com.

