S2R-ViT for Multi-Agent Cooperative Perception: Bridging the Gap from Simulation to Reality

Li, Jinlong; Xu, Runsheng; Liu, Xinyu; Li, Baolu; Zou, Qin; Ma, Jiaqi; Yu, Hongkai

arXiv:2307.07935 (cs)

[Submitted on 16 Jul 2023 (v1), last revised 20 Feb 2024 (this version, v4)]

Abstract: Because real multi-agent data is scarce and labeling is time-consuming, existing multi-agent cooperative perception algorithms are usually trained and validated on simulated sensor data. However, perception performance degrades when these simulation-trained models are deployed in the real world, owing to the significant domain gap between simulated and real data. In this paper, we propose the first Simulation-to-Reality transfer learning framework for multi-agent cooperative perception using a novel Vision Transformer, named S2R-ViT, which considers both the Deployment Gap and the Feature Gap between simulated and real data. We investigate the effects of these two types of domain gap and propose a novel uncertainty-aware vision transformer to effectively relieve the Deployment Gap, and an agent-based feature adaptation module with inter-agent and ego-agent discriminators to reduce the Feature Gap. Our extensive experiments on the public multi-agent cooperative perception datasets OPV2V and V2V4Real demonstrate that the proposed S2R-ViT can effectively bridge the gap from simulation to reality and significantly outperforms other methods for point cloud-based 3D object detection.

Comments:	Latest version; accepted by the 2024 IEEE International Conference on Robotics and Automation (ICRA)
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2307.07935 [cs.CV]
	(or arXiv:2307.07935v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2307.07935

Submission history

From: Jinlong Li
[v1] Sun, 16 Jul 2023 03:54:10 UTC (29,030 KB)
[v2] Tue, 18 Jul 2023 22:33:55 UTC (29,343 KB)
[v3] Tue, 26 Sep 2023 18:01:44 UTC (35,476 KB)
[v4] Tue, 20 Feb 2024 20:50:55 UTC (34,239 KB)
