
RoScenes: A Large-Scale Multi-view 3D Dataset for Roadside Perception

Published: 17 November 2024

Abstract

We introduce RoScenes, the largest multi-view roadside perception dataset, which aims to shed light on the development of vision-centric Bird’s Eye View (BEV) approaches for more challenging traffic scenes. The highlights of RoScenes include a significantly large perception area, full scene coverage, and crowded traffic. More specifically, our dataset provides a remarkable 21.13M 3D annotations within 64,000 m². To relieve the expensive cost of roadside 3D labeling, we present a novel BEV-to-3D joint annotation pipeline to efficiently collect such a large volume of data. After that, we organize a comprehensive study of current BEV methods on RoScenes in terms of effectiveness and efficiency. The tested methods suffer from the vast perception area and the variation of sensor layouts across scenes, resulting in performance below expectations. To this end, we propose RoBEV, which incorporates a feature-guided position embedding for effective 2D-3D feature assignment. With its help, our method outperforms the state-of-the-art by a large margin on the validation set without extra computational overhead. Our dataset and devkit are available at https://roscenes.github.io.
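The abstract does not specify how the feature-guided position embedding performs 2D-3D feature assignment. As a rough illustration only, the sketch below shows one common way image features can guide a 3D positional encoding in PETR-style detectors: a depth distribution predicted from the 2D features weights candidate frustum points before they are encoded and added to the feature map. All module names, tensor shapes, and the depth-weighting scheme here are assumptions for illustration, not the authors' implementation of RoBEV.

```python
# Minimal, illustrative sketch of a feature-guided position embedding.
# Hypothetical design: not the RoBEV implementation from the paper.
import torch
import torch.nn as nn

class FeatureGuidedPE(nn.Module):
    """Weights per-pixel 3D positional encodings by a depth distribution
    predicted from the image features, so the embedding is guided by
    image content rather than by fixed camera geometry alone."""

    def __init__(self, channels: int, num_depth_bins: int):
        super().__init__()
        # Predict a depth distribution at every pixel from the 2D features.
        self.depth_head = nn.Conv2d(channels, num_depth_bins, kernel_size=1)
        # Encode candidate 3D points (x, y, z per depth bin) into embeddings.
        self.pos_encoder = nn.Sequential(
            nn.Conv2d(3 * num_depth_bins, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, feats: torch.Tensor, frustum_points: torch.Tensor):
        # feats: (B, C, H, W) image features from the backbone.
        # frustum_points: (B, D, 3, H, W) 3D coordinates of D depth
        # candidates per pixel, from camera intrinsics/extrinsics.
        depth_prob = self.depth_head(feats).softmax(dim=1)       # (B, D, H, W)
        # Weight each candidate point by its predicted depth probability.
        guided = frustum_points * depth_prob.unsqueeze(2)        # (B, D, 3, H, W)
        pos_embed = self.pos_encoder(guided.flatten(1, 2))       # (B, C, H, W)
        # Add to the 2D features before cross-attention with 3D queries.
        return feats + pos_embed

# Example: pe = FeatureGuidedPE(channels=256, num_depth_bins=64)
```

In this reading, the position embedding is "feature-guided" because the depth weighting is inferred from image content, which is one plausible way to cope with the vast perception area and varying sensor layouts the abstract describes.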



Published In

Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part XLI
Sep 2024
585 pages
ISBN: 978-3-031-72939-3
DOI: 10.1007/978-3-031-72940-9
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol

Publisher

Springer-Verlag

Berlin, Heidelberg


Author Tags

  1. BEV perception
  2. 3D detection
  3. Autonomous driving

Qualifiers

  • Article

