DOI: 10.1145/3603781.3603857

STFormer3D: Spatio-Temporal Transformer Based 3D Object Detection for Intelligent Driving

Published: 27 July 2023

Abstract

This paper proposes a novel solution to the problem of efficiently detecting 3D objects in point clouds. By combining Convolutional Neural Networks (CNNs) with Transformer networks, our method exploits the strengths of both: CNNs for local feature extraction and Transformers for capturing long-range contextual information. To improve detection performance under occlusion, we propose a temporal fusion module that fuses the features of the current frame with those of the previous frame. In addition, we use a BiFPN to aggregate features across different scales.
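As a rough illustration of the temporal fusion idea, the following minimal PyTorch sketch fuses bird's-eye-view feature maps of the current and previous frames by channel concatenation followed by a convolution. The module name, channel sizes, and the concat-then-conv design are assumptions made for illustration only, not the paper's exact implementation.

import torch
import torch.nn as nn

class TemporalFusion(nn.Module):
    """Illustrative sketch: fuse current-frame BEV features with previous-frame features."""

    def __init__(self, channels: int = 256):
        super().__init__()
        # Concatenate along the channel axis, then project back to `channels`.
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, curr: torch.Tensor, prev: torch.Tensor) -> torch.Tensor:
        # curr, prev: (B, C, H, W) BEV feature maps; prev is assumed to be
        # already warped into the current frame's coordinate system via ego motion.
        return self.fuse(torch.cat([curr, prev], dim=1))

if __name__ == "__main__":
    fusion = TemporalFusion(channels=256)
    curr = torch.randn(1, 256, 128, 128)
    prev = torch.randn(1, 256, 128, 128)
    print(fusion(curr, prev).shape)  # torch.Size([1, 256, 128, 128])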
Finally, we conducted experiments on the nuScenes dataset; compared with the baseline, our method improves NDS by 2.54% and mAP by 2.44%.
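For reference, NDS here is the nuScenes Detection Score, the benchmark's standard composite metric, which combines mAP with the five mean true-positive error metrics (translation, scale, orientation, velocity, and attribute errors):

\mathrm{NDS} = \frac{1}{10}\left[\,5\,\mathrm{mAP} + \sum_{\mathrm{mTP}\in\mathbb{TP}} \bigl(1 - \min(1, \mathrm{mTP})\bigr)\right]

where \mathbb{TP} denotes the set of those five error metrics.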



    Published In

    CNIOT '23: Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things
    May 2023
    1025 pages
    ISBN: 9798400700705
    DOI: 10.1145/3603781

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. 3D Object detection
    2. LiDAR
    3. Multi-frame fusion
    4. Transformer
    5. autonomous driving
    6. point cloud

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    CNIOT'23

    Acceptance Rates

    Overall Acceptance Rate 39 of 82 submissions, 48%
