DOI: 10.1145/3603781.3603857

STFormer3D: Spatio-Temporal Transformer Based 3D Object Detection for Intelligent Driving

Published: 27 July 2023

Abstract

This paper proposes a novel solution to the problem of efficiently detecting 3D objects in point clouds. By combining Convolutional Neural Networks (CNNs) with Transformer networks, our method exploits the strengths of both: CNNs for local feature extraction and Transformers for capturing long-range contextual information. To improve detection performance under occlusion, we propose a temporal fusion module that fuses the features of the current frame with those of the previous frame. In addition, we use a BiFPN to aggregate features across different scales.
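As a rough illustration of the temporal fusion idea, the following minimal PyTorch sketch fuses bird's-eye-view feature maps of the current and previous frames by channel concatenation followed by a convolution. The module name, channel sizes, and the concat-then-conv design are assumptions made for illustration only, not the paper's exact implementation.

import torch
import torch.nn as nn

class TemporalFusion(nn.Module):
    """Illustrative sketch: fuse current-frame BEV features with previous-frame features."""

    def __init__(self, channels: int = 256):
        super().__init__()
        # Concatenate along the channel axis, then project back to `channels`.
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, curr: torch.Tensor, prev: torch.Tensor) -> torch.Tensor:
        # curr, prev: (B, C, H, W) BEV feature maps; prev is assumed to be
        # already warped into the current frame's coordinate system via ego motion.
        return self.fuse(torch.cat([curr, prev], dim=1))

if __name__ == "__main__":
    fusion = TemporalFusion(channels=256)
    curr = torch.randn(1, 256, 128, 128)
    prev = torch.randn(1, 256, 128, 128)
    print(fusion(curr, prev).shape)  # torch.Size([1, 256, 128, 128])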
Finally, we conducted experiments on the nuScenes dataset; compared with the baseline, our method improves NDS by 2.54% and mAP by 2.44%.
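For reference, NDS here is the nuScenes Detection Score, the benchmark's standard composite metric, which combines mAP with the five mean true-positive error metrics (translation, scale, orientation, velocity, and attribute errors):

\mathrm{NDS} = \frac{1}{10}\left[\,5\,\mathrm{mAP} + \sum_{\mathrm{mTP}\in\mathbb{TP}} \bigl(1 - \min(1, \mathrm{mTP})\bigr)\right]

where \mathbb{TP} denotes the set of those five error metrics.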



    Published In

    CNIOT '23: Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things
    May 2023
    1025 pages
    ISBN: 9798400700705
    DOI: 10.1145/3603781

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. 3D Object detection
    2. LiDAR
    3. Multi-frame fusion
    4. Transformer
    5. autonomous driving
    6. point cloud

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    CNIOT'23

    Acceptance Rates

    Overall Acceptance Rate 39 of 82 submissions, 48%
