PV-RCNN++: semantical point-voxel feature interaction for 3D object detection

Original article, published in The Visual Computer.

Abstract

A large imbalance often exists between the foreground points (i.e., objects) and the background points in outdoor LiDAR point clouds. This imbalance hinders cutting-edge detectors from focusing on informative areas, degrading 3D object detection accuracy. This paper proposes a novel object detection network based on semantical point-voxel feature interaction, dubbed PV-RCNN++. Unlike most existing methods, PV-RCNN++ exploits semantic information to enhance the quality of object detection. First, a semantic segmentation module is proposed to retain more discriminative foreground keypoints; this module guides PV-RCNN++ to integrate more object-related point-wise and voxel-wise features in the pivotal areas. Then, to make points and voxels interact efficiently, a voxel query based on the Manhattan distance quickly samples voxel-wise features around keypoints, reducing the time complexity from O(N) to O(K) compared with the ball query. Further, to avoid learning only local features, an attention-based residual PointNet module is designed to expand the receptive field and adaptively aggregate neighboring voxel-wise features into keypoints. Extensive experiments on the KITTI dataset show that PV-RCNN++ achieves 81.60%, 40.18%, and 68.21% 3D mAP on Car, Pedestrian, and Cyclist, respectively, achieving comparable or even better performance than the state of the art.
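The O(N)-to-O(K) claim for the voxel query can be illustrated with a minimal sketch: because voxels live on a regular grid, the occupied cells can be hashed once, and only the fixed-size set of cells within a Manhattan-distance ball of the keypoint's cell needs to be probed, independent of the total voxel count N. This is not the authors' implementation; the function name, arguments, and grid layout below are illustrative assumptions.

```python
import math

def voxel_query(keypoint, voxel_coords, voxel_size=1.0, max_range=2):
    """Gather the indices of non-empty voxels near a keypoint.

    Instead of scanning all N voxels (as a ball query would), only the
    fixed set of grid cells within Manhattan distance `max_range` of the
    keypoint's own cell is probed via a hash table, so the cost is O(K)
    with K independent of N.
    """
    # Hash the occupied voxel coordinates for O(1) membership tests.
    table = {tuple(c): i for i, c in enumerate(voxel_coords)}
    cx, cy, cz = (math.floor(k / voxel_size) for k in keypoint)
    neighbors = []
    for dx in range(-max_range, max_range + 1):
        for dy in range(-max_range, max_range + 1):
            for dz in range(-max_range, max_range + 1):
                if abs(dx) + abs(dy) + abs(dz) > max_range:
                    continue  # outside the Manhattan (L1) ball
                cell = (cx + dx, cy + dy, cz + dz)
                if cell in table:
                    neighbors.append(table[cell])
    return neighbors

# A keypoint at (2.2, 3.1, 0.4) falls into cell (2, 3, 0); only the
# occupied cells within Manhattan distance 2 of that cell are returned.
voxels = [(2, 3, 0), (3, 3, 0), (2, 3, 2), (9, 9, 9)]
print(sorted(voxel_query((2.2, 3.1, 0.4), voxels)))  # [0, 1, 2]
```

The hash-table lookup is what makes the Manhattan formulation cheap: an L2 ball query must compare the keypoint against every voxel (or maintain a more complex spatial index), whereas the L1 ball on the grid enumerates a constant number of candidate cells.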


[Figures 1–9 are available in the full-text article.]


Data Availability Statement

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work was supported by the 14th Five-Year Planning Equipment Pre-Research Program (No. JZX7Y20220301001801), by the National Natural Science Foundation of China (No. 62172218), by the Free Exploration of Basic Research Project, Local Science and Technology Development Fund Guided by the Central Government of China (No. 2021Szvup060), and by the General Program of Natural Science Foundation of Guangdong Province (No. 2022A1515010170).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xuefeng Yan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wu, P., Gu, L., Yan, X. et al. PV-RCNN++: semantical point-voxel feature interaction for 3D object detection. Vis Comput 39, 2425–2440 (2023). https://doi.org/10.1007/s00371-022-02672-2

