research-article

Sniffer Faster R-CNN ++: An Efficient Camera-LiDAR Object Detector with Proposal Refinement on Fused Candidates

Authors:

Dominic Carrillo,

Song Fu

Journal on Autonomous Transportation Systems, Volume 1, Issue 2

Article No.: 6, Pages 1 - 18

https://doi.org/10.1145/3631138

Published: 08 April 2024

Abstract

In this article, we present Sniffer Faster R-CNN++, an efficient camera-LiDAR late-fusion network for low-complexity, accurate object detection in autonomous driving scenarios. The proposed architecture operates on the output candidates of any three-dimensional (3D) detector and on proposals from the region proposal network of any 2D detector to generate the final predictions. Compared with single-modality object detection approaches, fusion-based methods often struggle to integrate dissimilar data. Fusion-based network models are complicated in nature, and they also require large computational overhead, resources, and processing pipelines for training and inference, especially the early-fusion and deep-fusion approaches. We therefore devise a late-fusion network that incorporates pre-trained, single-modality detectors without change and performs association only at the detection level. In addition, LiDAR-based methods fail to detect distant objects because LiDAR point clouds are sparse, so we devise a proposal refinement algorithm that jointly optimizes detection candidates and assists detection of distant objects. Extensive experiments on both the 3D and 2D detection benchmarks of the challenging KITTI dataset illustrate that the proposed network architecture significantly improves detection accuracy while accelerating detection speed.
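To make "association only at the detection level" concrete, the following is a minimal sketch, not the authors' method. It assumes the 3D candidates have already been projected into the image plane as axis-aligned boxes, and it pairs each one with its best-overlapping 2D proposal by intersection over union (IoU). The abstract does not state the association criterion, so the IoU measure, the 0.5 threshold, and the names `iou` and `associate` are all illustrative assumptions.

```python
from typing import List, Tuple

# Axis-aligned image-plane box: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(
    projected_3d: List[Box],
    proposals_2d: List[Box],
    min_iou: float = 0.5,  # assumed threshold, not taken from the article
) -> List[Tuple[int, int, float]]:
    """Pair each projected 3D candidate with its best-overlapping 2D proposal.

    Returns (candidate index, proposal index, IoU) for every candidate whose
    best overlap reaches min_iou; unmatched candidates are omitted.
    """
    pairs = []
    for i, cand in enumerate(projected_3d):
        best_j, best_iou = -1, 0.0
        for j, prop in enumerate(proposals_2d):
            overlap = iou(cand, prop)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= min_iou:
            pairs.append((i, best_j, best_iou))
    return pairs


# Hypothetical inputs: two projected 3D candidates, three 2D proposals.
candidates = [(0.0, 0.0, 10.0, 10.0), (100.0, 100.0, 120.0, 120.0)]
proposals = [(0.0, 0.0, 10.0, 5.0), (1.0, 1.0, 11.0, 11.0), (200.0, 200.0, 210.0, 210.0)]
matches = associate(candidates, proposals)
```

Because both detectors stay frozen in a scheme like this, the only fusion-specific work is the pairwise comparison of box lists, which is one reason late fusion tends to be cheaper than early or deep fusion.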


Cited By

D. Qu, Q. Chen, T. Bai, H. Lu, H. Fan, H. Zhang, S. Fu, and Q. Yang. 2024. SiCP: Simultaneous Individual and Cooperative Perception for 3D Object Detection in Connected and Automated Vehicles. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 8905–8912. Online publication date: 14-Oct-2024.
https://doi.org/10.1109/IROS58592.2024.10801398

Index Terms

Sniffer Faster R-CNN ++: An Efficient Camera-LiDAR Object Detector with Proposal Refinement on Fused Candidates
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object detection


Published In

ACM Journal on Autonomous Transportation Systems Volume 1, Issue 2

June 2024

127 pages

EISSN:2833-0528

DOI:10.1145/3613595

Editors:
Vaneet Aggarwal
Purdue University, United States
,
Satish V. Ukkusuri
Purdue University, United States

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 April 2024

Online AM: 28 October 2023

Accepted: 25 October 2023

Revised: 01 September 2023

Received: 03 May 2023

Published in JATS Volume 1, Issue 2

Funding Sources

National Science Foundation
