Abstract
In a large number of real-life scenes and practical applications, 3D object detection is playing an increasingly important role. We need to estimate the position and direction of the 3D object in the real scene to complete the 3D object detection task. In this paper, we propose a new network architecture based on VoteNet to detect 3D point cloud targets. On the one hand, we use channel and spatial dual-domain attention module to enhance the features of the object to be detected while suppressing other useless features. On the other hand, the SIFT operator has scale invariance and the ability to resist occlusion and background interference. The PointSIFT module we use can capture information in different directions of point cloud in space, and is robust to shapes of different proportions, so as to better detect objects that are partially occluded. Our method is evaluated on the SUN-RGBD and ScanNet datasets of indoor scenes. The experimental results show that our method has better performance than VoteNet.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Chen, X., Ma, H., Wan, J., Li, B., Xia, T.: Multi-view 3d object detection network for autonomous driving. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Scannet, A.D.: Richly-annotated 3d reconstructions of indoor scenes. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Dyer, C., Kuncoro, A., Ballesteros, M., Smith, N.A.: Recurrent neural network grammars. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2016)
Girshick, R.: Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015)
Hou, J., Dai, A., Niebner, M.: 3d-sis: 3d semantic instance segmentation of rgb-d scans. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Jie, H., Shen, L., Albanie, S., Sun, G., Enhua, W.: Squeeze-and-excitation networks. IEEE Trans. Pattern Anal. Mach. Intell. 42(2), 2011–2023 (2020)
Jaderberg, M., Simonyan, K., Zisserman, A., et al.: Spatial transformer networks. Adv. Neural Inf. Process. Syst. (NIPS) 28, 2017–2025 (2015)
Jiang, M., Wu, Y., Zhao, T., Zhao, Z., Pointsift, C.L.: A sift-like network module for 3d point cloud semantic segmentation (2018)
Lahoud, J., Ghanem, B.: 2d-driven 3d object detection in rgb-d images. In: 2017 IEEE International Conference on Computer Vision (ICCV) (2017)
Leibe, B., Leonardis, A., Schiele, B.: Robust object detection with interleaved categorization and segmentation. Int. J. Comput. Vis. 77(1–3), 259–289 (2008)
Pauline, C., Steven, N., Sift, H.: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 31(13), 3812–3814 (2003)
Qi, C.R., Litany, O., He, K., Guibas, L.J.: Deep hough voting for 3d object detection in point clouds. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV) (2019)
Qi, C.R., Liu, W., Wu, C., Su, H., Guibas, L.J.: Frustum pointnets for 3D object detection from RGB-D data. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Qi, C.R., Su, H., Mo, K., Pointnet, L.J.G.: Deep learning on point sets for 3d classification and segmentation. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Qi, C.R., Yi, L., Su, H., Guibas, L.J.: Pointnet++: deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 30, 5099–5108 (2017)
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Ren, Z., Sudderth, E.B.: Three-dimensional object detection and layout prediction using clouds of oriented gradients. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Roy, A.G., Navab, N., Wachinger, C.: Concurrent spatial and channel ‘squeeze & excitation’ in fully convolutional networks. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 421–429. Springer (2018)
Shi, S., Wang, X., Pointrcnn, H.L.: 3d object proposal generation and detection from point cloud. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Simon, M., Milz, S., Amende, K., Gross, H.-M.: Complex-yolo: real-time 3d object detection on point clouds. arXiv:1803.06199 (2018)
Song, S., Lichtenberg, S.P., Xiao, J.: A RGB-D scene understanding benchmark suite. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
Song, S., Xiao., J.: Deep sliding shapes for a modal 3D object detection in RGB-D images. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Woo, S., Park, J., Lee, J.-Y., Cbam, I.S.K.: Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV) (2018)
Wu, W., Qi, Z., Pointconv, L.F.: Deep convolutional networks on 3d point clouds. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Yang, Z., Sun, Y., Liu, S., Shen, X., Std, J.J.: Sparse-to-dense 3d object detector for point cloud. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)
Yi, L., Zhao, W., Wang, H., Sung, M., Guibas, L.J.: Generative shape proposal network for 3d instance segmentation in point cloud. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Zhang, H., Cao, J., Lu, G., Ouyang, W., Danet, Z.S.: Decompose-and-aggregate network for 3D human shape and pose estimation. In: Proceedings of the 27th ACM International Conference on Multimedia (2019)
Zhou, Y., Voxelnet, O.T.: End-to-end learning for point cloud based 3D object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Zhao, H., Yang, D. & Yu, J. 3D target detection using dual domain attention and SIFT operator in indoor scenes. Vis Comput 38, 3765–3774 (2022). https://doi.org/10.1007/s00371-021-02217-z
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00371-021-02217-z