Abstract
Object detection is one of the most fundamental tasks in image content understanding owing to its wide range of real-world applications. Although numerous algorithms have been proposed, effective and efficient object detection remains challenging, particularly under the constraints of multi-scale objects and weak semantic information. In this paper, we propose a feature information-interaction visual attention model for multi-layer feature fusion and enhancement, which uses channel information to weight self-attentive feature maps, thereby extracting, fusing, and enhancing global semantic features together with the local contextual information of the object. We further propose an adaptively cyclic feature information-interaction model, which adopts branch prediction to decide the number of visual attention iterations, achieving adaptive fusion of global semantic features and local fine-grained information. Extensive experiments on the benchmark datasets PASCAL VOC and MS COCO show that our method achieves significant improvements over the baseline model.
This work is supported in part by the National Natural Science Foundation of China (Grant No. 61802058, 61911530397), and in part by the China Postdoctoral Science Foundation (Grant No. 2019M651650).
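The abstract describes two mechanisms: a channel-weighted self-attention block for feature fusion and enhancement, and an adaptive cycle whose number of attention passes is decided by a prediction branch. The following is a minimal PyTorch-style sketch of how such a pair of modules could be wired together; it assumes a non-local-style spatial self-attention re-weighted by an SE-style channel gate and a small pooled-feature classifier as the branch predictor. All module names, layer sizes, and the branch-prediction heuristic are illustrative assumptions, not the authors' exact ACFIM architecture.

# Sketch only: channel-weighted self-attention plus an adaptive cyclic wrapper.
import torch
import torch.nn as nn


class ChannelWeightedSelfAttention(nn.Module):
    """Spatial self-attention whose output is re-weighted by channel statistics."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        inner = max(channels // reduction, 1)
        # projections for non-local-style spatial self-attention
        self.query = nn.Conv2d(channels, inner, kernel_size=1)
        self.key = nn.Conv2d(channels, inner, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # SE-style branch producing per-channel weights from global context
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, inner, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(inner, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, c')
        k = self.key(x).flatten(2)                      # (b, c', hw)
        v = self.value(x).flatten(2).transpose(1, 2)    # (b, hw, c)
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)
        context = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        # weight the self-attentive map with channel information, then fuse
        return x + self.channel_gate(x) * context


class AdaptiveCyclicAttention(nn.Module):
    """Applies the attention block a data-dependent number of times."""

    def __init__(self, channels: int, max_steps: int = 3):
        super().__init__()
        self.block = ChannelWeightedSelfAttention(channels)
        self.max_steps = max_steps
        # branch predicting how many refinement cycles the feature map needs
        self.predictor = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, max_steps),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        steps = int(self.predictor(x).argmax(dim=1).max().item()) + 1
        for _ in range(steps):
            x = self.block(x)
        return x


if __name__ == "__main__":
    feat = torch.randn(2, 256, 38, 38)   # e.g. an SSD-like multi-scale feature map
    out = AdaptiveCyclicAttention(256)(feat)
    print(out.shape)                     # torch.Size([2, 256, 38, 38])

In this sketch the residual fusion (x + gate * context) keeps the local fine-grained signal while injecting globally attended context, and the predictor bounds the refinement loop, which is one plausible reading of "branch prediction to decide the number of visual attention iterations."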
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Song, C., Cheng, X., Liu, L., Li, D. (2021). ACFIM: Adaptively Cyclic Feature Information-Interaction Model for Object Detection. In: Ma, H., et al. (eds.) Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol. 13019. Springer, Cham. https://doi.org/10.1007/978-3-030-88004-0_31
DOI: https://doi.org/10.1007/978-3-030-88004-0_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88003-3
Online ISBN: 978-3-030-88004-0