Abstract
Object detection is one of the most fundamental tasks in image content understanding owing to its wide range of real-world applications. Although numerous algorithms have been proposed, effective and efficient object detection remains challenging, particularly under the constraints of multi-scale objects and weak semantic information. In this paper, we propose a feature information-interaction visual attention model for multi-layer feature fusion and enhancement, which uses channel information to weight self-attentive feature maps, thereby extracting, fusing, and enhancing global semantic features together with the local contextual information of the object. We further propose an adaptively cyclic feature information-interaction model, which adopts branch prediction to decide the number of visual attention iterations, achieving adaptive fusion of global semantic features and local fine-grained information. Extensive experiments on the benchmark datasets PASCAL VOC and MS COCO show that our method achieves significant improvements over the baseline model.
This work is supported in part by the National Natural Science Foundation of China (Grant No. 61802058, 61911530397), and in part by the China Postdoctoral Science Foundation (Grant No. 2019M651650).
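The abstract describes two mechanisms: a channel-weighted self-attention block for feature fusion and enhancement, and an adaptive cycle whose number of attention passes is decided by a prediction branch. The following is a minimal PyTorch-style sketch of how such a pair of modules could be wired together; it assumes a non-local-style spatial self-attention re-weighted by an SE-style channel gate and a small pooled-feature classifier as the branch predictor. All module names, layer sizes, and the branch-prediction heuristic are illustrative assumptions, not the authors' exact ACFIM architecture.

# Sketch only: channel-weighted self-attention plus an adaptive cyclic wrapper.
import torch
import torch.nn as nn


class ChannelWeightedSelfAttention(nn.Module):
    """Spatial self-attention whose output is re-weighted by channel statistics."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        inner = max(channels // reduction, 1)
        # projections for non-local-style spatial self-attention
        self.query = nn.Conv2d(channels, inner, kernel_size=1)
        self.key = nn.Conv2d(channels, inner, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # SE-style branch producing per-channel weights from global context
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, inner, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(inner, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, c')
        k = self.key(x).flatten(2)                      # (b, c', hw)
        v = self.value(x).flatten(2).transpose(1, 2)    # (b, hw, c)
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)
        context = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        # weight the self-attentive map with channel information, then fuse
        return x + self.channel_gate(x) * context


class AdaptiveCyclicAttention(nn.Module):
    """Applies the attention block a data-dependent number of times."""

    def __init__(self, channels: int, max_steps: int = 3):
        super().__init__()
        self.block = ChannelWeightedSelfAttention(channels)
        self.max_steps = max_steps
        # branch predicting how many refinement cycles the feature map needs
        self.predictor = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, max_steps),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        steps = int(self.predictor(x).argmax(dim=1).max().item()) + 1
        for _ in range(steps):
            x = self.block(x)
        return x


if __name__ == "__main__":
    feat = torch.randn(2, 256, 38, 38)   # e.g. an SSD-like multi-scale feature map
    out = AdaptiveCyclicAttention(256)(feat)
    print(out.shape)                     # torch.Size([2, 256, 38, 38])

In this sketch the residual fusion (x + gate * context) keeps the local fine-grained signal while injecting globally attended context, and the predictor bounds the refinement loop, which is one plausible reading of "branch prediction to decide the number of visual attention iterations."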
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Song, C., Cheng, X., Liu, L., Li, D. (2021). ACFIM: Adaptively Cyclic Feature Information-Interaction Model for Object Detection. In: Ma, H., et al. (eds.) Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol. 13019. Springer, Cham. https://doi.org/10.1007/978-3-030-88004-0_31
DOI: https://doi.org/10.1007/978-3-030-88004-0_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88003-3
Online ISBN: 978-3-030-88004-0