ACFIM: Adaptively Cyclic Feature Information-Interaction Model for Object Detection

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 13019)

Abstract

Object detection is one of the most fundamental tasks in image content understanding because of its wide range of real-world applications. Although numerous algorithms have been proposed, effective and efficient object detection remains challenging, especially in restricted situations involving multi-size objects and weak semantic information. In this paper, we propose a feature information-interaction visual attention model for multi-layer feature fusion and enhancement. It uses channel information to weight self-attentive feature maps, thereby extracting, fusing, and enhancing global semantic features together with the local contextual information of the object. We also propose an adaptively cyclic feature information-interaction model, which uses branch prediction to decide how many times visual attention is applied and so adaptively fuses global semantic features with local fine-grained information. Extensive experiments on the benchmark datasets PASCAL VOC and MS COCO show that our method achieves significant improvements over the baseline model.
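
The abstract gives no implementation details, so the following is only an illustrative sketch of the general idea of weighting a self-attentive feature map by channel information and repeating that step a variable number of times. Everything in it is an assumption made for illustration, not the authors' architecture: the identity projections in `self_attention`, the sigmoid-of-mean `channel_gate`, the residual connection, and the `num_cycles` argument, which merely stands in for whatever the paper's prediction branch outputs. A feature map is represented as a list of N spatial positions, each holding C channel values.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(feat):
    """Dot-product self-attention over spatial positions.

    Assumption: queries, keys, and values are the features themselves
    (no learned projections), which keeps the sketch dependency-free.
    """
    n, c = len(feat), len(feat[0])
    scale = math.sqrt(c)
    out = []
    for i in range(n):
        scores = [
            sum(feat[i][k] * feat[j][k] for k in range(c)) / scale
            for j in range(n)
        ]
        weights = softmax(scores)
        out.append(
            [sum(weights[j] * feat[j][k] for j in range(n)) for k in range(c)]
        )
    return out


def channel_gate(feat):
    """Per-channel weight in (0, 1): global average pool, then sigmoid.

    Assumption: a learned gating network is replaced by a plain sigmoid.
    """
    n, c = len(feat), len(feat[0])
    means = [sum(feat[i][k] for i in range(n)) / n for k in range(c)]
    return [1.0 / (1.0 + math.exp(-m)) for m in means]


def cyclic_interaction(feat, num_cycles):
    """Apply channel-weighted self-attention num_cycles times.

    num_cycles is a hypothetical stand-in for the output of a
    prediction branch; here the caller supplies it directly.
    """
    current = [row[:] for row in feat]
    for _ in range(num_cycles):
        attended = self_attention(current)
        gate = channel_gate(current)
        # Residual fusion (an assumption): keep the local features and
        # add the gated, globally attended features on top.
        current = [
            [current[i][k] + gate[k] * attended[i][k] for k in range(len(gate))]
            for i in range(len(current))
        ]
    return current


if __name__ == "__main__":
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 positions, 2 channels
    for row in cyclic_interaction(features, num_cycles=2):
        print([round(v, 3) for v in row])
```

With `num_cycles=0` the input is returned unchanged, so in this sketch the cycle count directly controls how much global context is mixed into the local features.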

This work is supported in part by the National Natural Science Foundation of China (Grant No. 61802058, 61911530397); and in part by the Project funded by the China Postdoctoral Science Foundation (Grant No. 2019M651650).

Corresponding author

Correspondence to Xu Cheng.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Song, C., Cheng, X., Liu, L., Li, D. (2021). ACFIM: Adaptively Cyclic Feature Information-Interaction Model for Object Detection. In: Ma, H., et al. Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol 13019. Springer, Cham. https://doi.org/10.1007/978-3-030-88004-0_31

  • DOI: https://doi.org/10.1007/978-3-030-88004-0_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88003-3

  • Online ISBN: 978-3-030-88004-0

  • eBook Packages: Computer Science, Computer Science (R0)
