DOI: 10.1145/3609703.3609710
research-article

MSYOLOF: Multi-input-single-output encoder network with tripartite feature enhancement for object detection

Published: 16 August 2023

Abstract

Object detection from a one-level feature is a challenging task because object representations at different scales must be extracted from a single feature map. However, existing detectors that use a one-level feature suffer from inadequate representations of objects at different scales, resulting in low accuracy for multi-scale object detection, especially for smaller objects. To address this problem, a novel object detector named MSYOLOF is proposed to construct an effective single feature map for detecting objects of different scales. The proposed network introduces three modules: Feature Pyramid Pooling (FPP), Feature Perception Enhancement (FPE), and Dual Branch Receptive Field (DBRF). First, the FPP module aggregates contextual information from various regions to improve the network's ability to capture global information, which strengthens the model's understanding of the overall scene. Second, the FPE module uses coordinate attention to construct a residual block that obtains orientation-aware and position-sensitive information, helping the network locate and identify objects of interest accurately. Third, by rethinking the Dilated Encoder of YOLOF, the DBRF module reduces information loss and mitigates the problem of being sensitive only to large objects when dilated convolution uses large dilation rates. Extensive experiments on the COCO benchmark validate the effectiveness of the network, which exhibits superior performance compared to other state-of-the-art networks.
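
The link between large dilation rates and a bias toward large objects can be illustrated with simple receptive-field arithmetic. The sketch below is not taken from the paper: the 3x3 kernel size and the dilation rates 2, 4, 6, 8 are illustrative assumptions, and the helper names are hypothetical. It only applies the standard formula for the span of a dilated kernel, k + (k - 1)(d - 1), and accumulates the receptive field of stride-1 layers applied in sequence.

```python
from typing import Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in feature-map cells, covered by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of stride-1 dilated convolutions applied in sequence."""
    field = 1
    for dilation in dilations:
        # Each stride-1 layer widens the field by (kernel_size - 1) * dilation.
        field += (kernel_size - 1) * dilation
    return field


if __name__ == "__main__":
    # Assumed, illustrative settings: 3x3 kernels with increasing dilation.
    for d in (1, 2, 4, 6, 8):
        print(f"dilation={d}: kernel spans {effective_kernel_size(3, d)} cells")
    print("stacked field:", stacked_receptive_field(3, [2, 4, 6, 8]))
```

Under these assumptions, a 3x3 kernel with dilation 8 spans 17 cells while sampling only 9 of them, so small objects can fall between the sampled positions. This is one plausible reading of the information loss that the DBRF module is said to reduce.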

          Published In

          PRIS '23: Proceedings of the 2023 5th International Conference on Pattern Recognition and Intelligent Systems
          July 2023
          123 pages
          ISBN:9781450399968
          DOI:10.1145/3609703

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Funding Sources

          • the Fundamental Research Funds for the Central Universities
          • the High Resolution Earth Observing System-Water Application Demonstration
          • the Excellent Post-doctoral Program of Jiangsu Province
          • the Project of Water Science and Technology of Jiangsu Province
          • the National Natural Science Foundation of China

          Conference

          PRIS 2023
