MSYOLOF: Multi-input-single-output encoder network with tripartite feature enhancement for object detection

Authors:

Xin Li

PRIS '23: Proceedings of the 2023 5th International Conference on Pattern Recognition and Intelligent Systems

Pages 36 - 44

https://doi.org/10.1145/3609703.3609710

Published: 16 August 2023

Abstract

Object detection with a one-level feature is a challenging task because object representations at different scales must all be extracted from a single feature map. Existing detectors that use a one-level feature represent objects of different scales inadequately, which lowers accuracy in multi-scale object detection, especially for small objects. To address this problem, a novel object detector named MSYOLOF is proposed to construct an effective single feature map for detecting objects of different scales. The network introduces three modules: Feature Pyramid Pooling (FPP), Feature Perception Enhancement (FPE), and Dual Branch Receptive Field (DBRF). First, the FPP module aggregates contextual information from various regions to improve the network's ability to capture global information, which strengthens the model's understanding of the overall scene. Second, the FPE module uses coordinate attention to construct a residual block that obtains orientation-aware and position-sensitive information, helping the network locate and identify objects of interest accurately. Third, by rethinking the Dilated Encoder of YOLOF, the DBRF module reduces information loss and mitigates the tendency of dilated convolution with large dilation rates to be sensitive only to large objects. Extensive experiments on the COCO benchmark validate the effectiveness of the network, which exhibits superior performance compared to other state-of-the-art networks.
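The remark about large dilation rates can be made concrete with a little arithmetic. The sketch below is not taken from the paper: it uses only the standard formula for the span of a dilated kernel, and the dilation rates and the helper names `effective_kernel` and `stacked_receptive_field` are hypothetical, chosen purely for illustration.

```python
from typing import Sequence


def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels along one axis, covered by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of stride-1 dilated convolutions applied in sequence."""
    field = 1
    for d in dilations:
        # Each stride-1 layer widens the field by (kernel_size - 1) * dilation.
        field += (kernel_size - 1) * d
    return field


# Hypothetical dilation rates, assumed only for this illustration.
rates = [2, 4, 6, 8]
spans = [effective_kernel(3, d) for d in rates]
print(spans)                              # [5, 9, 13, 17]
print(stacked_receptive_field(3, rates))  # 41
```

Under these assumed rates, a 3x3 kernel at dilation 8 spans 17 pixels while sampling only 3 positions per axis, so objects smaller than the gap between samples can contribute little to the response. This is one way to read the sensitivity to large objects that the DBRF module is described as mitigating.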

Index Terms

MSYOLOF: Multi-input-single-output encoder network with tripartite feature enhancement for object detection
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object recognition
      2. Computer vision tasks
  2. Machine learning
    1. Machine learning algorithms
    2. Machine learning approaches
      1. Neural networks

Index terms have been assigned to the content through auto-classification.

Published In

PRIS '23: Proceedings of the 2023 5th International Conference on Pattern Recognition and Intelligent Systems

July 2023

123 pages

ISBN:9781450399968

DOI:10.1145/3609703

Editors:
Wenbing Zhao,
Xinguo Yu

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

the Fundamental Research Funds for the Central Universities
the High Resolution Earth Observing System-Water Application Demonstration
the Excellent Post-doctoral Program of Jiangsu Province
the Project of Water Science and Technology of Jiangsu Province
the National Natural Science Foundation of China

Conference

PRIS 2023

PRIS 2023: 2023 5th International Conference on Pattern Recognition and Intelligent Systems

July 28 - 30, 2023

Shenyang, China
