research-article

FIN: Feature Integrated Network for Object Detection

Authors:

Xiaofan Luo,

Fukoeng Wong,

Haifeng Hu

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 16, Issue 2

Article No.: 48, Pages 1 - 18

https://doi.org/10.1145/3381086

Published: 22 May 2020


Abstract

Multi-layer detection is a widely used method in object detection. It extracts multiple feature maps with different resolutions from the backbone network to detect objects of different scales, which copes effectively with object scale variation. Although multi-layer detection uses multiple detection layers to relieve the burden on any single detection layer and can improve detection accuracy to some extent, it has two limitations. First, manually assigning anchor boxes of different sizes to different feature maps depends too heavily on human experience. Second, there is a semantic gap between the detection layers. The same detector must simultaneously process detection layers with inconsistent semantic strength, which makes the detector harder to optimize. In this article, we propose a feature integrated network (FIN) based on single-layer detection to address these problems. Unlike existing methods, we design a series of verification experiments based on the multi-layer detection model, which show that the shallow high-resolution feature map has the potential to detect objects of various scales simultaneously and effectively. Because the semantic information of the shallow feature map is weak, we propose two modules to enhance the representation ability of the single detection layer. First, we propose a detection adaptation network (DANet) to extract powerful feature maps that are useful for object detection tasks. Second, we combine global context information and local detail information with a verified hourglass module (VHM) to generate a single feature map with high resolution and rich semantic information, so that we can assign all anchor boxes to this detection layer.
In our model, all detection operations are concentrated on a high-resolution feature map whose semantic and detail information are enhanced as much as possible. The proposed model can therefore address both the anchor assignment problem and the inconsistent semantic strength between multiple detection layers described above. Extensive experiments on the Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes (PASCAL VOC) and Microsoft Common Objects in Context (MS COCO) datasets show that our model has good detection performance for objects of various sizes. The proposed model achieves 81.9 mAP when the input image size is 300 × 300.
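The difference between the two anchor assignment schemes discussed in the abstract can be illustrated with a minimal sketch. Everything numeric here is an assumption made for illustration and is not taken from the article: the feature map sizes, anchor scales, and aspect ratios are hypothetical values in the style of common single-shot detectors, and `make_anchors` is a hypothetical helper. Only the 300 × 300 input size comes from the abstract.

```python
from itertools import product


def make_anchors(feature_size, image_size, scales, aspect_ratios):
    """Return (cx, cy, w, h) anchor boxes, in pixels, for a square feature map.

    Hypothetical helper for illustration only; not the authors' implementation.
    """
    stride = image_size / feature_size  # pixels covered by one feature cell
    anchors = []
    for row, col in product(range(feature_size), repeat=2):
        cx = (col + 0.5) * stride  # anchor centre = centre of the cell
        cy = (row + 0.5) * stride
        for scale, ratio in product(scales, aspect_ratios):
            w = scale * ratio ** 0.5  # ratio = width / height
            h = scale / ratio ** 0.5
            anchors.append((cx, cy, w, h))
    return anchors


IMAGE_SIZE = 300  # input size quoted in the abstract
ASPECT_RATIOS = [1.0, 2.0, 0.5]  # assumed values

# Multi-layer scheme: each feature map is hand-assigned its own anchor scale.
# Keys are feature map sizes, values are anchor scales in pixels (all assumed).
MULTI_LAYER = {38: [30], 19: [60], 10: [111], 5: [162], 3: [213], 1: [264]}
multi_layer_anchors = {
    size: make_anchors(size, IMAGE_SIZE, scales, ASPECT_RATIOS)
    for size, scales in MULTI_LAYER.items()
}

# Single-layer scheme: every scale is placed on one high-resolution map,
# so no per-layer assignment decision is needed.
ALL_SCALES = [s for scales in MULTI_LAYER.values() for s in scales]
single_layer_anchors = make_anchors(38, IMAGE_SIZE, ALL_SCALES, ASPECT_RATIOS)

if __name__ == "__main__":
    total_multi = sum(len(a) for a in multi_layer_anchors.values())
    print("multi-layer anchors:", total_multi)
    print("single-layer anchors:", len(single_layer_anchors))
```

In this sketch the single-layer scheme removes the hand-tuned mapping from scales to layers, at the cost of many more anchors on the one map. That is why the quality of that single feature map matters so much in the approach the abstract describes.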

