research-article

FIN: Feature Integrated Network for Object Detection

Published: 22 May 2020

Abstract

Multi-layer detection is widely used in object detection. It extracts multiple feature maps at different resolutions from the backbone network to detect objects of different scales, which helps cope with object scale variation. Although using multiple detection layers relieves the burden on any single detection layer and improves detection accuracy to some extent, the method has two limitations. First, manually assigning anchor boxes of different sizes to different feature maps depends heavily on human experience. Second, there is a semantic gap between the detection layers: the same detector must simultaneously process detection layers of inconsistent semantic strength, which makes the detector harder to optimize. In this article, we propose a feature integrated network (FIN) based on single-layer detection to address these problems. Unlike existing methods, we first design a series of verification experiments on a multi-layer detection model, which show that a shallow, high-resolution feature map has the potential to detect objects of various scales simultaneously and effectively. Because the semantic information of a shallow feature map is weak, we propose two modules to enhance the representation ability of the single detection layer. First, a detection adaptation network (DANet) extracts powerful feature maps that are useful for object detection. Second, a verified hourglass module (VHM) combines global context information with local detail information to generate a single feature map with high resolution and rich semantic information, so that all anchor boxes can be assigned to this detection layer.
In our model, all detection operations are concentrated on one high-resolution feature map whose semantic and detail information are enhanced as much as possible. The proposed model can therefore resolve both the anchor assignment problem and the inconsistent semantic strength between multiple detection layers described above. Extensive experiments on the Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes (PASCAL VOC) and Microsoft Common Objects in Context (MS COCO) datasets show that our model performs well on objects of various sizes. The proposed model achieves 81.9 mAP when the input image size is 300 × 300.
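
To make the idea of assigning every anchor box to a single detection layer more concrete, the following is a minimal sketch and not the authors' implementation. Only the 300 × 300 input size comes from the abstract. The function name `single_layer_anchors`, the 38 × 38 feature-map size, the anchor scales, and the aspect ratios are all hypothetical values chosen for illustration.

```python
from itertools import product
from math import sqrt


def single_layer_anchors(feature_size, image_size, scales, aspect_ratios):
    """Place every (scale, aspect ratio) anchor at every cell of ONE feature map.

    Returns a list of (cx, cy, w, h) boxes in input-image pixels.
    """
    stride = image_size / feature_size  # pixels covered by one feature-map cell
    anchors = []
    for row, col in product(range(feature_size), repeat=2):
        # Centre of this cell, mapped back to image coordinates.
        cx = (col + 0.5) * stride
        cy = (row + 0.5) * stride
        for scale, ratio in product(scales, aspect_ratios):
            # ratio = width / height; the box area stays scale ** 2.
            w = scale * sqrt(ratio)
            h = scale / sqrt(ratio)
            anchors.append((cx, cy, w, h))
    return anchors


# Hypothetical configuration: only image_size=300 is taken from the abstract.
anchors = single_layer_anchors(
    feature_size=38,                      # assumed shallow feature-map size
    image_size=300,
    scales=(30, 60, 111, 162, 213, 264),  # assumed anchor sizes in pixels
    aspect_ratios=(1.0, 2.0, 0.5),        # assumed width/height ratios
)
print(len(anchors))  # 38 * 38 * 6 * 3 = 25992
```

In a multi-layer detector, the small scales would typically go to a shallow map and the large scales to deeper, coarser maps. Here the full range of scales shares one grid, which is why the abstract stresses enriching that single map's semantic information.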

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 2
May 2020
390 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/3401894
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 May 2020
Online AM: 07 May 2020
Accepted: 01 January 2020
Revised: 01 December 2019
Received: 01 June 2019
Published in TOMM Volume 16, Issue 2

Author Tags

  1. Object detection
  2. deep learning
  3. feature integration
  4. multi-layer detection

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 10
  • Downloads (last 6 weeks): 1
Reflects downloads up to 09 Nov 2024

Cited By

  • (2023) Soft-masks guided faster region-based convolutional neural network for domain adaptation in wind turbine detection. Frontiers in Energy Research 10. DOI: 10.3389/fenrg.2022.1083005. Online publication date: 30-Jan-2023
  • (2023) Boosting Few-shot Object Detection with Discriminative Representation and Class Margin. ACM Transactions on Multimedia Computing, Communications, and Applications 20:3 (1-19). DOI: 10.1145/3608478. Online publication date: 12-Jul-2023
  • (2023) Complementary Feature Pyramid Network for Object Detection. ACM Transactions on Multimedia Computing, Communications, and Applications 19:6 (1-15). DOI: 10.1145/3584362. Online publication date: 15-Feb-2023
  • (2023) Image Quality Assessment–driven Reinforcement Learning for Mixed Distorted Image Restoration. ACM Transactions on Multimedia Computing, Communications, and Applications 19:1s (1-23). DOI: 10.1145/3532625. Online publication date: 3-Feb-2023
  • (2022) GHOSM: Graph-based Hybrid Outline and Skeleton Modelling for Shape Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 19:2s (1-23). DOI: 10.1145/3554922. Online publication date: 4-Aug-2022
  • (2022) Towards Accurate Oriented Object Detection in Aerial Images with Adaptive Multi-level Feature Fusion. ACM Transactions on Multimedia Computing, Communications, and Applications 19:1 (1-22). DOI: 10.1145/3513133. Online publication date: 18-Feb-2022
  • (2022) Fine-grained Image Classification via Multi-scale Selective Hierarchical Biquadratic Pooling. ACM Transactions on Multimedia Computing, Communications, and Applications 18:1s (1-23). DOI: 10.1145/3492221. Online publication date: 31-Jan-2022
  • (2021) Hypomimia Recognition in Parkinson’s Disease With Semantic Features. ACM Transactions on Multimedia Computing, Communications, and Applications 17:3s (1-20). DOI: 10.1145/3476778. Online publication date: 26-Oct-2021
