Object Detection Combining CNN and Adaptive Color Prior Features
Abstract
1. Introduction
2. Related Work
2.1. Object Detection Algorithm
2.2. Saliency Detection Algorithm
3. Proposed Method
3.1. Overall Structure
3.2. “Off-Line Memory” Stage
3.2.1. Asymmetric Color Pattern Image
3.2.2. Category Pattern Distribution and Scene Pattern Distribution
3.3. “On-Line Mapping” Stage
3.3.1. Dynamic Adaptive Color Prior Model
3.3.2. Feature Map Generation
4. Experiments
4.1. Datasets
4.2. Implementation Details
4.3. Overall Performance Verification of the Algorithm
4.4. Ablation Study
- Color pattern: the YUV asymmetric color pattern is compared against the alternative of an RGB color pattern.
- Forgetting factor: whether to use the forgetting factor.
- Fusion of color features and network features: several feature fusion methods are designed and compared experimentally in order to select the best one.
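The first two ablation choices can be illustrated with a minimal Python sketch. It is not the authors' implementation: the RGB-to-YUV conversion uses the standard BT.601 analog coefficients, and `update_distribution` is a hypothetical exponential-forgetting rule for a stored color distribution, included only to show what switching a forgetting factor on or off could mean. The function names and the update formula are assumptions.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to analog YUV (BT.601)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma
    u = 0.492 * (b - y)                    # blue-difference chroma
    v = 0.877 * (r - y)                    # red-difference chroma
    return y, u, v


def update_distribution(old, new, forgetting_factor):
    """Blend a stored histogram with a newly observed one.

    Assumed rule (not taken from the paper):
        updated = forgetting_factor * old + (1 - forgetting_factor) * new
    so a factor of 1.0 keeps the old memory unchanged and 0.0 replaces it.
    """
    if not 0.0 <= forgetting_factor <= 1.0:
        raise ValueError("forgetting_factor must lie in [0, 1]")
    if len(old) != len(new):
        raise ValueError("histograms must have the same number of bins")
    return [
        forgetting_factor * o + (1.0 - forgetting_factor) * n
        for o, n in zip(old, new)
    ]


# A white pixel has full luma and no chroma.
white_yuv = rgb_to_yuv(1.0, 1.0, 1.0)

# Equal weighting of the stored and the new two-bin histograms.
blended = update_distribution([1.0, 0.0], [0.0, 1.0], forgetting_factor=0.5)
```

Under this assumed rule, `forgetting_factor=1.0` leaves the stored distribution untouched, which would correspond to the ablation setting without forgetting.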
4.5. Comparative Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

Method | Backbone | Datasets | ColorPriors | mAP |
---|---|---|---|---|
Faster R-CNN+FPN | ResNet-101 | VOC2007 | × | 0.756 |
Faster R-CNN+FPN | ResNet-101 | VOC2007 | ✓ | 0.764 |
Faster R-CNN+FPN | ResNet-101 | VOC07+12 | × | 0.803 |
Faster R-CNN+FPN | ResNet-101 | VOC07+12 | ✓ | 0.811 |

Method | Backbone | ColorPriors | Train Speed/Epoch | Test Speed/Epoch |
---|---|---|---|---|
Faster R-CNN+FPN | ResNet-101 | × | 0.5 h | 0.08 h |
Faster R-CNN+FPN | ResNet-101 | ✓ | 0.66 h | 0.16 h |
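
The cost of the color priors can be read off the timing table above as a relative overhead. The short Python sketch below does that arithmetic; the per-epoch times are copied from the table, while the variable and function names are illustrative only.

```python
# Per-epoch times in hours, copied from the timing table above.
baseline = {"train": 0.5, "test": 0.08}
with_color_priors = {"train": 0.66, "test": 0.16}


def relative_overhead(base, new):
    """Return the fractional increase of `new` over `base` (0.25 means 25% more)."""
    return (new - base) / base


overheads = {
    phase: relative_overhead(baseline[phase], with_color_priors[phase])
    for phase in baseline
}

for phase, value in overheads.items():
    print(f"{phase}: +{value:.0%}")
```

By this calculation, training takes about 32% longer per epoch with color priors, and the test pass takes about twice as long.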

Method | Backbone | Datasets | ColorPriors | Color Pattern | mAP |
---|---|---|---|---|---|
Faster R-CNN+FPN | ResNet-101 | VOC2007 | ✓ | RGB | 0.762 |
Faster R-CNN+FPN | ResNet-101 | VOC2007 | ✓ | YUV | 0.764 |
Faster R-CNN+FPN | ResNet-101 | VOC07+12 | ✓ | RGB | 0.808 |
Faster R-CNN+FPN | ResNet-101 | VOC07+12 | ✓ | YUV | 0.811 |

Method | Backbone | Datasets | ColorPriors | Forgetting Factor | mAP |
---|---|---|---|---|---|
Faster R-CNN+FPN | ResNet-101 | VOC2007 | ✓ | × | 0.760 |
Faster R-CNN+FPN | ResNet-101 | VOC2007 | ✓ | ✓ | 0.764 |
Faster R-CNN+FPN | ResNet-101 | VOC07+12 | ✓ | × | 0.807 |
Faster R-CNN+FPN | ResNet-101 | VOC07+12 | ✓ | ✓ | 0.811 |

Method | Backbone | Datasets | ColorPriors | Fusion Strategy | mAP |
---|---|---|---|---|---|
Faster R-CNN+FPN | ResNet-101 | VOC2007 | ✓ | C_P | 0.764 |
Faster R-CNN+FPN | ResNet-101 | VOC2007 | ✓ | PH | 0.737 |
Faster R-CNN+FPN | ResNet-101 | VOC2007 | ✓ | PH_P | 0.759 |
Faster R-CNN+FPN | ResNet-101 | VOC2007 | ✓ | PH+P | 0.759 |
Faster R-CNN+FPN | ResNet-101 | VOC07+12 | ✓ | C_P | 0.811 |
Faster R-CNN+FPN | ResNet-101 | VOC07+12 | ✓ | PH | 0.779 |
Faster R-CNN+FPN | ResNet-101 | VOC07+12 | ✓ | PH_P | 0.805 |
Faster R-CNN+FPN | ResNet-101 | VOC07+12 | ✓ | PH+P | 0.804 |

Method | Backbone | ColorPriors | mAP Using VOC07 | mAP Using VOC07+12 |
---|---|---|---|---|
Cascade R-CNN | ResNet-50 | × | 0.726 | 0.781 |
Cascade R-CNN | ResNet-50 | ✓ | 0.732 | 0.788 |
SSD300 | VGG16 | × | 0.707 | 0.775 |
SSD300 | VGG16 | ✓ | 0.712 | 0.782 |
Libra R-CNN | ResNet-50 | × | 0.743 | 0.808 |
Libra R-CNN | ResNet-50 | ✓ | 0.748 | 0.813 |
RetinaNet | ResNet-50 | × | 0.712 | 0.793 |
RetinaNet | ResNet-50 | ✓ | 0.717 | 0.797 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gu, P.; Lan, X.; Li, S. Object Detection Combining CNN and Adaptive Color Prior Features. Sensors 2021, 21, 2796. https://doi.org/10.3390/s21082796