research-article

A benchmark for multi-class object counting and size estimation using deep convolutional neural networks

Authors:

Fanlin Meng

Volume 116, Issue C

https://doi.org/10.1016/j.engappai.2022.105449

Published: 01 November 2022

Abstract

Automatic object counting and object size estimation in digital images are useful in many real-world applications, such as surveillance, smart farming, and intelligent traffic systems. However, most existing research focuses on scenarios with only one type of object, owing to the lack of suitable datasets. Furthermore, existing size estimation methods rely on traditional detection algorithms that can only segment objects; they cannot identify different object types and return size information for each individual object. To fill these gaps, we create a synthetic dataset and propose a benchmark for multi-class object counting and size estimation (MOCSE) within a unified framework. The dataset, MOCSE13, is generated with Unity and contains synthetic images of 13 different objects (fruits and vegetables). We also propose a deep architecture for multi-class object counting and object size estimation, and evaluate the proposed models with different backbones on the synthetic dataset. The experimental results provide a benchmark for multi-class object counting and size estimation, and the synthetic dataset can serve as a testbed for future studies.
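To make the multi-class counting task concrete, the following minimal sketch scores per-class count predictions with mean absolute error (MAE). The abstract does not state which metrics the benchmark reports, so the choice of MAE, the function name `per_class_count_mae`, and the class names `apple` and `tomato` are all assumptions made for illustration only.

```python
from typing import Dict, List


def per_class_count_mae(
    predicted: List[Dict[str, int]],
    actual: List[Dict[str, int]],
    classes: List[str],
) -> Dict[str, float]:
    """Mean absolute counting error per class, averaged over images.

    Each list element maps a class name to the number of objects of that
    class in one image. MAE is an assumed metric, not one confirmed by
    the abstract.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must cover the same images")
    if not predicted:
        raise ValueError("at least one image is required")

    mae: Dict[str, float] = {}
    for name in classes:
        # A class missing from an image's dictionary counts as zero objects.
        total_error = sum(
            abs(p.get(name, 0) - a.get(name, 0))
            for p, a in zip(predicted, actual)
        )
        mae[name] = total_error / len(predicted)
    return mae


# Hypothetical data: two images and two illustrative class names.
predicted = [{"apple": 3, "tomato": 1}, {"apple": 0, "tomato": 4}]
actual = [{"apple": 2, "tomato": 1}, {"apple": 1, "tomato": 2}]
scores = per_class_count_mae(predicted, actual, ["apple", "tomato"])
```

Reporting the error separately for each class, rather than as one pooled number, shows whether a model counts some object types well while failing on others, which a single-class counting setup cannot reveal.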

Cited By

Wang M., Zhou X., Chen Y. (2024). JMFEEL-Net: a joint multi-scale feature enhancement and lightweight transformer network for crowd counting. Knowledge and Information Systems 66:5 (3033-3053). DOI: 10.1007/s10115-023-02056-5. Online publication date: 1-May-2024.
https://dl.acm.org/doi/10.1007/s10115-023-02056-5

Index Terms

A benchmark for multi-class object counting and size estimation using deep convolutional neural networks
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
  2. Machine learning

Index terms have been assigned to the content through auto-classification.

Published In

Engineering Applications of Artificial Intelligence Volume 116, Issue C

Nov 2022

1625 pages

ISSN:0952-1976

Elsevier Ltd.

Publisher: Pergamon Press, Inc., United States
