research-article

A benchmark for multi-class object counting and size estimation using deep convolutional neural networks

Published: 01 November 2022

Abstract

Automatic object counting and object size estimation in digital images are useful in many real-world applications, such as surveillance, smart farming, and intelligent traffic systems. However, most existing research focuses on scenarios with only one type of object, owing to the lack of suitable datasets. Furthermore, existing methods rely on traditional detection algorithms for size estimation; they can segment objects but cannot identify different object types and return size information for each individual object. To fill these gaps, we create a synthetic dataset and propose a benchmark for multi-class object counting and size estimation (MOCSE) within a unified framework. We create the dataset, MOCSE13, by using Unity to generate synthetic images of 13 different objects (fruits and vegetables). In addition, we propose a deep-architecture approach for multi-class object counting and object size estimation. Our proposed models, with different backbones, are evaluated on the synthetic dataset. The experimental results provide a benchmark for multi-class object counting and size estimation, and the synthetic dataset can serve as a testbed for future studies.
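The abstract does not say which metrics the benchmark reports. As a rough illustration only, the sketch below computes a per-class mean absolute error (MAE) over counts, a measure commonly used in the counting literature. The function name `per_class_mae`, the class names, and all numbers are hypothetical and are not taken from the paper or from MOCSE13.

```python
# Hypothetical sketch: per-class counting error for a multi-class benchmark.
# Class names and counts below are made up for illustration only.

def per_class_mae(predicted, actual):
    """Return the mean absolute counting error for each class.

    `predicted` and `actual` are equal-length lists with one dict per image,
    each dict mapping a class name to the object count in that image.
    A class missing from a dict is treated as a count of zero.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need the same non-zero number of images in both lists")
    classes = sorted({c for counts in actual + predicted for c in counts})
    n_images = len(actual)
    mae = {}
    for c in classes:
        total_error = sum(
            abs(p.get(c, 0) - a.get(c, 0)) for p, a in zip(predicted, actual)
        )
        mae[c] = total_error / n_images
    return mae


actual = [{"apple": 3, "tomato": 2}, {"apple": 1, "tomato": 4}]
predicted = [{"apple": 2, "tomato": 2}, {"apple": 1, "tomato": 6}]
mae = per_class_mae(predicted, actual)
print(mae)
```

Reporting the error per class rather than as a single total keeps a model from hiding poor performance on one object type behind good performance on another. The same pattern could be applied to per-object size errors, assuming ground-truth sizes are available.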



      Published In

      Engineering Applications of Artificial Intelligence  Volume 116, Issue C
      Nov 2022
      1625 pages

      Publisher

      Pergamon Press, Inc.

      United States


      Author Tags

      1. Multi-class object counting
      2. Crowd counting
      3. Object size estimation
      4. Convolutional neural networks
      5. Synthetic dataset
