
From macro to micro: rethinking multi-scale pedestrian detection

  • Regular Paper
  • Published:
Multimedia Systems

Abstract

Pedestrian detection applies computer vision techniques to determine whether pedestrians are present in an image or video sequence and to localize them precisely; however, the large variation in pedestrian scale has long been a difficult problem in pedestrian detection. In contrast to existing research, this study jointly considers multi-scale pedestrian detection at both the macro- and micro-levels. At the macro-level, the shape and location of each anchor are predicted from the feature maps to guide anchor generation, so that the resulting anchors better adapt to pedestrian targets at different scales. At the micro-level, the standard convolutions in the backbone network are replaced with switchable atrous convolutions, which effectively addresses the scale differences among pedestrians. Finally, the classification and regression tasks in pedestrian detection are completed more efficiently through the use of a Double Head. These elements are combined to form a multi-scale pedestrian detection network, and experimental results show that the proposed model substantially improves the performance of multi-scale pedestrian detection. The detection accuracy on the COCOPersons dataset reaches an average precision (AP) of 57.3. Compared with Faster R-CNN based on a feature pyramid network, our model improves accuracy on large, medium, and small pedestrians by 1.7 AP, 2.5 AP, and 6.8 AP, respectively. On the Caltech pedestrian dataset, the \({\text {MR}}^{2}\) on the Near, Medium, and Far subsets reaches 0.45%, 13.78%, and 48.85%, respectively, and on the CityPersons pedestrian dataset, the \({\text {MR}}^{2}\) on the Small, Medium, and Large subsets reaches 12.1%, 2.6%, and 5.5%, respectively.
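
To make the micro-level component more concrete, the sketch below shows one way a backbone's standard 3x3 convolution could be swapped for a switchable atrous convolution, following the general SAC formulation of DetectoRS (a pixel-wise switch blends a rate-1 branch and a rate-3 branch that share weights). This is a minimal PyTorch sketch under stated assumptions, not the authors' implementation: the `SwitchableAtrousConv2d` module, its 5x5 average-pool switch, and the `replace_conv3x3` helper are illustrative names, and the pre- and post-global-context modules of the original SAC are omitted.

```python
# Minimal sketch of switchable atrous convolution (SAC), assuming the
# DetectoRS-style formulation y = S(x) * conv(x, w, rate=1)
#                               + (1 - S(x)) * conv(x, w + dw, rate=3).
# Module and helper names are illustrative, not taken from the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwitchableAtrousConv2d(nn.Module):
    """Blend a rate-1 and a rate-3 atrous 3x3 convolution via a learned switch."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.stride = stride
        self.weight = nn.Parameter(torch.empty(out_channels, in_channels, 3, 3))
        nn.init.kaiming_normal_(self.weight, mode="fan_out", nonlinearity="relu")
        # Learnable offset added to the shared weights for the rate-3 branch.
        self.delta_weight = nn.Parameter(torch.zeros_like(self.weight))
        # Switch: 5x5 average pooling followed by a 1x1 convolution, squashed to (0, 1).
        self.switch = nn.Conv2d(in_channels, 1, kernel_size=1)
        nn.init.zeros_(self.switch.weight)
        nn.init.zeros_(self.switch.bias)

    def forward(self, x):
        s = torch.sigmoid(self.switch(F.avg_pool2d(x, 5, stride=1, padding=2)))
        out_r1 = F.conv2d(x, self.weight, stride=self.stride, padding=1, dilation=1)
        out_r3 = F.conv2d(x, self.weight + self.delta_weight,
                          stride=self.stride, padding=3, dilation=3)
        if s.shape[-2:] != out_r1.shape[-2:]:  # match strided output resolution
            s = F.adaptive_avg_pool2d(s, out_r1.shape[-2:])
        return s * out_r1 + (1.0 - s) * out_r3


def replace_conv3x3(module):
    """Recursively swap plain 3x3 convolutions in a backbone for SAC (illustrative)."""
    for name, child in module.named_children():
        if isinstance(child, nn.Conv2d) and child.kernel_size == (3, 3):
            setattr(module, name, SwitchableAtrousConv2d(
                child.in_channels, child.out_channels, stride=child.stride[0]))
        else:
            replace_conv3x3(child)


if __name__ == "__main__":
    import torchvision
    backbone = torchvision.models.resnet50(weights=None)
    replace_conv3x3(backbone)
    print(backbone(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])
```

In this setup, replacing only the 3x3 convolutions in the residual blocks leaves channel counts and strides unchanged, so the modified backbone can still feed a feature pyramid network and the detection heads described above.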


Data availability

The data that support the findings of this study are available from the corresponding author, Ning He, upon reasonable request.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (61872042, 61972375, 62172045), the Science and Technology Project of Beijing Municipal Commission of Education (KM202111417009, KM201811417005), the Major Project of Technological Innovation 2030 "New Generation Artificial Intelligence" (2018AAA0100800), the Premium Funding Project for Academic Human Resources Development in Beijing Union University (BPHR2020AZ01, BPH2020EZ01), and the Key Project of Beijing Municipal Commission of Education (KZ201911417048).

Author information

Authors and Affiliations

Authors

Contributions

NH contributed to the conception of the study; YH performed the data analyses, carried out the experiments, and wrote the manuscript; HY, RZ, and KY helped perform the analysis with constructive discussions.

Corresponding author

Correspondence to Ning He.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 810 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

He, Y., He, N., Yu, H. et al. From macro to micro: rethinking multi-scale pedestrian detection. Multimedia Systems 29, 1417–1429 (2023). https://doi.org/10.1007/s00530-023-01058-1


  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00530-023-01058-1

Keywords