
From macro to micro: rethinking multi-scale pedestrian detection

  • Regular Paper
  • Published:
Multimedia Systems

Abstract

Pedestrian detection applies computer vision techniques to determine whether pedestrians are present in an image or video sequence and to localize them precisely; however, the large variation in pedestrian scale has long been a difficult problem in pedestrian detection. In contrast to existing research, this study jointly considers multi-scale pedestrian detection at both the macro- and micro-levels. At the macro-level, the shape and location of each anchor are predicted from the feature maps to guide anchor generation, so that the resulting anchors better adapt to pedestrian targets at different scales. At the micro-level, the standard convolutions in the backbone network are replaced with switchable atrous convolutions, which effectively addresses the scale differences among pedestrians. Finally, the classification and regression tasks in pedestrian detection are completed more efficiently through the use of a Double Head. These elements are combined to form a multi-scale pedestrian detection network, and experimental results show that the proposed model substantially improves the performance of multi-scale pedestrian detection. The detection accuracy on the COCOPersons dataset reaches an average precision (AP) of 57.3. Compared with Faster R-CNN based on a feature pyramid network, our model improves accuracy on large, medium, and small pedestrians by 1.7 AP, 2.5 AP, and 6.8 AP, respectively. On the Caltech pedestrian dataset, the \({\text {MR}}^{2}\) on the Near, Medium, and Far subsets reaches 0.45%, 13.78%, and 48.85%, respectively, and on the CityPersons pedestrian dataset, the \({\text {MR}}^{2}\) on the Small, Medium, and Large subsets reaches 12.1%, 2.6%, and 5.5%, respectively.
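
To make the micro-level component more concrete, the sketch below shows one way a backbone's standard 3x3 convolution could be swapped for a switchable atrous convolution, following the general SAC formulation of DetectoRS (a pixel-wise switch blends a rate-1 branch and a rate-3 branch that share weights). This is a minimal PyTorch sketch under stated assumptions, not the authors' implementation: the `SwitchableAtrousConv2d` module, its 5x5 average-pool switch, and the `replace_conv3x3` helper are illustrative names, and the pre- and post-global-context modules of the original SAC are omitted.

```python
# Minimal sketch of switchable atrous convolution (SAC), assuming the
# DetectoRS-style formulation y = S(x) * conv(x, w, rate=1)
#                               + (1 - S(x)) * conv(x, w + dw, rate=3).
# Module and helper names are illustrative, not taken from the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwitchableAtrousConv2d(nn.Module):
    """Blend a rate-1 and a rate-3 atrous 3x3 convolution via a learned switch."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.stride = stride
        self.weight = nn.Parameter(torch.empty(out_channels, in_channels, 3, 3))
        nn.init.kaiming_normal_(self.weight, mode="fan_out", nonlinearity="relu")
        # Learnable offset added to the shared weights for the rate-3 branch.
        self.delta_weight = nn.Parameter(torch.zeros_like(self.weight))
        # Switch: 5x5 average pooling followed by a 1x1 convolution, squashed to (0, 1).
        self.switch = nn.Conv2d(in_channels, 1, kernel_size=1)
        nn.init.zeros_(self.switch.weight)
        nn.init.zeros_(self.switch.bias)

    def forward(self, x):
        s = torch.sigmoid(self.switch(F.avg_pool2d(x, 5, stride=1, padding=2)))
        out_r1 = F.conv2d(x, self.weight, stride=self.stride, padding=1, dilation=1)
        out_r3 = F.conv2d(x, self.weight + self.delta_weight,
                          stride=self.stride, padding=3, dilation=3)
        if s.shape[-2:] != out_r1.shape[-2:]:  # match strided output resolution
            s = F.adaptive_avg_pool2d(s, out_r1.shape[-2:])
        return s * out_r1 + (1.0 - s) * out_r3


def replace_conv3x3(module):
    """Recursively swap plain 3x3 convolutions in a backbone for SAC (illustrative)."""
    for name, child in module.named_children():
        if isinstance(child, nn.Conv2d) and child.kernel_size == (3, 3):
            setattr(module, name, SwitchableAtrousConv2d(
                child.in_channels, child.out_channels, stride=child.stride[0]))
        else:
            replace_conv3x3(child)


if __name__ == "__main__":
    import torchvision
    backbone = torchvision.models.resnet50(weights=None)
    replace_conv3x3(backbone)
    print(backbone(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])
```

In this setup, replacing only the 3x3 convolutions in the residual blocks leaves channel counts and strides unchanged, so the modified backbone can still feed a feature pyramid network and the detection heads described above.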


Data availability

The data that support the findings of this study are available from the corresponding author, Ning He, upon reasonable request.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (61872042, 61972375, 62172045), the Science and Technology Project of Beijing Municipal Commission of Education (KM202111417009, KM201811417005), the Major Project of Technological Innovation 2030 "New Generation Artificial Intelligence" (2018AAA0100800), the Premium Funding Project for Academic Human Resources Development in Beijing Union University (BPHR2020AZ01, BPH2020EZ01), and the Key Project of Beijing Municipal Commission of Education (KZ201911417048).

Author information

Authors and Affiliations

Authors

Contributions

NH contributed to the conception of the study; YH performed the data analyses, carried out the experiments, and wrote the manuscript; HY, RZ, and KY helped perform the analysis with constructive discussions.

Corresponding author

Correspondence to Ning He.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 810 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

He, Y., He, N., Yu, H. et al. From macro to micro: rethinking multi-scale pedestrian detection. Multimedia Systems 29, 1417–1429 (2023). https://doi.org/10.1007/s00530-023-01058-1


  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00530-023-01058-1

Keywords