Abstract
Pedestrian detection underpins many pedestrian-related applications and studies, and has received extensive attention in recent years. The end-to-end DEtection TRansformer (DETR) avoids hand-designed components and outperforms convolutional neural networks in general object detection. Inspired by this, we present the Improved Deformable-DETR for crowd Pedestrian Detection (IDPD). First, we propose a dynamic neck that uses omni-dimensional dynamic convolution to adjust the number of channels in the neck feature maps, alleviating the loss of pedestrian information caused by channel reduction. Second, we design a hybrid decoding loss that combines a one-to-one Hungarian matching loss, a one-to-many Hungarian matching auxiliary loss, and a reconstruction loss that recovers full-body boxes from noised visible-part boxes via contrastive denoising. This tackles the slow convergence of Deformable-DETR in crowd pedestrian detection, which stems from severe positive-negative sample imbalance and unstable bipartite graph matching. IDPD was evaluated on the CrowdHuman validation set. With a ResNet-50 backbone, it achieves 93.22% AP, 39.22% MR\(^{-2}\), and 85.02% JI, outperforming the Deformable-DETR baseline and CNN-based models. With a Swin-T backbone, results improve further to 94.16% AP, 37.05% MR\(^{-2}\), and 86.07% JI.
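To make the structure of the hybrid decoding loss concrete, below is a minimal sketch of how the three terms could be combined, assuming each term has already been computed by its own matching or denoising branch. The function name, weight names, and default weights are hypothetical illustrations, not values from the paper.

```python
import torch

def hybrid_decoding_loss(l_one2one: torch.Tensor,
                         l_one2many: torch.Tensor,
                         l_denoise: torch.Tensor,
                         w_one2many: float = 1.0,
                         w_denoise: float = 1.0) -> torch.Tensor:
    """Weighted sum of the three decoding losses named in the abstract.

    l_one2one  -- one-to-one Hungarian matching loss (main branch)
    l_one2many -- one-to-many Hungarian matching auxiliary loss
    l_denoise  -- reconstruction loss for recovering full-body boxes
                  from noised visible-part boxes (contrastive denoising)
    The weights are hypothetical hyperparameters, not the paper's settings.
    """
    return l_one2one + w_one2many * l_one2many + w_denoise * l_denoise

# Example with scalar loss tensors, as a detection criterion would produce.
loss = hybrid_decoding_loss(torch.tensor(1.2),
                            torch.tensor(0.8),
                            torch.tensor(0.5))
```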
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
The authors would like to express their sincere gratitude to the College of Smart City and the College of Robotics, Beijing Union University, Beijing, China, for their invaluable support and assistance in this study.
Funding
This work was supported by the National Natural Science Foundation of China (62272049, 62236006, 62172045) and the Key Projects of Beijing Union University (ZKZD202301).
Author information
Contributions
All authors contributed to the study conception and design. WH contributed to conceptualization, methodology, software, investigation, formal analysis, and writing—original draft; NH contributed to supervision, funding acquisition, and writing—review and editing; XW contributed to visualization and data curation; FS was involved in the ablation experiments and validation; SL was involved in writing—original draft. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethics approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Han, W., He, N., Wang, X. et al. IDPD: improved deformable-DETR for crowd pedestrian detection. SIViP 18, 2243–2253 (2024). https://doi.org/10.1007/s11760-023-02896-2