Dilated Convolution-based Feature Refinement Network for Crowd Localization

Published: 12 July 2023

Abstract

As an emerging computer vision task, crowd localization has received increasing attention due to its ability to produce spatially more accurate predictions. However, continuous scale variations in complex crowd scenes lead to tiny individuals at the edges of images, which existing methods cannot localize precisely. To alleviate this problem, we propose a novel Dilated Convolution-based Feature Refinement Network (DFRNet) that enhances representation learning capability. DFRNet is built with three branches that capture the information of each individual in crowd scenes more precisely. Specifically, we introduce a Feature Perception Module that models long-range contextual information at different scales by adopting multiple dilated convolutions, thus providing sufficient feature information to perceive tiny individuals at the edges of images. A Feature Refinement Module is then deployed at multiple stages of the three branches to facilitate the mutual refinement of feature information at different scales, further improving the expressive capability of the multi-scale contextual information. By incorporating these modules, DFRNet can locate individuals in complex scenes more precisely. Extensive experiments on multiple datasets demonstrate that the proposed method outperforms existing methods and adapts more accurately to complex crowd scenes.
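The abstract states that the Feature Perception Module gathers context at different scales by applying multiple dilated convolutions. The following is a minimal pure-Python sketch of that general idea in one dimension, not the authors' implementation: the function names, the 1-D setting, the shared kernel, and the dilation rates (1, 2, 3) are all assumptions made for illustration, since the abstract does not specify them.

```python
def receptive_field(kernel_size, dilation):
    """Number of input positions spanned by one output of a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D dilated convolution (cross-correlation, as in deep learning).

    Kernel taps are spaced `dilation` samples apart, so the same number of
    weights covers a wider stretch of the input.
    """
    span = receptive_field(len(kernel), dilation)
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]


def multi_dilation_features(signal, kernel, dilations=(1, 2, 3)):
    """Apply one kernel at several dilation rates (rates are hypothetical).

    Returns a dict mapping each rate to its output, i.e. one view of the
    input per context scale.
    """
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6, 7]
    kernel = [1, 0, -1]  # simple difference filter
    for rate, out in multi_dilation_features(signal, kernel).items():
        print(rate, receptive_field(len(kernel), rate), out)
```

With a 3-tap kernel, rates 1, 2, and 3 span 3, 5, and 7 input samples respectively, which shows how larger rates widen the context seen by each output without adding weights. A real 2-D module would typically pad the input so that all branches keep the same spatial size before they are combined.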

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 19, Issue 6
    November 2023
    858 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/3599695
    • Editor: Abdulmotaleb El Saddik

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 July 2023
    Online AM: 20 December 2022
    Accepted: 01 September 2022
    Revised: 26 July 2022
    Received: 13 April 2022
    Published in TOMM Volume 19, Issue 6

    Author Tags

    1. Dilated convolution
    2. Feature Refinement
    3. crowd localization
    4. contextual information

    Qualifiers

    • Research-article

    Funding Sources

    • Science and Technology Innovation (STI) 2030-Major Projects
    • National Natural Science Foundation of China
