
Beyond the Parts: Learning Coarse-to-Fine Adaptive Alignment Representation for Person Search

Published: 25 February 2023

Abstract

Person search is a time-consuming computer vision task that entails locating and recognizing query people in scene images. Body parts are commonly mismatched during matching because of position variation, occlusion, and partially missing body parts, which leads to unsatisfactory person search results. Existing approaches that extract local features of the human body from keypoint information cannot handle the search task when distinct body parts are misaligned, and they fail to exploit multiple granularities, which is crucial in person search. Moreover, alignment learning methods learn body part features with fixed and equal weights, ignoring beneficial contextual information, e.g., an umbrella carried by the pedestrian, which provides compelling clues for identifying the person. In this paper, we propose a Coarse-to-Fine Adaptive Alignment Representation (CFA2R) network that learns features at multiple granularities for misaligned person search from a coarse-to-fine perspective. To exploit more beneficial body parts and the related context of the cropped pedestrians, we design a Part-Attentional Progressive Module (PAPM) that guides the network to focus on informative body parts and helpful accessory regions. In addition, we propose a Re-weighting Alignment Module (RAM) that emphasizes the more contributive parts instead of treating all parts equally. Specifically, a Re-weighting Reconstruction module reconstructs part features with adaptive rather than fixed weights, since different parts contribute unequally during image matching. Extensive experiments on the CUHK-SYSU and PRW datasets demonstrate the competitive performance of the proposed method.
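
To make the contrast between fixed, equal part weights and adaptive part weights concrete, the following is a minimal Python sketch. It is not the CFA2R network, PAPM, or RAM: the abstract does not specify how the weights are computed, so the softmax over per-part scores, the cosine similarity, and the names `cosine`, `softmax`, `part_similarity`, and `part_scores` are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax: non-negative weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def part_similarity(query_parts, gallery_parts, part_scores=None):
    """Weighted sum of per-part cosine similarities.

    part_scores is a hypothetical per-part informativeness score
    (an assumption, not taken from the paper). With part_scores=None
    every part gets the same fixed weight 1/P; otherwise the weights
    are softmax(part_scores), so low-scoring parts contribute less.
    """
    sims = [cosine(q, g) for q, g in zip(query_parts, gallery_parts)]
    if part_scores is None:
        weights = [1.0 / len(sims)] * len(sims)
    else:
        weights = softmax(part_scores)
    return sum(w * s for w, s in zip(weights, sims))


# Toy example: three parts, the third one is mismatched (e.g., occluded).
query = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gallery = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]

equal = part_similarity(query, gallery)
adaptive = part_similarity(query, gallery, part_scores=[2.0, 2.0, -2.0])
print(f"equal weights:    {equal:.3f}")
print(f"adaptive weights: {adaptive:.3f}")
```

In this toy case the two well-aligned parts match perfectly and the third does not match at all, so equal weighting yields a similarity of 2/3. Down-weighting the third part pushes the score closer to 1. In a learned model the scores would come from the network rather than being set by hand.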


      Published In

      ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 19, Issue 3
      May 2023
      514 pages
      ISSN:1551-6857
      EISSN:1551-6865
      DOI:10.1145/3582886
      • Editor: Abdulmotaleb El Saddik

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 25 February 2023
      Online AM: 08 October 2022
      Accepted: 26 September 2022
      Revised: 30 August 2022
      Received: 29 December 2021
      Published in TOMM Volume 19, Issue 3

      Author Tags

      1. Person search
      2. alignment representation learning
      3. coarse-to-fine
      4. Part-Attentional Progressive Module
      5. Re-weighting Alignment Module

      Qualifiers

      • Research-article

      Funding Sources

      • Department of Science and Technology, Hubei Provincial People’s Government
      • National Natural Science Foundation of China
