
Enhancing identification for person search with multi-scale multi-grained representation learning

Published: 01 June 2024

Abstract

Person search aims to jointly address pedestrian detection and person re-identification (re-id). It faces several challenges, including significant scale variation, occlusion, and partial instances. In this paper, we propose a Multi-Scale Multi-Grained (MSMG) sequential network for end-to-end person search that alleviates these issues. To generate re-id representations robust to scale changes, MSMG extracts multi-scale RoI features and aggregates them with a proposed Multi-Scale feature Aggregation Encoder (MSAE). The aggregated multi-scale re-id features carry both richer semantics and finer detail, and are therefore more discriminative for identification. To further improve robustness against occlusion and partial instances, MSMG introduces a Multi-Grained feature Learning Decoder (MGLD) dedicated to multi-grained feature learning: through a regional deformable cross-attention module, MGLD adaptively decodes multi-grained re-id representations with more accurate semantics. Together, the multi-scale multi-grained re-id representations substantially improve identification accuracy in challenging cases. Comprehensive experiments show that our method achieves state-of-the-art performance on two benchmark datasets. On the challenging PRW benchmark, MSMG obtains the best-reported mean average precision (mAP) of 61.3%.
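The paper's MSAE pools RoI features from several feature-map scales and fuses them into one scale-robust embedding. The original network is not reproduced here; the following is a minimal, hypothetical numpy sketch of the general multi-scale RoI aggregation idea only. The function names (`roi_pool`, `aggregate_multi_scale`) and the energy-based fusion weights are illustrative assumptions, not the paper's MSAE.

```python
import numpy as np

def roi_pool(feature_map, box, out_size=4):
    """Crop a normalized box from an (H, W, C) feature map and sample it
    to a fixed out_size x out_size grid (a crude stand-in for RoI-Align)."""
    H, W, C = feature_map.shape
    x1, y1, x2, y2 = box  # normalized [0, 1] coordinates
    ys = np.linspace(y1 * (H - 1), y2 * (H - 1), out_size).round().astype(int)
    xs = np.linspace(x1 * (W - 1), x2 * (W - 1), out_size).round().astype(int)
    return feature_map[np.ix_(ys, xs)]  # (out_size, out_size, C)

def aggregate_multi_scale(pyramid, box, out_size=4):
    """Pool the same box at every pyramid level, then fuse the levels with
    softmax weights (an illustrative stand-in for learned scale attention)."""
    pooled = np.stack([roi_pool(f, box, out_size) for f in pyramid])  # (L, s, s, C)
    energy = pooled.reshape(len(pyramid), -1).mean(axis=1)            # per-level score
    weights = np.exp(energy) / np.exp(energy).sum()                   # softmax over levels
    return np.tensordot(weights, pooled, axes=1)                      # (s, s, C)

# Usage: a 3-level pyramid of channel-8 feature maps, one person box.
pyramid = [np.random.rand(32, 32, 8), np.random.rand(16, 16, 8), np.random.rand(8, 8, 8)]
fused = aggregate_multi_scale(pyramid, box=(0.2, 0.2, 0.8, 0.8))
```

Because every level is sampled to the same grid before fusion, the fused tensor has a fixed shape regardless of how large the person appears in the image, which is the property the multi-scale aggregation is after.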

Highlights

We propose MSAE, an encoder for scale-robust re-id representations in person search.
We design MGLD, a decoder tackling occlusions and partial instances in person search.
The proposed MSMG achieves the best-reported mAP score of 61.3% on PRW.
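MGLD itself is built on regional deformable cross-attention and is not reproduced here. As a much simpler illustration of why *multi-grained* representations help with occlusion and partial instances, the sketch below splits a pooled RoI feature into horizontal stripes at several granularities, a common part-based scheme in re-id; at match time only the visible stripes need to be compared. The function name and grain sizes are assumptions for illustration.

```python
import numpy as np

def multi_grained_embeddings(roi_feat, grains=(1, 2, 4)):
    """Split an (h, w, c) RoI feature into horizontal stripes at several
    granularities and average-pool each stripe into a c-dim embedding.
    Returns one (num_parts, c) array per granularity: the 1-stripe level
    is a global embedding, finer levels are part embeddings."""
    h, w, c = roi_feat.shape
    out = []
    for g in grains:
        edges = np.linspace(0, h, g + 1).astype(int)  # stripe boundaries
        parts = [roi_feat[edges[i]:edges[i + 1]].mean(axis=(0, 1)) for i in range(g)]
        out.append(np.stack(parts))
    return out

# Usage: an 8x4 RoI feature with 16 channels.
embs = multi_grained_embeddings(np.random.rand(8, 4, 16))
```

When the lower body is occluded, the global embedding is corrupted, but the upper-body stripes at the finer granularities remain reliable, which is the intuition behind decoding multiple granularities rather than a single global vector.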



Published In

Pattern Recognition, Volume 150, Issue C (June 2024), 726 pages

Publisher

Elsevier Science Inc.

United States

Author Tags

  1. Person search
  2. Transformer
  3. Multi-scale
  4. Multi-granularity

Qualifiers

  • Research-article
