
Enhancing identification for person search with multi-scale multi-grained representation learning

Published: 01 June 2024

Abstract

Person search aims to jointly address pedestrian detection and person re-identification (re-id). It faces several challenges, including significant scale variation, occlusion, and partial instances. In this paper, we propose a Multi-Scale Multi-Grained (MSMG) sequential network for end-to-end person search that alleviates these issues. To generate re-id representations robust to scale changes, MSMG extracts multi-scale RoI features and aggregates them with a proposed Multi-Scale feature Aggregation Encoder (MSAE). The aggregated multi-scale re-id features carry both richer semantics and finer detail, and are therefore more discriminative for identification. To further improve robustness against occlusion and partial instances, MSMG introduces a Multi-Grained feature Learning Decoder (MGLD) dedicated to multi-grained feature learning: through a regional deformable cross-attention module, MGLD adaptively decodes multi-grained re-id representations with more accurate semantics. Together, the multi-scale multi-grained re-id representations substantially improve identification accuracy in challenging cases. Comprehensive experiments show that our method achieves state-of-the-art performance on two benchmark datasets. On the challenging PRW benchmark, MSMG obtains the best-reported mean average precision (mAP) of 61.3%.
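The paper's MSAE pools RoI features from several feature-map scales and fuses them into one scale-robust embedding. The original network is not reproduced here; the following is a minimal, hypothetical numpy sketch of the general multi-scale RoI aggregation idea only. The function names (`roi_pool`, `aggregate_multi_scale`) and the energy-based fusion weights are illustrative assumptions, not the paper's MSAE.

```python
import numpy as np

def roi_pool(feature_map, box, out_size=4):
    """Crop a normalized box from an (H, W, C) feature map and sample it
    to a fixed out_size x out_size grid (a crude stand-in for RoI-Align)."""
    H, W, C = feature_map.shape
    x1, y1, x2, y2 = box  # normalized [0, 1] coordinates
    ys = np.linspace(y1 * (H - 1), y2 * (H - 1), out_size).round().astype(int)
    xs = np.linspace(x1 * (W - 1), x2 * (W - 1), out_size).round().astype(int)
    return feature_map[np.ix_(ys, xs)]  # (out_size, out_size, C)

def aggregate_multi_scale(pyramid, box, out_size=4):
    """Pool the same box at every pyramid level, then fuse the levels with
    softmax weights (an illustrative stand-in for learned scale attention)."""
    pooled = np.stack([roi_pool(f, box, out_size) for f in pyramid])  # (L, s, s, C)
    energy = pooled.reshape(len(pyramid), -1).mean(axis=1)            # per-level score
    weights = np.exp(energy) / np.exp(energy).sum()                   # softmax over levels
    return np.tensordot(weights, pooled, axes=1)                      # (s, s, C)

# Usage: a 3-level pyramid of channel-8 feature maps, one person box.
pyramid = [np.random.rand(32, 32, 8), np.random.rand(16, 16, 8), np.random.rand(8, 8, 8)]
fused = aggregate_multi_scale(pyramid, box=(0.2, 0.2, 0.8, 0.8))
```

Because every level is sampled to the same grid before fusion, the fused tensor has a fixed shape regardless of how large the person appears in the image, which is the property the multi-scale aggregation is after.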

Highlights

We propose MSAE, an encoder for scale-robust re-id representations in person search.
We design MGLD, a decoder tackling occlusions and partial instances in person search.
The proposed MSMG achieves the best-reported mAP score of 61.3% on PRW.
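MGLD itself is built on regional deformable cross-attention and is not reproduced here. As a much simpler illustration of why *multi-grained* representations help with occlusion and partial instances, the sketch below splits a pooled RoI feature into horizontal stripes at several granularities, a common part-based scheme in re-id; at match time only the visible stripes need to be compared. The function name and grain sizes are assumptions for illustration.

```python
import numpy as np

def multi_grained_embeddings(roi_feat, grains=(1, 2, 4)):
    """Split an (h, w, c) RoI feature into horizontal stripes at several
    granularities and average-pool each stripe into a c-dim embedding.
    Returns one (num_parts, c) array per granularity: the 1-stripe level
    is a global embedding, finer levels are part embeddings."""
    h, w, c = roi_feat.shape
    out = []
    for g in grains:
        edges = np.linspace(0, h, g + 1).astype(int)  # stripe boundaries
        parts = [roi_feat[edges[i]:edges[i + 1]].mean(axis=(0, 1)) for i in range(g)]
        out.append(np.stack(parts))
    return out

# Usage: an 8x4 RoI feature with 16 channels.
embs = multi_grained_embeddings(np.random.rand(8, 4, 16))
```

When the lower body is occluded, the global embedding is corrupted, but the upper-body stripes at the finer granularities remain reliable, which is the intuition behind decoding multiple granularities rather than a single global vector.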



Published In

Pattern Recognition, Volume 150, Issue C (June 2024), 726 pages

Publisher

Elsevier Science Inc.

United States

Author Tags

  1. Person search
  2. Transformer
  3. Multi-scale
  4. Multi-granularity

Qualifiers

  • Research-article
