
FedSH: Towards Privacy-Preserving Text-Based Person Re-Identification

Published: 06 November 2023

Abstract

Text-based person re-identification (ReID) enables canonical applications such as searching for and tracking targets in large-scale surveillance images using textual descriptions. Yet existing text-based person ReID systems rely on centralized model training that gathers images captured by different institutes' cameras into one place, which poses severe privacy threats to sensitive institutional information. This work therefore explores privacy-preserving text-based person ReID and proposes the FedSH framework, which tailors the federated learning paradigm to distributed searching-knowledge extraction. Specifically, FedSH addresses the limitations of poor local model generalization and obscured entity boundaries, caused by inner-institute data homogeneity and inter-institute data heterogeneity, by building multi-granularity feature representations and a semantically self-aligned network. Meanwhile, it reduces the communication burden introduced by multi-modal embeddings by updating only common representation subspaces during federated learning. Experimental results on two public benchmarks demonstrate that our method achieves up to 16.47% and 16.02% improvement in the Rank-1 metric compared with six state-of-the-art (SoTA) baselines and six ablation studies. We believe that our work will inspire the community to investigate the potential of applying federated learning to real-world image retrieval and ReID scenarios.
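The communication-reduction idea above, updating only a common representation subspace rather than the full multi-modal model, can be illustrated with a minimal FedAvg-style sketch. The parameter names, the three-client setup, and the plain NumPy averaging below are illustrative assumptions, not the authors' actual FedSH implementation: only parameters under a hypothetical `shared_embed.` prefix are exchanged and averaged, while modality-specific backbones remain on each institute's side.

```python
import numpy as np

# Hypothetical parameter groups per client; names and shapes are illustrative,
# not the actual FedSH architecture.
def init_client_params(rng):
    return {
        "image_backbone.w": rng.normal(size=(256, 128)),  # stays local
        "text_backbone.w":  rng.normal(size=(256, 128)),  # stays local
        "shared_embed.w":   rng.normal(size=(128, 64)),    # communicated
    }

SHARED_PREFIX = "shared_embed."  # the assumed "common representation subspace"

def federated_round(clients_params):
    """Average only shared-subspace parameters across clients (FedAvg-style),
    leaving modality-specific weights on each client to cut communication cost."""
    shared_keys = [k for k in clients_params[0] if k.startswith(SHARED_PREFIX)]
    for k in shared_keys:
        avg = np.mean([p[k] for p in clients_params], axis=0)
        for p in clients_params:
            p[k] = avg.copy()
    return clients_params

rng = np.random.default_rng(0)
clients = [init_client_params(rng) for _ in range(3)]
clients = federated_round(clients)

# After the round, all clients agree on the shared embedding, but their
# image/text backbones remain institute-specific.
assert all(np.allclose(c["shared_embed.w"], clients[0]["shared_embed.w"])
           for c in clients)
```

Communicating only the shared cross-modal subspace is the design choice that keeps per-round traffic proportional to the common embedding size rather than to the full vision and language backbones.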

Cited By

  • (2024) "Text-and-Image Learning Transformer for Cross-modal Person Re-identification," ACM Transactions on Multimedia Computing, Communications, and Applications. https://doi.org/10.1145/3686160. Online publication date: 15-Oct-2024.
  • (2024) "Prototypical Prompting for Text-to-image Person Re-identification," in Proceedings of the 32nd ACM International Conference on Multimedia, pp. 2331–2340. https://doi.org/10.1145/3664647.3681165. Online publication date: 28-Oct-2024.

Information

Published In

IEEE Transactions on Multimedia, Volume 26, 2024
10405 pages
Publisher

IEEE Press

Publication History

Published: 06 November 2023

Qualifiers

  • Research-article

