
ProtoHPE: Prototype-guided High-frequency Patch Enhancement for Visible-Infrared Person Re-identification

Published: 27 October 2023

Abstract

Visible-Infrared person re-identification is challenging due to the large modality gap. To bridge the gap, most studies heavily rely on the correlation of visible-infrared holistic person images, which may perform poorly under severe distribution shifts. In contrast, we find that some cross-modal correlated high-frequency components contain discriminative visual patterns and are less affected by variations such as wavelength, pose, and background clutter than holistic images. Therefore, we are motivated to bridge the modality gap based on such high-frequency components, and propose Prototype-guided High-frequency Patch Enhancement (ProtoHPE) with two core designs. First, to enhance the representation ability of cross-modal correlated high-frequency components, we split patches with such components by Wavelet Transform and exponential moving average Vision Transformer (ViT), then empower ViT to take the split patches as auxiliary input. Second, to obtain semantically compact and discriminative high-frequency representations of the same identity, we propose Multimodal Prototypical Contrast. To be specific, it hierarchically captures comprehensive semantics of different modal instances, facilitating the aggregation of high-frequency representations belonging to the same identity. With it, ViT can capture key high-frequency components during inference without relying on ProtoHPE, thus bringing no extra complexity. Extensive experiments validate the effectiveness of ProtoHPE.
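The two core designs can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it omits the ViT backbone, the exponential-moving-average teacher, and the hierarchical multimodal prototypes, and simply shows (a) a one-level 2-D Haar wavelet transform that separates high-frequency sub-bands and ranks image patches by high-frequency energy, and (b) a toy single-level prototypical contrastive loss that pulls each feature toward the mean (prototype) of its identity. All function names and the patch/temperature parameters are illustrative assumptions.

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar wavelet transform of a grayscale image.

    Returns the low-frequency approximation (LL) and the three
    high-frequency sub-bands (LH, HL, HH), each at half resolution.
    """
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a - b + c - d) / 2.0  # horizontal detail
    hl = (a + b - c - d) / 2.0  # vertical detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, (lh, hl, hh)

def high_freq_energy_per_patch(img, patch=4):
    """Rank non-overlapping patch-by-patch regions of the original image
    by the energy of their high-frequency sub-bands (a stand-in for
    selecting patches with cross-modal correlated high-frequency
    components)."""
    _, (lh, hl, hh) = haar_dwt2(img)
    hf = lh**2 + hl**2 + hh**2     # high-frequency energy map (half resolution)
    h, w = hf.shape
    p = patch // 2                 # sub-bands are half the original resolution
    return hf[:h - h % p, :w - w % p].reshape(h // p, p, w // p, p).sum(axis=(1, 3))

def prototypical_contrast(feats, labels, tau=0.1):
    """Toy prototypical contrastive loss: pull each L2-normalised feature
    toward the prototype (class mean) of its identity, push it away from
    other identities' prototypes via a softmax over prototype similarities."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    classes = np.unique(labels)
    protos = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    logits = feats @ protos.T / tau
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.searchsorted(classes, labels)        # column of each sample's own prototype
    return -log_prob[np.arange(len(feats)), idx].mean()
```

In the paper the selected high-frequency patches are fed to the ViT as auxiliary input during training only, so a sketch like this would add no inference-time cost once training is done.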


Cited By

  • Auxiliary Representation Guided Network for Visible-Infrared Person Re-Identification. IEEE Transactions on Multimedia 27 (2025), 340–355. DOI: 10.1109/TMM.2024.3521773
  • Semi-supervised Visible-Infrared Person Re-identification via Modality Unification and Confidence Guidance. In Proceedings of the 32nd ACM International Conference on Multimedia (2024), 5761–5770. DOI: 10.1145/3664647.3680735
  • Joint Visual-Textual Reasoning and Visible-Infrared Modality Alignment for Person Re-Identification. In 2024 IEEE International Conference on Multimedia and Expo (ICME), 1–6. DOI: 10.1109/ICME57554.2024.10688362
  • MP2PMatch. Journal of Visual Communication and Image Representation 100, C (2024). DOI: 10.1016/j.jvcir.2024.104128

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. high-frequency enhancement
    2. prototypical contrast
    3. vi-reid

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China

    Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 – November 3, 2023
Ottawa, ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

