Progressive Attribute Embedding for Accurate Cross-modality Person Re-ID

Authors:

Ruoran Jia

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 4309 - 4317

https://doi.org/10.1145/3503161.3548336

Published: 10 October 2022

Abstract

Attributes carry important information for bridging the appearance gap across modalities, but they have not been well explored in cross-modality person re-identification (ReID). This paper proposes a progressive attribute embedding module (PAE) that fuses fine-grained semantic attribute information with global structural visual information. In a novel cascaded manner, attribute information is used to learn the relationship between person images in different modalities, which significantly relieves the modality heterogeneity. Meanwhile, embedding attribute information to guide the generation of more discriminative image features simultaneously reduces inter-class similarity and intra-class discrepancy. In addition, we propose an attribute-based auxiliary learning strategy (AAL) that supervises the network to learn modality-invariant and identity-specific local features through joint attribute and identity classification losses. PAE and AAL are jointly optimized in an end-to-end framework, the progressive attribute embedding network (PAENet). Both can be plugged into current mainstream models: we implement them in five cross-modality person ReID frameworks to further boost performance. Extensive experiments on public datasets demonstrate the effectiveness of the proposed method against state-of-the-art cross-modality person ReID methods.
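The abstract says that AAL supervises the network with joint attribute and identity classification losses, but it does not give their exact form. The following is only a minimal sketch of one common way to combine such losses, not the paper's implementation. It assumes binary attribute labels scored with binary cross-entropy, identity labels scored with softmax cross-entropy, and a weighting factor; the names `identity_loss`, `attribute_loss`, `joint_loss`, and `attr_weight` are all hypothetical.

```python
import math


def identity_loss(logits, label):
    """Softmax cross-entropy for one sample over identity classes."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def attribute_loss(logits, targets):
    """Mean binary cross-entropy over attributes (assumed binary: 0 or 1)."""
    total = 0.0
    for z, t in zip(logits, targets):
        # Stable form of -[t*log(sigmoid(z)) + (1 - t)*log(1 - sigmoid(z))]
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def joint_loss(id_logits, id_label, attr_logits, attr_targets, attr_weight=1.0):
    """Identity loss plus weighted attribute loss (the weighting is an assumption)."""
    return identity_loss(id_logits, id_label) + attr_weight * attribute_loss(
        attr_logits, attr_targets
    )


if __name__ == "__main__":
    # Toy sample: 3 identity classes, 2 binary attributes (hypothetical values).
    loss = joint_loss([2.0, 0.5, -1.0], 0, [1.5, -2.0], [1, 0])
    print(f"joint loss: {loss:.4f}")
```

In this sketch the two terms share nothing but the sample they score. In a real network both heads would sit on top of shared features, so that gradients from the attribute term also shape the identity representation.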


Index Terms

Progressive Attribute Embedding for Accurate Cross-modality Person Re-ID
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Visual content-based indexing and retrieval

Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

October 2022

7537 pages

ISBN:9781450392037

DOI:10.1145/3503161

General Chairs:

João Magalhães (NOVA University of Lisbon, Portugal)
Alberto del Bimbo (University of Florence, Italy)
Shin'ichi Satoh (National Institute of Informatics, Japan)
Nicu Sebe (University of Trento, Italy)

Program Chairs:

Xavier Alameda-Pineda (Inria, Grenoble, France)
Qin Jin (Renmin University of China, China)
Vincent Oria (New Jersey Institute of Technology, USA)
Laura Toni (University College London, UK)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
Natural Science Foundation of Anhui Higher Education Institution of China

Conference

MM '22

Sponsor:

SIGMM

MM '22: The 30th ACM International Conference on Multimedia

October 10 - 14, 2022

Lisboa, Portugal

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


