research-article

Pedestrian-specific Bipartite-aware Similarity Learning for Text-based Person Retrieval

Authors:

Jinhui Tang

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 8922 - 8931

https://doi.org/10.1145/3581783.3612009

Published: 27 October 2023

Abstract

Text-based person retrieval is a challenging task that aims to retrieve pedestrian images of the same identity from a natural-language description. Current methods usually measure the similarity between text and image indiscriminately, by matching global visual-textual features and matched local region-word features. However, these methods underestimate the role of mismatched region-word pairs as key cues and ignore the problem of low similarity between matched region-word pairs. To alleviate these issues, we propose a novel Pedestrian-specific Bipartite-aware Similarity Learning (PBSL) framework that efficiently reveals plausible and credible levels of contribution of pedestrian-specific mismatched and matched region-word pairs to the overall similarity. Specifically, to focus on mismatched region-word pairs, we first develop a new co-interactive attention that uses cross-modal information to guide the extraction of pedestrian-specific information within a single modality. We then design a negative similarity regularization mechanism that uses the negative similarity score as a bias to correct the overall similarity. Additionally, to enhance the contribution of matched region-word pairs, we introduce graph networks that aggregate and propagate pedestrian-specific local information, using the overall visual-textual similarity to evaluate locally matched region-word pairs for weight refinement. Finally, extensive experiments on the CUHK-PEDES, ICFG-PEDES, and RSTPReid datasets demonstrate the competitive performance of the proposed PBSL on the text-based person retrieval task.
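The idea of using a negative similarity score as a bias on the overall similarity can be illustrated with a toy sketch. The code below is not the PBSL formulation: the abstract gives no equations, so the function name `bipartite_similarity`, the `neg_weight` parameter, and the aggregation rule (best-matching region per word, split into a non-negative matched part and a non-positive mismatched part) are all assumptions made purely for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def bipartite_similarity(regions, words, global_img, global_txt, neg_weight=0.5):
    """Toy overall similarity (hypothetical, not the paper's formulation).

    regions: (n_regions, d) local image features
    words:   (n_words, d) local text features
    global_img, global_txt: (d,) global features
    """
    # Cosine similarity between every region and every word.
    sim = l2_normalize(regions) @ l2_normalize(words).T  # (n_regions, n_words)

    # For each word, keep the similarity of its best-matching region.
    best = sim.max(axis=0)

    # Matched part: words whose best region is positively similar.
    matched = np.maximum(best, 0.0).mean()
    # Mismatched part: words that no region supports (value <= 0).
    mismatched = np.minimum(best, 0.0).mean()

    global_sim = float(l2_normalize(global_img) @ l2_normalize(global_txt))

    # The negative score acts as a bias that lowers the overall similarity.
    return global_sim + matched + neg_weight * mismatched
```

In this sketch, a description containing a word that contradicts every image region receives a lower score than it would from the matched pairs alone, which is the intuition behind treating mismatched pairs as cues rather than discarding them.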

Index Terms

Information systems → Information retrieval → Retrieval models and ranking → Top-k retrieval in databases

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

October 2023

9913 pages

ISBN:9798400701085

DOI:10.1145/3581783

General Chairs: Abdulmotaleb El Saddik (University of Ottawa, Canada & MBZUAI, UAE), Tao Mei (HiDream.ai, China), Rita Cucchiara (University of Modena and Reggio Emilia, Italy)

Program Chairs: Marco Bertini (University of Florence, Italy), Diana Patricia Tobon Vallejo (Universidad de Medellin, Colombia), Pradeep K. Atrey (University at Albany, State University of New York, USA), M. Shamim Hossain (King Saud University, KSA)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Natural Science Foundation of Jiangsu Province
2021 Jiangsu Shuangchuang (Mass Innovation and Entrepreneurship) Talent Program
National Natural Science Foundation of China

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29 - November 3, 2023

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
