research-article

Context Sensing Attention Network for Video-based Person Re-identification

Authors: Changxing Ding, Jianxin Pang, and Xiangmin Xu

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 4

Article No.: 143, Pages 1–20

https://doi.org/10.1145/3573203

Published: 27 February 2023

Abstract

Video-based person re-identification (ReID) is challenging because video frames are affected by various kinds of interference. Recent approaches address this problem with temporal aggregation strategies. In this work, we propose a novel Context Sensing Attention Network (CSA-Net), which improves both the frame feature extraction step and the temporal aggregation step. First, we introduce the Context Sensing Channel Attention (CSCA) module, which emphasizes responses from informative channels in each frame. These informative channels are identified with reference not only to the individual frame but also to the content of the entire sequence; CSCA therefore exploits both the individuality of each frame and the global context of the sequence. Second, we propose the Contrastive Feature Aggregation (CFA) module, which predicts frame weights for temporal aggregation. The weight of each frame is determined in a contrastive manner, i.e., not only by the quality of that frame but also by the average quality of the other frames in the sequence. CFA therefore effectively promotes the contribution of relatively good frames. Extensive experiments on four datasets show that CSA-Net consistently achieves state-of-the-art performance.
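The abstract describes the two modules only at a high level. As a rough illustration, the pure-Python sketch below shows one way the two ideas could be expressed on per-frame channel descriptors (one globally pooled vector per frame). It is not the authors' CSA-Net implementation: the function names (`csca_sketch`, `cfa_weights`, `cfa_aggregate`), the linear weights `w_frame`, `w_ctx` and `w_quality`, the use of the channel-wise sequence mean as the "global context", and the sigmoid-then-normalize frame weighting are all assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]  # rows = output channels, columns = input channels


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w: Matrix, x: Vector) -> Vector:
    # Plain matrix-vector product: one output per row of w.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sequence_mean(frames: List[Vector]) -> Vector:
    # Channel-wise average over all frames (assumed stand-in for sequence context).
    return [sum(col) / len(frames) for col in zip(*frames)]


def csca_sketch(frames: List[Vector], w_frame: Matrix, w_ctx: Matrix) -> List[Vector]:
    """Hypothetical CSCA-style attention: each channel gate sees the frame AND the sequence."""
    context_term = matvec(w_ctx, sequence_mean(frames))  # shared by every frame
    out = []
    for f in frames:
        frame_term = matvec(w_frame, f)  # frame-specific evidence
        gates = [sigmoid(a + b) for a, b in zip(frame_term, context_term)]
        out.append([g * x for g, x in zip(gates, f)])  # reweight channels of this frame
    return out


def cfa_weights(scores: Vector) -> Vector:
    """Hypothetical CFA-style weights: each score is compared with the mean of the OTHER frames."""
    t = len(scores)
    if t == 1:
        return [1.0]  # a single frame gets all the weight
    total = sum(scores)
    gates = [sigmoid(s - (total - s) / (t - 1)) for s in scores]
    norm = sum(gates)
    return [g / norm for g in gates]  # normalize so the weights sum to 1


def cfa_aggregate(frames: List[Vector], w_quality: Vector) -> Vector:
    # Hypothetical linear quality head: one scalar quality score per frame.
    scores = [sum(w * x for w, x in zip(w_quality, f)) for f in frames]
    weights = cfa_weights(scores)
    return [sum(w * f[c] for w, f in zip(weights, frames)) for c in range(len(frames[0]))]


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # With all-zero weights every gate is sigmoid(0) = 0.5, so each channel is halved.
    print(csca_sketch(frames, zeros, zeros))
    # The frame scoring above the others receives the largest weight.
    print(cfa_weights([2.0, 0.0, 0.0]))
```

One design point in this sketch: if the contrastive scores were fed to a softmax instead, subtracting the mean of the other frames would only rescale them, because s_t - (S - s_t)/(T - 1) = T·s_t/(T - 1) - S/(T - 1) and a softmax ignores the shared offset S/(T - 1). With the assumed sigmoid gate, the comparison matters: a frame that scores above the average of the other frames gets a gate above 0.5 before normalization.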

Index Terms

1. Computing methodologies
  1. Artificial intelligence

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 19, Issue 4

July 2023

263 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3582888

Editor:
Abdulmotaleb El Saddik
Mohamed Bin Zayed University of Artificial Intelligence, UAE and University of Ottawa, Canada

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 February 2023

Online AM: 01 December 2022

Accepted: 25 November 2022

Revised: 25 October 2022

Received: 14 June 2022

Published in TOMM Volume 19, Issue 4

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
Guangdong Provincial Key Laboratory of Human Digital Twin
Key-Area Research and Development Program of Guangdong Province, China
Program of Guangdong Provincial Key Laboratory of Robot Localization and Navigation Technology
Natural Science Foundation of China
Shenzhen Technology Project
CAS Key Technology Talent Program, and Guangdong Technology Project

Article Metrics

Total Citations: 16

Total Downloads: 337

Downloads (Last 12 months): 175

Downloads (Last 6 weeks): 1