7th PRCV 2024: Urumqi, China - Part V
- Zhouchen Lin, Ming-Ming Cheng, Ran He, Kurban Ubul, Wushouer Silamu, Hongbin Zha, Jie Zhou, Cheng-Lin Liu:
Pattern Recognition and Computer Vision - 7th Chinese Conference, PRCV 2024, Urumqi, China, October 18-20, 2024, Proceedings, Part V. Lecture Notes in Computer Science 15035, Springer 2025, ISBN 978-981-97-8619-0
Multi-modal Information Processing
- Zehui Wang, Zhihan Zhang, Hongtao Wang:
A Multi-modal Framework with Contrastive Learning and Sequential Encoding for Enhanced Sleep Stage Detection. 3-17
- Anran Wu, Shuwen Yang, Yujia Xia, Xingjiao Wu, Tianlong Ma, Liang He:
Charting the Uncharted: Building and Analyzing a Multifaceted Chart Question Answering Dataset for Complex Logical Reasoning Process. 18-33
- Yaokun Zhong, Tianming Liang, Jian-Fang Hu:
Time-Frequency Mutual Learning for Moment Retrieval and Highlight Detection. 34-48
- Yanyu Qi, Ruohao Guo, Zhenbo Li, Dantong Niu, Liao Qu:
Masked Visual Pre-training for RGB-D and RGB-T Salient Object Detection. 49-66
- Xian Qu, Yingyi Yang, Xiaoming Mai:
Cascade Coarse-to-Fine Point-Query Transformer for RGB-T Crowd Counting. 67-83
- Jiaqi Hu, Jiedong Zhuang, Xiaoyu Liang, Dayong Wang, Lu Yu, Haoji Hu:
Perceptual Image Compression with Text-Guided Multi-level Fusion. 84-97
- Haiwen Zhang, Zixi Yang, Yuanzhi Liu, Xinran Wang, Zheqi He, Kongming Liang, Zhanyu Ma:
Evaluating Attribute Comprehension in Large Vision-Language Models. 98-113
- Yihang Meng, Hao Cheng, Zihua Wang, Hongyuan Zhu, Xiuxian Lao, Yu Zhang:
Efficient Multi-modal Human-Centric Contrastive Pre-training with a Pseudo Body-Structured Prior. 114-128
- Yusong Hu, Yuting Gao, Zihan Xu, Ke Li, Xialei Liu:
A3R: Vision Language Pre-training by Attentive Alignment and Attentive Reconstruction. 129-142
- Yuxuan Wang, Tianwei Cao, Kongming Liang, Zhongjiang He, Hao Sun, Yongxiang Li, Zhanyu Ma:
Mixture-of-Hand-Experts: Repainting the Deformed Hand Images Generated by Diffusion Models. 143-157
- Xi Yu, Wenti Huang, Jun Long:
ConD2: Contrastive Decomposition Distilling for Multimodal Sentiment Analysis. 158-172
- Ruihao Zhang, Jinsong Geng, Cenyu Liu, Wei Zhang, Zunlei Feng, Liang Xue, Yijun Bei:
Multi-layer Tuning CLIP for Few-Shot Image Classification. 173-186
- Shezheng Song, Shasha Li, Jie Yu, Shan Zhao, Xiaopeng Li, Jun Ma, Xiaodong Liu, Zhuo Li, Xiaoguang Mao:
DIM: Dynamic Integration of Multimodal Entity Linking with Large Language Model. 187-200
- Zebao Zhang, Shuang Yang, Haiwei Pan:
Text-Dominant Interactive Attention for Cross-Modal Sentiment Analysis. 201-215
- Yuqiu Kong, Junhua Liu, Cuili Yao:
Dual Context Perception Transformer for Referring Image Segmentation. 216-230
- Min Luo, Boda Lin, Binghao Tang, Haolong Yan, Si Li:
ELEMO: Elements Focused Emotion Recognition for Sticker Images. 231-245
- Lin Cao, Wenwen Sun, Yanan Guo, Shoujing Wang, Boqian Lv:
Cross-Modal Dual Matching and Comparison for Text-to-Image Person Re-identification. 246-259
- Turghun Tayir, Lin Li, Mieradilijiang Maimaiti, Yusnur Muhtar:
Low-Resource Machine Translation with Different Granularity Image Features. 260-273
- Weinan Guan, Wei Wang, Bo Peng, Jing Dong, Tieniu Tan:
ST-SBV: Spatial-Temporal Self-Blended Videos for Deepfake Detection. 274-288
- Zichun Wang, Xu Cheng:
Learning a Robust Synthetic Modality with Dual-Level Alignment for Visible-Infrared Person Re-identification. 289-303
- Ruitao Pu, Dezhong Peng, Fujun Hua:
Deep Noisy Multi-label Learning for Robust Cross-Modal Retrieval. 304-317
- Weitao Song, Weiran Chen, Jialiang Xu, Yi Ji, Ying Li, Chunping Liu:
Uncertainty-Aware with Negative Samples for Video-Text Retrieval. 318-332
- Suyan Cheng, Feifei Zhang, Haoliang Zhou, Changsheng Xu:
Multi-modal Knowledge-Enhanced Fine-Grained Image Classification. 333-346
- Jiaxi Wang, Wenhui Hu, Xueyang Liu, Beihu Wu, Yuting Qiu, Yingying Cai:
Bridging Modality Gap for Visual Grounding with Effective Cross-Modal Distillation. 347-363
- Chunlan Zhan, Wenhua Qian, Peng Liu:
EGSRNet: Emotion-Label Guiding and Similarity Reasoning Network for Multimodal Sentiment Analysis. 364-378
- Min Zhu, Guanming Liu, Zhihua Wei:
VL-MPFT: Multitask Parameter-Efficient Fine-Tuning for Visual-Language Pre-trained Models via Task-Adaptive Masking. 379-394
- Zhuzhu Zhang, Xian Fu, Tianrui Wu, Yu Sun, Ningning Zhang, Hui Zhang:
A Multimodal Fake News Detection Model Leveraging Image Frequency and Spatial Domain Analysis with Deep Dynamic Trade-Off Fusion. 395-409
- Min Zheng, Chunpeng Wu, Yue Wang, Weiwei Liu, Qinghe Ye, Ke Chang, Cuncun Shi, Fei Zhou:
Efficiency-Aware Fine-Grained Vision-Language Retrieval via a Global-Contextual Autoencoder. 410-423
- Xiaorui Shi:
Towards Making the Most of Knowledge Across Languages for Multimodal Cross-Lingual Summarization. 424-438
- Zhengqing Gao, Xiang Ao, Xu-Yao Zhang, Cheng-Lin Liu:
Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning. 439-452
- Tianci Xun, Zhong Zheng, Yulin He, Wei Chen, Weiwei Zheng:
Unleashing the Class-Incremental Learning Potential of Foundation Models by Virtual Feature Generation and Replay. 453-467
- Jiaxuan Li, Likun Huang, Chuanhu Zhu, Song Zhang, Qiang Li:
Multimodal Feature Hierarchical Fusion for Text-Image Person Re-identification. 468-481
- Xiaoyu Liang, Jiayuan Yu, Lianrui Mu, Jiedong Zhuang, Jiaqi Hu, Yuchen Yang, Jiangnan Ye, Lu Lu, Jian Chen, Haoji Hu:
Mitigating Hallucination in Visual-Language Models via Re-balancing Contrastive Decoding. 482-496
- Shanshan Chen, Dan Xu, Kangjian He:
Multimodal Medical Image Registration Using Optimized Phase Consistency Within Joint Frequency-Space Domain. 497-510
- Weichen Huang, Xinyue Ju, You Zhou, Yipeng Xu, Gang Yang:
Two Semantic Information Extension Enhancement Methods For Zero-Shot Learning. 511-525
- Yihan Zhao, Wei Xi, Gairui Bai, Xinhui Liu, Jizhong Zhao:
Robust Contrastive Learning Against Audio-Visual Noisy Correspondence. 526-540
- Xiaofan Wang, Xiuhong Li, Zhe Li, Chenyu Zhou, Fan Chen, Dan Yang:
Enhancing Cross-Modal Alignment in Multimodal Sentiment Analysis via Prompt Learning. 541-554
- Zirui Shang, Shuo Yang, Xinxiao Wu:
Efficient Language-Driven Action Localization by Feature Aggregation and Prediction Adjustment. 555-568
- Ran Xu:
Greedy Fusion Oriented Representations for Multimodal Sentiment Analysis. 569-581
- Zhiyun Chen, Qing Zhang, Jie Liu, Yufei Wang, Haocheng Lv, LanXuan Wang, Jianyong Duan, Mingying Xv, Hao Wang:
Counterfactual Multimodal Fact-Checking Method Based on Causal Intervention. 582-595
- Min Li, Feng Li, Enguang Zuo, Xiaoyi Lv, Chen Chen, Cheng Chen:
Rethinking the Necessity of Learnable Modal Alignment for Medical Image Fusion. 596-610
- Yanting Zhang, Jingyi Guo, Cairong Yan, Zhijun Fang:
Taming Diffusion for Fashion Clothing Generation with Versatile Condition. 611-625