- research-article, February 2025
Less is more: A closer look at semantic-based few-shot learning
Abstract: Few-shot Learning (FSL) aims to learn and distinguish new categories from a scant number of available samples, presenting a significant challenge in the realm of deep learning. Recent researchers have sought to leverage the additional semantic or ...
Highlights:
  - Previous semantic-based few-shot methods focus on designing complex fusion modules, while ignoring the generalization capacity of language models.
  - We propose a simple framework, which fully exploits the LM with learnable prompts.
  - ...
- research-article, January 2025
Action-guided prompt tuning for video grounding
Abstract: Video grounding aims to locate a moment-of-interest semantically corresponding to a given query. We claim that existing methods overlook two critical issues: (1) the sparsity of language, and (2) the human perception process of events. To be ...
Highlights:
  - Language features tend to be sparser than video features.
  - Verbs carry discriminative information for distinguishing different videos.
  - Salient action and temporal order of actions are two key factors.
  - A Prompt ...
- research-article, January 2025
Cerberus: Attribute-based person re-identification using semantic IDs
Expert Systems with Applications: An International Journal (EXWA), Volume 259, Issue C, https://doi.org/10.1016/j.eswa.2024.125320
Abstract: We introduce a new framework, dubbed Cerberus, for attribute-based person re-identification (reID). Our approach leverages person attribute labels to learn local and global person representations that encode specific traits, such as gender and ...
Highlights:
  - We introduce Cerberus, a novel attribute-based person re-ID framework.
  - Cerberus handles reID, PAR, and (partial) APS tasks, using a unified model.
  - We encourage discerning differences between similar persons via semantic guidance.
- Article, December 2024
DFIMat: Decoupled Flexible Interactive Matting in Multi-person Scenarios
Abstract: Interactive portrait matting refers to extracting the soft portrait from a given image that best meets the user’s intent through their inputs. Existing methods often underperform in complex scenarios, mainly due to three factors. (1) Most works ...
- Article, December 2024
OccFusion: Depth Estimation Free Multi-sensor Fusion for 3D Occupancy Prediction
Abstract: 3D occupancy prediction based on multi-sensor fusion, crucial for a reliable autonomous driving system, enables fine-grained understanding of 3D scenes. Previous fusion-based 3D occupancy predictions relied on depth estimation for processing 2D ...
- research-article, February 2025
Integrating multimodal learning for improved vital health parameter estimation
Computers in Biology and Medicine (CBIM), Volume 183, Issue C, https://doi.org/10.1016/j.compbiomed.2024.109104
Abstract: Malnutrition poses a significant threat to global health, resulting from an inadequate intake of essential nutrients that adversely impacts vital organs and overall bodily functioning. Periodic examinations and mass screenings, incorporating both ...
Highlights:
  - A holistic feature fusion of facial, body & 3D embeddings, including the correlation between them, optimal feature combination and individual importance in estimating the weight is insightfully discussed.
  - This paper is the first to ...
- Article, December 2024
Feature Balance Method for Multi-modal Entity Alignment
Abstract: Multi-modal Entity Alignment (MMEA) aims to establish correlations between modalities such as images and texts to align equivalent entities across different multi-modal knowledge graphs, thereby enhancing knowledge graph coverage and addressing ...
- research-article, December 2024
A lightweight Transformer-based visual question answering network with Weight-Sharing Hybrid Attention
Abstract: Recent advances show that Transformer-based models and object detection-based models play an indispensable role in VQA. However, object detection-based models have significant limitations due to their redundant and complex detection box ...
- research-article, December 2024
Discriminative Feature Enhancement Network for few-shot classification and beyond
Expert Systems with Applications: An International Journal (EXWA), Volume 255, Issue PD, https://doi.org/10.1016/j.eswa.2024.124811
Abstract: Few-shot classification aims to recognize query samples from novel classes given scarce labeled data, which remains a challenging problem in machine learning. This paper proposes a Discriminative Feature Enhancement Network (DFENet) to ...
Highlights:
  - DFENet introduces a novel approach for few-shot classification, using visual and semantic knowledge.
  - The neural decoding-based attention module efficiently assigns attention weights to keys, inspired by cognitive neuroscience.
  - A ...
- research-article, November 2024
C2BG-Net: Cross-modality and cross-scale balance network with global semantics for multi-modal 3D object detection
Abstract: Multi-modal 3D object detection is instrumental in identifying and localizing objects within 3D space. It combines RGB images from cameras and point-cloud data from lidar sensors, serving as a fundamental technology for autonomous driving ...
- research-article, November 2024
AutoAMS: Automated attention-based multi-modal graph learning architecture search
Abstract: Multi-modal attention mechanisms have been successfully used in multi-modal graph learning for various tasks. However, existing attention-based multi-modal graph learning (AMGL) architectures heavily rely on manual design, requiring huge effort ...
Highlights:
  - AutoAMS searches for attention-based multimodal graph learning (AMGL) architectures.
  - AM search space supports finding multi-modal attention and graph learning components.
  - A search objective using unsupervised and task-specific ...
- research-article, November 2024
Text-guided Graph Temporal Modeling for few-shot video classification
Engineering Applications of Artificial Intelligence (EAAI), Volume 137, Issue PA, https://doi.org/10.1016/j.engappai.2024.109076
Abstract: Large-scale pre-trained models and graph neural networks have recently demonstrated remarkable success in few-shot video classification tasks. However, they generally suffer from two key limitations: i) the temporal relations between adjacent ...
- research-article, October 2024, JUST ACCEPTED
Multimodal Consistency Suppression Factor for Fake News Detection
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted, https://doi.org/10.1145/3699959
Recent multi-modal fake news detection methods often use the consistency between textual and visual contents to determine the truth or fake of a news information. Higher levels of textual-visual consistency typically lead to a greater likelihood of ...
- Article, October 2024
ETSCL: An Evidence Theory-Based Supervised Contrastive Learning Framework for Multi-modal Glaucoma Grading
Abstract: Glaucoma is one of the leading causes of vision impairment. Digital imaging techniques, such as color fundus photography (CFP) and optical coherence tomography (OCT), provide quantitative and noninvasive methods for glaucoma diagnosis. Recently, ...
- Article, January 2025
I2M2Net: Inter/Intra-modal Feature Masking Self-distillation for Incomplete Multimodal Skin Lesion Diagnosis
Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 Workshops, Pages 3–13, https://doi.org/10.1007/978-3-031-77610-6_1
Abstract: Multimodal learning has demonstrated promising advantages over single-modal approaches in the diagnosis of skin lesions. However, these methods often suffer from significant accuracy degradation when encountering missing modalities, hindering ...
- Article, October 2024
Leveraging IHC Staining to Prompt HER2 Status Prediction from HE-Stained Histopathology Whole Slide Images
Abstract: The development of artificial intelligence has significantly impacted the predictive analysis of molecular biomarkers, which is crucial for targeted cancer therapy. Traditional assessment of HER2 in breast cancer utilizes both Hematoxylin and ...
- Article, October 2024
Knowledge-Driven Subspace Fusion and Gradient Coordination for Multi-modal Learning
Medical Image Computing and Computer Assisted Intervention – MICCAI 2024, Pages 263–273, https://doi.org/10.1007/978-3-031-72083-3_25
Abstract: Multi-modal learning plays a crucial role in cancer diagnosis and prognosis. Current deep learning based multi-modal approaches are often limited by their abilities to model the complex correlations between genomics and histopathology data, ...
- Article, October 2024
MoRA: LoRA Guided Multi-modal Disease Diagnosis with Missing Modality
Medical Image Computing and Computer Assisted Intervention – MICCAI 2024, Pages 273–282, https://doi.org/10.1007/978-3-031-72384-1_26
Abstract: Multi-modal pre-trained models efficiently extract and fuse features from different modalities with low memory requirements for fine-tuning. Despite this efficiency, their application in disease diagnosis is under-explored. A significant challenge ...
- Article, October 2024
Temporal Neighboring Multi-modal Transformer with Missingness-Aware Prompt for Hepatocellular Carcinoma Prediction
Medical Image Computing and Computer Assisted Intervention – MICCAI 2024, Pages 79–88, https://doi.org/10.1007/978-3-031-72378-0_8
Abstract: Early prediction of hepatocellular carcinoma (HCC) is necessary to facilitate appropriate surveillance strategy and reduce cancer mortality. Incorporating CT scans and clinical time series can greatly increase the accuracy of predictive models. ...
- Article, October 2024
Refining Intraocular Lens Power Calculation: A Multi-modal Framework Using Cross-Layer Attention and Effective Channel Attention
Medical Image Computing and Computer Assisted Intervention – MICCAI 2024, Pages 754–763, https://doi.org/10.1007/978-3-031-72378-0_70
Abstract: Selecting the appropriate power for intraocular lenses (IOLs) is crucial for the success of cataract surgeries. Traditionally, ophthalmologists rely on manually designed formulas like “Barrett” and “Hoffer Q” to calculate IOL power. However, these ...