- research-article, December 2024
Dynamic Weighted Gating for Enhanced Cross-Modal Interaction in Multimodal Sentiment Analysis
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 21, Issue 1, Article No.: 38, Pages 1–19. https://doi.org/10.1145/3702996
Advancements in Multimodal Sentiment Analysis (MSA) have predominantly focused on leveraging the interdependence of text, acoustic, and visual modalities to enhance sentiment prediction. However, efficiently integrating these modalities remains a ...
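For readers unfamiliar with gating-based fusion, here is a minimal, self-contained sketch of a dynamically weighted gate over text, acoustic, and visual features. The dimensions, class name, and softmax-gate design are illustrative assumptions, not the architecture proposed in the paper above.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Illustrative dynamic gating over three unimodal feature vectors.

    NOTE: a generic sketch only; the shapes and the softmax-gate design
    are assumptions, not the model from the TOMM paper above.
    """

    def __init__(self, dim: int = 128):
        super().__init__()
        # The gate sees all three modalities and emits one weight per modality.
        self.gate = nn.Sequential(nn.Linear(3 * dim, 3), nn.Softmax(dim=-1))

    def forward(self, text, audio, visual):
        feats = torch.stack([text, audio, visual], dim=1)   # (batch, 3, dim)
        weights = self.gate(feats.flatten(1))               # (batch, 3)
        return (weights.unsqueeze(-1) * feats).sum(dim=1)   # (batch, dim)

# Per-sample weights adapt to the inputs, unlike fixed fusion weights.
fused = GatedFusion()(torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 128))
```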
- research-article, November 2024
Integrating Multimodal Affective Signals for Stress Detection from Audio-Visual Data
ICMI '24: Proceedings of the 26th International Conference on Multimodal Interaction, Pages 22–32. https://doi.org/10.1145/3678957.3685717
Stress detection in real-world settings presents significant challenges due to the complexity of human emotional expression influenced by biological, psychological, and social factors. While traditional methods like EEG, ECG, and EDA sensors provide ...
- short-paper, November 2024
GFM4MPM: Towards Geospatial Foundation Models for Mineral Prospectivity Mapping
SIGSPATIAL '24: Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems, Pages 565–568. https://doi.org/10.1145/3678717.3691268
Machine Learning (ML) for Mineral Prospectivity Mapping (MPM) remains a challenging problem as it requires the analysis of associations between large-scale multi-modal geospatial data and a few historical mineral commodity observations (positive labels). ...
- research-article, October 2024
Multimodal Blockwise Transformer for Robust Sentiment Recognition
MRAC '24: Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, Pages 88–92. https://doi.org/10.1145/3689092.3689399
The MER-NOISE track challenges participants to classify emotions from multimodal data, specifically audio and visual, with added noise. In this paper, we present a solution for the NOISE track of the MER2024 competition, which focuses on the robustness of ...
- short-paper, October 2024
MuSe '24: The 5th Multimodal Sentiment Analysis Challenge and Workshop: Social Perception & Humor
MuSe'24: Proceedings of the 5th on Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, Pages 10–11. https://doi.org/10.1145/3689062.3695939
The 5th Multimodal Sentiment Analysis Challenge (MuSe), a workshop in conjunction with ACM Multimedia '24, is focused on Multimodal Machine Learning in the domain of Affective Computing. Two different sub-challenges are proposed: Social Perception Sub-...
- short-paper, October 2024
Multimodal Humor Detection and Social Perception Prediction
MuSe'24: Proceedings of the 5th on Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, Pages 60–64. https://doi.org/10.1145/3689062.3689376
Parallel audio-visual-text data contain a vast amount of information, so it is essential to develop machine learning algorithms that can utilise them efficiently. In this work, we investigated unimodal and multimodal solutions for MuSe Humor and ...
- research-article, October 2024
The MuSe 2024 Multimodal Sentiment Analysis Challenge: Social Perception and Humor Recognition
- Shahin Amiriparian,
- Lukas Christ,
- Alexander Kathan,
- Maurice Gerczuk,
- Niklas Müller,
- Steffen Klug,
- Lukas Stappen,
- Andreas König,
- Erik Cambria,
- Björn W. Schuller,
- Simone Eulitz
MuSe'24: Proceedings of the 5th on Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, Pages 1–9. https://doi.org/10.1145/3689062.3689088
The Multimodal Sentiment Analysis Challenge (MuSe) 2024 addresses two contemporary multimodal affect and sentiment analysis problems: In the Social Perception Sub-Challenge (MuSe-Perception), participants will predict 16 different social attributes of ...
- research-article, October 2024
Social Perception Prediction for MuSe 2024: Joint Learning of Multiple Perceptions
- Zhuofan Wen,
- Hailiang Yao,
- Shun Chen,
- Haiyang Sun,
- Mingyu Xu,
- Licai Sun,
- Zheng Lian,
- Bin Liu,
- Fengyu Zhang,
- Siyuan Zhang,
- Jianhua Tao
MuSe'24: Proceedings of the 5th on Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, Pages 52–59. https://doi.org/10.1145/3689062.3689087
In this paper, we present our method for the MuSe 2024 Perception sub-challenge, in which labeled data for 21 social perceptions are given and 16 social perceptions are required to be predicted. Joint learning is crucial for our ...
- research-article, October 2024
Modality Weights Based Fusion Model for Social Perception Prediction in Video, Audio, and Text
MuSe'24: Proceedings of the 5th on Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, Pages 12–19. https://doi.org/10.1145/3689062.3689085
Social perception is a crucial psychological concept that explains how we understand and interpret others and their behaviors. It encompasses the complex process of discerning individual characteristics, intentions, and emotions, significantly ...
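As a point of reference for modality-weighted fusion in general, a weighted late-fusion baseline can be sketched in a few lines. The learnable softmax-normalized scalar weights below are an assumed generic design, not the fusion model from the paper above.

```python
import torch
import torch.nn as nn

class WeightedLateFusion(nn.Module):
    """Illustrative late fusion with softmax-normalized modality weights.

    NOTE: a generic baseline sketch under assumed shapes, not the
    authors' fusion model.
    """

    def __init__(self, num_modalities: int = 3):
        super().__init__()
        # One learnable logit per modality (e.g., video, audio, text).
        self.logits = nn.Parameter(torch.zeros(num_modalities))

    def forward(self, preds):
        # preds: (batch, num_modalities, num_outputs) per-modality predictions.
        w = torch.softmax(self.logits, dim=0)            # (num_modalities,)
        return (w.view(1, -1, 1) * preds).sum(dim=1)     # (batch, num_outputs)

# Combine per-modality predictions for, e.g., 16 social perception scores.
combined = WeightedLateFusion()(torch.randn(8, 3, 16))
```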
- research-article, October 2024
Feature-wise Optimization and Performance-weighted Multimodal Fusion for Social Perception Recognition
MuSe'24: Proceedings of the 5th on Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, Pages 28–35. https://doi.org/10.1145/3689062.3689082
Automatic social perception recognition is a new task to mimic the measurement of human traits, which was previously done by humans via questionnaires. We evaluated unimodal and multimodal systems to predict agentive and communal traits from the LMU-ELP ...
- research-article, October 2024
Multimodal Contextual Interactions of Entities: A Modality Circular Fusion Approach for Link Prediction
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 8374–8382. https://doi.org/10.1145/3664647.3681696
Link prediction aims to infer missing valid triplets to complete knowledge graphs, with the recent inclusion of multimodal information to enrich entity representations. Existing methods project multimodal information into a unified embedding space or learn ...
- research-article, October 2024
CIEASR: Contextual Image-Enhanced Automatic Speech Recognition for Improved Homophone Discrimination
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 915–924. https://doi.org/10.1145/3664647.3681665
Automatic Speech Recognition (ASR) models pre-trained on large-scale speech datasets have achieved significant breakthroughs compared with traditional methods. However, mainstream pre-trained ASR models encounter challenges in distinguishing homophones, ...
- research-article, October 2024
Bridging Gaps in Content and Knowledge for Multimodal Entity Linking
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 9311–9320. https://doi.org/10.1145/3664647.3681661
Multimodal Entity Linking (MEL) aims to address the ambiguity in multimodal mentions and associate them with Multimodal Knowledge Graphs (MMKGs). Existing works primarily focus on designing multimodal interaction and fusion mechanisms to enhance the ...
- research-article, October 2024
Multimodal Fusion via Hypergraph Autoencoder and Contrastive Learning for Emotion Recognition in Conversation
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 4341–4348. https://doi.org/10.1145/3664647.3681633
Multimodal emotion recognition in conversation (MERC) seeks to identify the speakers' emotions expressed in each utterance, offering significant potential across diverse fields. The challenge of MERC lies in balancing speaker modeling and context ...
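For context, contrastive components in such approaches are often InfoNCE-style objectives. Below is a minimal generic sketch of that loss; the temperature value and the in-batch pairing scheme are assumptions, not details from the paper above.

```python
import torch
import torch.nn.functional as F

def info_nce(anchors, positives, temperature=0.07):
    """Generic InfoNCE contrastive loss (a common choice; assumed here).

    anchors, positives: (batch, dim) embeddings; row i of `positives`
    is the positive pair for row i of `anchors`, all others negatives.
    """
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature          # (batch, batch) similarities
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)   # diagonal entries are positives

loss = info_nce(torch.randn(16, 64), torch.randn(16, 64))
```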
- research-article, October 2024
Seeing Beyond Words: Multimodal Aspect-Level Complaint Detection in Ecommerce Videos
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 243–252. https://doi.org/10.1145/3664647.3681595
Complaints are pivotal expressions within e-commerce communication, yet the intricate nuances of human interaction present formidable challenges for AI agents to grasp comprehensively. While recent attention has been drawn to analyzing complaints within ...
- research-article, October 2024
GLoMo: Global-Local Modal Fusion for Multimodal Sentiment Analysis
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 1800–1809. https://doi.org/10.1145/3664647.3681527
Multimodal Sentiment Analysis (MSA) has witnessed remarkable progress and gained increasing attention over the past decade. However, current MSA methodologies primarily rely on global representations extracted from different modalities, such as the mean of ...
- research-article, October 2024
Virtual Visual-Guided Domain-Shadow Fusion via Modal Exchanging for Domain-Specific Multi-Modal Neural Machine Translation
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 4227–4235. https://doi.org/10.1145/3664647.3681525
Incorporating domain-specific visual information into text poses one of the critical challenges for domain-specific multi-modal neural machine translation (DMNMT). While most existing DMNMT methods often borrow multi-modal fusion frameworks from multi-...
- research-article, October 2024
Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 3867–3876. https://doi.org/10.1145/3664647.3681416
Stance detection, which aims to identify public opinion towards specific targets using social media data, is an important yet challenging task. With the proliferation of diverse multimodal social media content, including text and images, multimodal stance ...
- research-article, October 2024
Medical Report Generation via Multimodal Spatio-Temporal Fusion
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 4699–4708. https://doi.org/10.1145/3664647.3681377
Medical report generation aims at automating the synthesis of accurate and comprehensive diagnostic reports from radiological images. The task can significantly enhance clinical decision-making and alleviate the workload on radiologists. Existing works ...
- research-article, October 2024
PrimKD: Primary Modality Guided Multimodal Fusion for RGB-D Semantic Segmentation
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 1943–1951. https://doi.org/10.1145/3664647.3681253
The recent advancements in cross-modal transformers have demonstrated their superior performance in RGB-D segmentation tasks by effectively integrating information from both RGB and depth modalities. However, existing methods often overlook the varying ...
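As background on guided multimodal distillation in general, the distillation term is commonly a temperature-softened KL divergence between student and teacher outputs. Here is a minimal generic sketch of that loss; the Hinton-style form and temperature are assumptions, not taken from the paper above.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Generic knowledge-distillation loss (Hinton-style, assumed form).

    Softens both distributions with temperature T and matches the
    student to the teacher via KL divergence.
    """
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # T^2 rescaling keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

loss = kd_loss(torch.randn(8, 19), torch.randn(8, 19))
```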