DOI: 10.1145/3503161.3548007

Temporal Sentiment Localization: Listen and Look in Untrimmed Videos

Published: 10 October 2022

    Abstract

    Video sentiment analysis aims to uncover the underlying attitudes of viewers, and it has a wide range of real-world applications. Existing works simply classify a video into a single sentiment category, ignoring the fact that sentiment in untrimmed videos may appear in multiple segments with varying lengths and unknown locations. To address this, we propose a challenging task, i.e., Temporal Sentiment Localization (TSL), to find which parts of a video convey sentiment. To systematically investigate the fully- and weakly-supervised settings for TSL, we first build a benchmark dataset named TSL-300, consisting of 300 videos with a total length of 1,291 minutes. Each video is labeled in two ways: frame-by-frame annotation for the fully-supervised setting, and single-frame annotation, i.e., only a single frame with strong sentiment labeled per segment, for the weakly-supervised setting. Because densely annotating a dataset is costly, we propose TSL-Net, which employs single-frame supervision to localize sentiment in videos. In detail, we generate pseudo labels for unlabeled frames using a greedy search strategy, and fuse the affective features of the visual and audio modalities to predict the temporal sentiment distribution. Here, a reverse mapping strategy is designed for feature fusion, and a contrastive loss is utilized to maintain consistency between the original feature and the reverse prediction. Extensive experiments show the superiority of our method over state-of-the-art approaches.
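    The abstract names two concrete training ingredients: greedy generation of pseudo labels from single-frame annotations, and a contrastive loss tying each frame's fused feature to its reverse-mapped prediction. The sketch below is a minimal PyTorch illustration of both under our own assumptions, not the authors' exact formulation (which is in the paper and the linked repository); the function names, the cosine-similarity stopping rule, and the temperature value are all hypothetical.

```python
import torch
import torch.nn.functional as F


def expand_single_frame_labels(features, labeled_idx, label, sim_threshold=0.8):
    """Greedily grow pseudo labels outward from one annotated frame.

    features: (T, D) per-frame embeddings; labeled_idx: index of the
    single annotated frame; label: its sentiment class. Neighboring
    frames are absorbed while their cosine similarity to the anchor
    frame stays above sim_threshold (a hypothetical stand-in for the
    paper's greedy search criterion).
    """
    T = features.size(0)
    anchor = F.normalize(features[labeled_idx], dim=0)   # (D,)
    sims = F.normalize(features, dim=1) @ anchor         # (T,) cosine similarities
    pseudo = torch.full((T,), -1, dtype=torch.long)      # -1 marks still-unlabeled frames
    pseudo[labeled_idx] = label
    for idx in range(labeled_idx - 1, -1, -1):           # expand to the left
        if sims[idx] < sim_threshold:
            break
        pseudo[idx] = label
    for idx in range(labeled_idx + 1, T):                # expand to the right
        if sims[idx] < sim_threshold:
            break
        pseudo[idx] = label
    return pseudo


def consistency_loss(orig_feat, reverse_pred, temperature=0.07):
    """InfoNCE-style contrastive loss: each frame's original fused
    feature should match its own reverse-mapped prediction (the
    positive pair), with the other frames of the same video acting
    as negatives. orig_feat, reverse_pred: (T, D).
    """
    orig = F.normalize(orig_feat, dim=1)
    rev = F.normalize(reverse_pred, dim=1)
    logits = orig @ rev.t() / temperature                # (T, T) similarity matrix
    targets = torch.arange(orig.size(0))                 # positive = matching frame index
    return F.cross_entropy(logits, targets)


# Toy usage: 50 frames of 128-D fused features, frame 20 labeled class 1.
feats = torch.randn(50, 128)
pseudo = expand_single_frame_labels(feats, labeled_idx=20, label=1)
loss = consistency_loss(feats, feats + 0.01 * torch.randn_like(feats))
```

    In a training loop, the pseudo labels would serve as dense targets for a per-frame sentiment classifier over the fused visual-audio features, while the consistency term would be added to the total loss as a regularizer.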

    Supplementary Material

    MP4 File (mm22-fp1067.mp4)
    Presentation video. In this work, we aim to better understand the sentiment conveyed in untrimmed videos, at frame granularity, for applications in real-world scenarios. We propose a novel task, i.e., Temporal Sentiment Localization (TSL), to locate and classify sentiment simultaneously. Then, we present a video sentiment analysis dataset for fully- and weakly-supervised settings. To tackle the challenges, we propose a weakly-supervised temporal sentiment localization framework. The dataset and code are available at https://github.com/nku-zhichengzhang/TSL300.


    Cited By

    • (2023) Ordinal Label Distribution Learning. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 23424-23434. DOI: 10.1109/ICCV51070.2023.02146. Online publication date: 1-Oct-2023.
    • (2023) Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 18888-18897. DOI: 10.1109/CVPR52729.2023.01811. Online publication date: Jul-2023.
    • (2023) DIP: Dual Incongruity Perceiving Network for Sarcasm Detection. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2540-2550. DOI: 10.1109/CVPR52729.2023.00250. Online publication date: Jul-2023.
    • (2023) An End-to-End Transformer with Progressive Tri-Modal Attention for Multi-modal Emotion Recognition. Pattern Recognition and Computer Vision, pp. 396-408. DOI: 10.1007/978-981-99-8540-1_32. Online publication date: 25-Dec-2023.


    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
    ISBN: 9781450392037
    DOI: 10.1145/3503161


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. dataset
    2. video sentiment analysis
    3. weakly-supervised learning

    Qualifiers

    • Research-article

    Conference

    MM '22

    Acceptance Rates

    Overall Acceptance Rate 995 of 4,171 submissions, 24%


