With the popularity of smart devices and online social media platforms, people express their views in various modalities such as text, images, and audio. Recent research in sentiment analysis is therefore no longer limited to a single modality; instead, it combines all available modalities to predict sentiment more accurately. Multimodal sentiment analysis (MSA) is the process of extracting sentiment from multiple modalities such as text, images, and audio. Existing research works predict the sentiment of each modality independently and derive the final sentiment from these predictions. This paper presents an MSA approach that obtains the final sentiment of an image-text tweet using multimodal decision-level fusion, incorporating both the features of the individual modalities and the inter-modal semantic relations. A dataset is prepared from an existing benchmark MSA dataset by annotating each tweet as a whole with its final sentiment after assessing all modalities. The proposed approach is evaluated on this dataset and compared with state-of-the-art MSA methods. An in-depth analysis of the comparison results shows that the proposed approach outperforms existing methods in terms of accuracy and F1-score.
Data Availability Statement
We collected our data from the MVSA-Single (MVSA-S) dataset, which is publicly available; details are given in Section 3. The collected data were further annotated for the present research work, and the annotated data may be made available by the corresponding author on request.
