Abstract
With the growth of social media, people increasingly upload text, images, and videos to express their emotions, and short videos have become a primary medium for online social interaction. Unlike traditional text-based communication, users can convey their feelings and opinions through non-textual media such as video and images, so sentiment analysis must go beyond text to mine the emotional cues in images and videos, enabling researchers to tailor products to individual users. Compared with plain text, video conveys emotions such as joy, anger, and sorrow more intuitively, which has made short-video applications increasingly popular among Internet users. However, not every short video on a social platform accurately reflects the user's emotion, and the accompanying text can assist sentiment analysis and thereby improve its accuracy. Moreover, sentiment analysis based solely on video frames is unreliable in some scenarios: when a user sheds tears of joy, for example, the emotions conveyed by the facial expression and the voice diverge, causing the analysis to err. Researchers have therefore turned to multimodal sentiment analysis to mitigate the impact of such scenarios on short-video sentiment analysis. This paper proposes a sentiment analysis method for short videos. We first propose a residual attention model that fully exploits the information in the audio to classify the emotions it contains. We then classify the textual information in the dataset via feature extraction; the key is not only to preserve the semantics of the text but also to mine its latent emotional cues, so that the extracted text features remain complete. Experiments show that the sentiment analysis model proposed in this paper outperforms the baselines.