1. Introduction
In modern communication, people use different social media platforms (e.g., Facebook, Twitter, Instagram, and Flickr) to express their opinions on various issues and activities of their daily lives. On these platforms, users can share visual content alongside text to communicate with others. It is easier to express emotions intuitively through images [1]. As the saying goes, “A picture is worth a thousand words”. Figure 1 shows how an individual’s sentiment can be deduced from an image without any text. In Figure 1a, the cat appears to be in a happy mood as it enjoys the fruit. In contrast, Figure 1b depicts a storm forecast. Thus, visual sentiment analysis has become a part of our daily lives [2].
The accurate prediction of users’ sentiment from the images they upload to social media has become an important research challenge [3,4]. However, image data may be inconsistent, incomplete, or duplicated, which leads to various types of uncertainty (e.g., ignorance, incompleteness, imprecision, ambiguity, and vagueness). These uncertainties can reduce prediction accuracy. For image classification, deep learning methods are widely used as they can represent accurate and robust features of images [5,6]. In addition, the Belief Rule-Based Expert System (BRBES) is well suited to handling different types of uncertainty in image data [7,8]. Since an integrated model performs better than a stand-alone model [9,10,11], we propose the integration of a deep learning method with BRBES to improve the prediction accuracy in visual sentiment analysis. This is the key contribution of our research work.
Because a Deep Learning method can process raw data directly, it is used effectively to solve various classification and regression problems. However, as a data-driven approach, it is limited in its ability to address different types of uncertainty [12]. In contrast, as a knowledge-driven approach, BRBES can address various types of uncertainty (e.g., ignorance, incompleteness, imprecision, ambiguity, and vagueness) [13]. However, it cannot integrate associative memory in its inference procedure. For example, multiplication and division operators are used to calculate the activation weight of a rule in the BRBES inference framework, and these provide incorrect activation values for each rule. To solve this issue, a Deep Neural Network (DNN) model can be used to calculate the rule activation weights more accurately. Therefore, by integrating the Deep Learning method within the BRBES inference framework, our proposed system provides the exact values of the rule activation weights, which results in an accurate prediction of sentiment under uncertainty.
BRBES is composed of a set of rules and provides results based on those rules. Each rule consists of an antecedent part and a consequent part. The antecedent part of a rule is based on the input, and the consequent part contains the output. Generally, there are two types of BRBES: Conjunctive BRB [14] and Disjunctive BRB [15]. The AND logical operator connects the antecedent attributes in the Conjunctive BRB, whereas the OR logical operator is used in the Disjunctive one. Because of the AND logical operator, the Conjunctive BRB takes more computation time and requires a large number of rules in the rule base. In our experiments, we use the Disjunctive BRB as it needs less computation time and has fewer rules. We explore a total of eight belief rule bases for BRBES, which take less computational time.
In this paper, we address the following research questions: (1) Why do we use the Deep Learning model for visual sentiment analysis? (2) What is the advantage of utilizing BRBES in our proposed system? (3) Why and how do we combine Deep Learning with BRBES? The remainder of this paper is organized as follows: Section 2 surveys related work on visual sentiment analysis. Section 3 provides an overview of our proposed BRB-DL approach. Section 4 discusses the procedure of the experiments. Section 5 reports the experimental results and the evaluation of BRB-DL compared to different models such as SVM, Naive Bayes classifier, Decision Tree classifier, VGG16, VGG19, and ResNet50. Finally, Section 6 concludes the paper with some future plans.
2. Related Work
This section presents a literature review on visual sentiment analysis. Siersdorfer and Hare [16] mainly focused on the bag-of-visual-words representation and color distribution of images. They estimated the sentiment polarity of images by extracting discriminative sentiment-related features and applying a machine learning approach. Machajdik and Hanbury [17] proposed a method that extracted and combined low-level features of images. These features were used for emotion classification. They considered awe, amusement, contentment, and excitement as positive emotions and anger, fear, disgust, and sadness as negative ones.
To generate a large-scale Visual Sentiment Ontology (VSO), Borth et al. [18] presented a method based on psychological theories and web mining. They proposed SentiBank, a visual concept detector library. The work was tested on a dataset of 807 artistic photographs depicting eight emotions: amusement, awe, contentment, excitement, anger, disgust, fear, and sadness. Moreover, Chen et al. [19] introduced DeepSentiBank for detecting the emotion of an image. Vasavi and Aditi [20] adopted a Deep Learning approach to predict emotions depicted in images. They conducted their experiment on a popular Flickr image dataset and predicted five emotions: love, happiness, violence, fear, and sadness.
However, some of the available data contain various kinds of noise. To reduce the noise in large-scale image data, You et al. [5] proposed a progressive CNN (PCNN) model. In addition, to reduce over-fitting in visual sentiment analysis, Islam and Zhang [4] adopted a transfer learning approach in which they utilized the hyper-parameters learned from a Deep Convolutional Neural Network (DCNN). Wang et al. [21] presented a visual sentiment analysis framework in which an adjective and a noun are jointly learned by deep neural networks. To train a visual sentiment classifier, Vadicamo et al. [22] applied the sentiment polarity of the textual content and proposed a cross-media learning approach. In addition, Campos et al. [6] trained an AlexNet model adapted for visual sentiment prediction.
Fengjiao and Aono [23] proposed a merged method that incorporated both hand-crafted and CNN features. They employed hand-crafted features to extract the local visual information and CNN models to obtain the global visual information. To label the emotions of painting images, Tan et al. [24] proposed a method that considered the painting features. They developed a classification model based on VGG16 and ResNet50. Moreover, Paolanti et al. [25] analyzed the sentiment of social images related to cultural heritage and compared VGG16, ResNet, and Inception models on this task. Recently, Chowdhury et al. [26] adopted an ensemble of transfer learning models and employed three pre-trained deep CNN models: VGG16, Xception, and MobileNet. A summary of the prior research on visual sentiment analysis is shown in Table 1. None of these studies integrated Deep Learning with BRBES. In contrast, our proposed method focuses on the integration of a Deep Learning method with a BRBES inference framework, which helps to predict the sentiment of images effectively.
3. Proposed Framework
In this research, an integrated model of a Convolutional Neural Network (CNN) [27] and a Belief Rule Base (BRB) is developed to classify visual sentiments. The system flow chart is illustrated in Figure 2.
As Figure 2 shows, the integrated model first fetches data from the dataset and sends it to the data augmentation step. After augmentation, the data are sent to the preprocessing steps, where each image is resized to 150 × 150 and then converted from RGB to grayscale. The processed image is then sent to the CNN model. The result of the CNN model is fed into the BRB model, which predicts the final sentiment label of the image.
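As an illustration, the two preprocessing steps described above (resizing to 150 × 150, then converting RGB to grayscale) can be sketched in dependency-free Python. This is only a sketch under stated assumptions: the helper names `resize_nearest`, `to_grayscale`, and `preprocess` are hypothetical, nearest-neighbour sampling stands in for the interpolation used in the actual pipeline, and the luminance weights are an assumed common choice rather than values given in this paper.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]   # (R, G, B), each channel in [0, 1]
RGBImage = List[List[Pixel]]         # rows of pixels
GrayImage = List[List[float]]

TARGET_SIZE = 150  # side length stated in the text


def resize_nearest(image: RGBImage, height: int, width: int) -> RGBImage:
    """Resize by nearest-neighbour sampling (assumed stand-in for interpolation)."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]


def to_grayscale(image: RGBImage) -> GrayImage:
    """Collapse RGB to one channel using assumed luminance weights."""
    return [
        [0.2125 * r + 0.7154 * g + 0.0721 * b for (r, g, b) in row]
        for row in image
    ]


def preprocess(image: RGBImage) -> GrayImage:
    """Resize first, then convert to grayscale, in the order given in the text."""
    return to_grayscale(resize_nearest(image, TARGET_SIZE, TARGET_SIZE))


# Hypothetical 2 x 3 input image
sample = [
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (0.5, 0.5, 0.5)],
]
processed = preprocess(sample)
print(len(processed), len(processed[0]))
```

The result is a 150 × 150 single-channel array, which matches the (150, 150, 1) input shape expected by the CNN once a channel axis is added.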
3.1. Convolutional Neural Network Model
The architecture of the Convolutional Neural Network (CNN) is shown in Figure 3.
According to Figure 3, the model consists of five convolution layers with 16, 32, 64, 128, and 256 filters, respectively, each with a 2 × 2 kernel size. The ReLU activation function is used in each convolution layer. Mathematically, it can be written as Equation (1):
$$f(x) = \max(0, x) \tag{1}$$
The input shape of this model is (150, 150, 1), where the first 150 is the height of the input image, the second 150 is its width, and 1 indicates that the image is grayscale. A max pooling layer with a 2 × 2 pool size follows each convolution layer. The max pooling layer decreases the total number of parameters by selecting the highest value from a rectified feature map, and thus reduces the data size. Along with the max pooling layer, a ReLU activation and a dropout layer are also included with each convolution layer. ReLU activates the parameters, while the dropout layer randomly deactivates neurons to avoid overfitting. A global average pooling layer is introduced after the last convolution layer, and its output is well suited for feeding into the dense layer. Since the model is classifying eight sentiments, the output layer has eight nodes. Softmax is therefore used as the activation function of the output layer, as shown in Equation (2):
$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \tag{2}$$
Here, $z$ is the input vector, $e^{z_i}$ is the standard exponential function applied to the $i$th element $z_i$ of $z$, and $K$ is the number of classes. The input vector $z$ is the output of the Fully Connected (FC) layer of the CNN model. The FC layer produces raw prediction values, which are known as logits [28]. Logits are real numbers (−∞ to +∞). The softmax activation function turns these logits into the probabilities of each class.
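To make the mapping from logits to class probabilities concrete, the following minimal sketch implements softmax in plain Python. The function name and the example logits are hypothetical. Subtracting the maximum logit before exponentiating is a standard numerical safeguard that does not change the result.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn raw logits into class probabilities that sum to 1."""
    # Shifting by the maximum leaves the output unchanged but avoids overflow.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three classes
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
```

The largest logit receives the largest probability, and the ordering of the classes is preserved.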
The Adam optimizer is used to optimize the integrated model. Categorical cross-entropy is used as the loss function to reduce the validation loss. The architecture of the CNN model is shown in Table 2.
The input image is an array of pixels. The convolution layer consists of multiple kernels with multiple weights. The variation of the kernel weights helps to handle different scales of the images. These kernels are used to extract features from the input image. The features of an image (edges, interest points, etc.) provide very rich information on its content. When a kernel is slid over the input image, it produces a feature map for the different pixels. This operation is performed based on the weights of the kernel and the neighboring pixels. The feature map is then passed through the ReLU activation function, which increases the nonlinearity by converting the negative values of the feature map to zero. The pooling layer merges semantically similar features into one. The max pooling layer computes the maximum value from the portion of the feature map covered by the pooling window. For image segmentation, the layers extract two types of features (full region feature and foreground feature) for each region. Thus, the convolution layer and the max pooling layer generate different feature maps for different images. These feature maps are used to train and validate the model.
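The sequence described above (slide a kernel over the image, apply ReLU, then max-pool) can be sketched on a tiny example. This is an illustrative sketch, not the trained model: the 5 × 5 image and the 2 × 2 kernel weights are hypothetical, and, as in most CNN libraries, the "convolution" is computed as a cross-correlation without flipping the kernel.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map: Matrix) -> Matrix:
    """Replace negative values with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map: Matrix, size: int = 2) -> Matrix:
    """Keep the maximum of each non-overlapping size x size window."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical grayscale patch and kernel weights
image = [
    [1.0, 2.0, 0.0, 1.0, 3.0],
    [0.0, 1.0, 2.0, 3.0, 1.0],
    [1.0, 0.0, 1.0, 2.0, 0.0],
    [2.0, 1.0, 0.0, 1.0, 1.0],
    [0.0, 3.0, 1.0, 0.0, 2.0],
]
kernel = [[1.0, -1.0], [-1.0, 1.0]]

feature_map = relu(conv2d_valid(image, kernel))  # 4 x 4 rectified feature map
pooled = max_pool(feature_map)                   # 2 x 2 after 2 x 2 pooling
print(pooled)
```

A 2 × 2 kernel turns the 5 × 5 input into a 4 × 4 feature map, and 2 × 2 pooling halves each dimension again, which is how the layers progressively shrink the data.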
3.2. Belief Rule-Based Expert System
A belief rule is an extended form of an IF-THEN rule. It consists of an antecedent part and a consequent part. The antecedent part contains the antecedent attributes, and the consequent part contains the consequent attribute. Referential values are used by the antecedent attributes, and belief degrees are associated with the consequent attribute. The relation is given in Equation (3), where $I_1, I_2, \ldots, I_{T_k}$ are the antecedent attributes of the $k$th rule ($k = 1, 2, \ldots, L$), $Q_1, Q_2, \ldots, Q_{T_k}$ are their referential values, $O_1, O_2, \ldots, O_n$ are the referential values of the consequent attribute, and $\beta_1, \beta_2, \ldots, \beta_n$ are the belief degrees for each referential value. In addition, each antecedent attribute has an attribute weight, and each rule has a rule weight.
Generally, a group of belief rules is considered a Belief Rule Base (BRB). In a Belief Rule-Based Expert System (BRBES), the BRB provides the initial knowledge base, and Evidential Reasoning (ER) serves as the inference engine. The knowledge representation parameters include the rule weight [29], belief degrees [30], and attribute weight [31]. These are used to capture uncertainty in data. The inference procedure includes input transformation [32], rule activation [29], belief update [33], and rule aggregation [34]. The working process of a BRBES is shown in Figure 4.
The activation weight of each rule in the disjunctive BRB is calculated with Equation (4) from the matching degree and the rule weight of that rule. The belief degrees are then updated with Equation (5), which maps the original belief degree of each rule to its updated belief degree. Rule aggregation is calculated using Equation (6), which yields the ER (Evidential Reasoning) aggregated belief degrees. The outputs of the rule aggregation process are fuzzy values [7]. The crisp value [8] is calculated from these fuzzy outputs with Equation (8), which combines the utility score $u(S_i)$ of each referential value with the corresponding ER aggregated belief degree.
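As a small worked illustration of the last step, the sketch below converts aggregated belief degrees into a single crisp score. It assumes the commonly used expected-utility form, in which each utility score is multiplied by its aggregated belief degree and the products are summed, and it assumes the belief degrees are complete (they sum to 1). The utility scores are those that Section 3.3 assigns to "Positive", "Neutral", and "Negative"; the belief degrees and the function name `crisp_score` are hypothetical.

```python
from typing import Dict

# Utility scores assigned to the consequent referential values in Section 3.3
UTILITY: Dict[str, float] = {"Positive": 1.0, "Neutral": 0.5, "Negative": 0.0}


def crisp_score(aggregated_belief: Dict[str, float]) -> float:
    """Assumed form: sum over referential values of utility x belief degree."""
    return sum(UTILITY[label] * belief
               for label, belief in aggregated_belief.items())


# Hypothetical aggregated belief degrees (not the values of Table 6)
beliefs = {"Positive": 0.1, "Neutral": 0.2, "Negative": 0.7}
print(round(crisp_score(beliefs), 3))
```

With these utilities the score lies in [0, 1]: values near 0 indicate a predominantly negative sentiment and values near 1 a predominantly positive one.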
Figure 5 illustrates the Belief Rule Base Tree of our experiment. X2, the root node of this tree, represents the “Overall Sentiment Score”. In the BRB, this node corresponds to the consequent attribute of the rule. As mentioned earlier, the consequent attribute has a number of referential values, each associated with a belief degree related to the overall sentiment.
Consider an output from the CNN model: Anger = 0.0, Fear = 0.0, Joy = 0.0, Love = 0.0, Sadness = 0.8, and Surprise = 0.2. This output is the input ([0.0, 0.0, 0.0, 0.0, 0.8, 0.2]) to the BRB. The matching degrees for this input are shown in Table 3.
The activation weight for this experiment is calculated with Equation (4). The rule weight is set to 1 for our experiment [35]. The resulting values of all activation weights are shown in Table 4.
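The calculation can be illustrated with a short sketch. It assumes the normalized form that is common in the BRB literature, in which the activation weight of rule k is its rule weight times its matching degree, divided by the sum of that product over all rules; this form may differ from Equation (4). It also assumes one rule per sentiment class, so that the CNN output quoted above supplies one matching degree per rule. The function name is hypothetical.

```python
from typing import List


def activation_weights(matching: List[float],
                       rule_weights: List[float]) -> List[float]:
    """Assumed normalized form: w_k = theta_k * alpha_k / sum_j(theta_j * alpha_j)."""
    weighted = [theta * alpha for theta, alpha in zip(rule_weights, matching)]
    total = sum(weighted)
    if total == 0:
        # No rule is matched at all; avoid dividing by zero.
        return [0.0] * len(weighted)
    return [w / total for w in weighted]


# CNN output from the text, used as one matching degree per rule (an assumption)
matching_degrees = [0.0, 0.0, 0.0, 0.0, 0.8, 0.2]
rule_weights = [1.0] * len(matching_degrees)  # rule weight of 1, as in the text

weights = activation_weights(matching_degrees, rule_weights)
print(weights)
```

Under these assumptions only the rules for "Sadness" and "Surprise" are activated, and the activation weights sum to 1.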
Equation (5) is used to update the belief degrees. The initial belief degrees for this experiment are presented in Table 5. All antecedent attributes are used to define this rule base in this experiment [36], and the updated belief degrees of every rule are calculated in the same way. Equations (6) and (7) are used to calculate the aggregated belief degrees. In this experiment, the calculated aggregated belief degrees for positive, neutral, and negative are shown in Table 6.
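For readers who want to trace the aggregation step numerically, the following sketch implements the analytical Evidential Reasoning aggregation in the form commonly given in the BRB literature. It is offered as an assumption about the shape of Equations (6) and (7), not as a transcription of them. The function name, the two example rules, and their belief degrees are hypothetical, and the sketch assumes that at least one rule has a non-zero activation weight.

```python
from math import prod
from typing import List


def er_aggregate(activation: List[float],
                 beliefs: List[List[float]]) -> List[float]:
    """Analytical ER aggregation (assumed standard form).

    activation[k] is the activation weight of rule k;
    beliefs[k][j] is the belief degree of rule k in consequent value j.
    """
    n_values = len(beliefs[0])
    # Per-rule term 1 - w_k * sum_i(beta_ik), shared by every consequent value.
    rest = [1.0 - w * sum(b) for w, b in zip(activation, beliefs)]
    prod_rest = prod(rest)
    # Product over rules of (w_k * beta_jk + rest_k) for each consequent value j.
    prod_j = [
        prod(w * b[j] + r for w, b, r in zip(activation, beliefs, rest))
        for j in range(n_values)
    ]
    mu = 1.0 / (sum(prod_j) - (n_values - 1) * prod_rest)
    prod_unused = prod(1.0 - w for w in activation)
    return [mu * (p - prod_rest) / (1.0 - mu * prod_unused) for p in prod_j]


# Two hypothetical activated rules with belief degrees over
# (Positive, Neutral, Negative)
activation = [0.8, 0.2]
rule_beliefs = [
    [0.0, 0.2, 0.8],
    [0.3, 0.5, 0.2],
]
aggregated = er_aggregate(activation, rule_beliefs)
print([round(b, 4) for b in aggregated])
```

When every rule's belief degrees sum to 1, the aggregated belief degrees also sum to 1, and a single fully activated rule is returned unchanged.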
3.3. Integrated Framework
Our proposed integrated approach is used to predict the sentiment label and class of an image. To select an image file from a directory, we use the filedialog.askopenfilename() method from the tkinter package. Since the user selects the image from the file gallery, the image does not always have a fixed size, which may reduce the accuracy of the model. Hence, the image is converted to grayscale and resized to 150 × 150 by interpolation, which can up-size or down-size N-dimensional images. This operation is done with the default function of the scikit-image [37] library. After that, the processed image is convolved by each of the convolution layers of the integrated framework. The filters of each layer create a map of different features. This map is then sent to the max pooling layer, which selects the greatest pixel value in each pooling window. The output map of the max pooling layer is delivered to the trained hidden layers, where matrix chain multiplication is performed using the optimized weights. The output is then forwarded to the output layer.
The Softmax activation function helps the output layer by calculating the probability that an image belongs to a specific class. In our experiment, the “Anger”, “Joy”, “Love”, “Surprise”, “Fear”, and “Sadness” classes are used as the referential values. “Sentiment Class” is considered the antecedent attribute in the BRBES, and the probability of each class is used as the corresponding referential value of this attribute. Moreover, the consequent attribute is the “Overall Sentiment Score”, with referential values “Positive”, “Neutral”, and “Negative”. The utility scores for these referential values are chosen as 1.0, 0.5, and 0.0, respectively. The belief rules used for this integrated system are shown in Table 5. The inference procedure is then carried out, and the final results are calculated using these belief rules. The process of calculating the matching degree in the BRB is shown in Algorithm 1.
Algorithm 1 Process of calculating matching degree.
1: procedure MatchingDegree
2:
3:
4:
5:
6:
7:
8: loop:
9: if then
10:
11:
12: goto loop.
13: close;
14: close;