Open Access. Published by De Gruyter, December 31, 2023 (CC BY 4.0 license).

Sentiment analysis method of consumer comment text based on BERT and hierarchical attention in e-commerce big data environment

  • Wanjun Chang and Mingdong Zhu

Abstract

This study proposes a sentiment analysis method for consumer comment text based on Bidirectional Encoder Representations from Transformers (BERT) and hierarchical attention. First, the BERT pre-training model fuses left and right contextual information to enhance the semantic representation of words and generate dynamic word vectors that contain contextual semantics. Second, a bidirectional long short-term memory (LSTM) network is used to obtain the sequence feature matrix, and the sentence representation and text representation are obtained with the two-layer LSTM. Finally, a local attention mechanism and a global attention mechanism are introduced into the sentence representation layer and the text representation layer, respectively, and the sentiment of consumer comments is classified by softmax. Experiments show that the accuracy of the proposed method is 93.01% on the Laptop data set and 92.45% on the Restaurant data set, so its performance in the sentiment analysis of consumer comment text is significantly better than that of the comparison methods.

1 Introduction

Internet-related industries have become the backbone of China’s economy and have gradually changed China’s economic and social structure [1,2]. The Internet has become an interactive platform on which users disseminate information, share knowledge, and carry out daily consumption [3]. More and more people follow social hotspots and express their views online [4,5]. Compared with shopping in physical stores, online shoppers cannot handle goods directly or compare their advantages and disadvantages on the spot [6]. They can only judge whether goods meet their needs from the commodity information provided by online merchants. In addition, shopping platforms carry so much commodity information that consumers often get lost in it [7,8]. When shopping online, consumers usually comment on or score the products they buy, which directly or indirectly reflects their preferences for commodities or for certain commodity attributes. These comments play an important role in guiding the shopping choices of other potential consumers. Since there are tens of thousands or even millions of comments on commodities, it is impossible to collect and sort this information manually [9,10,11,12].

In emotional polarity identification, the classification results obtained by document-level and sentence-level methods are not fine-grained enough to analyze a user’s emotional tendencies toward every aspect clearly and concretely. As a result, users cannot accurately obtain the information they want, which hinders the formulation of effective decisions [13,14,15,16].

Unlike traditional standardized text, consumer comments are usually short and concise, with sparse data, frequently omitted subjects, limited contextual information, and strong domain specificity [17,18]. Traditional sentiment dictionaries therefore often cannot be applied directly. It is precisely these characteristics of online comments that make the analysis of their emotional tendency more complex [19,20].

Research shows that convolutional neural network (CNN)-based text emotion analysis has two main defects. First, word vectors trained with the word as the basic unit cannot distinguish polysemy; if the word vector representation is not accurate enough, overfitting easily occurs during training, which directly affects the classification result. Second, a traditional CNN extracts only a maximum value from the sentence features and does not analyze the sentence structure in multiple segments. This study therefore proposes an emotional analysis method for consumer comment text based on Bidirectional Encoder Representations from Transformers (BERT) and hierarchical attention. The innovations are as follows:

  1. Based on the combination of BERT pre-training model and bidirectional long short-term memory (BiLSTM), a text emotion analysis model of consumer reviews is constructed. Through the BERT pre-training model and the sequence feature matrix obtained by BiLSTM, the utilization rate of text data is effectively improved.

  2. The local attention mechanism is introduced in the sentence representation layer to capture the important words in each sentence, and the global attention mechanism is introduced in the text representation layer to give different sentence weights, which effectively improves the extraction ability of the model.

2 Related works

The mainstream text emotion analysis methods currently include rule-based methods, machine learning methods, and deep neural network methods. Zhou et al. [21] proposed a diversity-constrained restricted Boltzmann machine method based on CNN. Li et al. [22] proposed a CNN-based opinion summarization algorithm for Chinese microblogging systems; the model uses a CNN to automatically mine relevant features for emotion analysis and calculates the semantic relationship between features through a mixed ranking function. Wang et al. [23] proposed a joint model combining a recurrent neural network and a conditional random field, integrated into a unified framework for extracting aspect words and opinion words. Wang et al. [24] proposed attention-based LSTM with aspect embedding (ATAE-LSTM), which concatenates the target embedding with the word representations and allows the target to participate in the calculation of attention weights. Ma et al. [25] fed the word vector representations of the target word and of the context into LSTMs, extracted the relevant features, calculated the target and context weights separately with the attention mechanism, and concatenated the two outputs for the emotion classification layer. Zhang et al. [26] proposed a sentence-level neural network model that uses a bidirectional gated recurrent unit to splice the words in the text. Chen et al. [27] proposed the recurrent attention memory model, which applies a multiple-attention mechanism to a memory constructed by BiLSTM and combines the attention results nonlinearly with a gating unit. Jamal et al. [28] used term frequency-inverse document frequency and deep learning for emotion analysis. Liu et al. [29] proposed an electroencephalogram emotion recognition model based on the attention mechanism and a pre-trained convolution capsule network to recognize various emotions more effectively. Tang et al. [30] proposed a deep memory network model, which mainly adds an external memory network to store important information.

3 Proposed model

3.1 BERT pre-training language model

The structure of BERT is shown in Figure 1, where $E_1, E_2, \ldots, E_N$ represent the input characters, and the corresponding vectors $T_1, T_2, \ldots, T_N$ are generated after multi-layer bidirectional transformer training.

Figure 1: Structure of the BERT model.

BERT uses only the encoder part of the transformer, as shown in Figure 2. The position information of the input sequence reflects its logical structure, so a positional encoding is added at the input layer. The vector fused with position information first passes through the multi-head attention layer, which essentially repeats self-attention several times to learn information from different perspectives and thereby enrich the semantics. The result is then fed into the feedforward neural network layer to add a nonlinear transformation, and finally the vector representation is obtained. The transformer also introduces residual connections and layer normalization: residual connections avoid the loss of information during transmission, and layer normalization accelerates the convergence of the model. The masked language model randomly masks a certain proportion of words, forcing the model to predict the masked words from the global context and thus achieving bidirectional encoding. Next-sentence prediction mines the logical relationship between sentences by judging whether the latter sentence is a reasonable continuation of the former.

Figure 2: Transformer encoder model structure.
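As a concrete illustration of the encoder layer described above, the following minimal tf.keras sketch stacks multi-head self-attention, a position-wise feedforward network, residual connections, and layer normalization. The dimensions (8 heads, 768-dimensional hidden size, feedforward size 3,072) are common BERT-Base values used here only for illustration; this is a schematic reconstruction, not the authors’ implementation.

```python
import tensorflow as tf

def transformer_encoder_block(x, num_heads=8, d_model=768, d_ff=3072, dropout=0.1):
    """One transformer encoder block: multi-head self-attention followed by a
    feedforward network, each wrapped with a residual connection and layer norm."""
    # Multi-head self-attention repeats scaled dot-product attention in parallel heads.
    attn_out = tf.keras.layers.MultiHeadAttention(
        num_heads=num_heads, key_dim=d_model // num_heads, dropout=dropout)(x, x)
    x = tf.keras.layers.LayerNormalization()(x + attn_out)    # residual + layer norm

    # Position-wise feedforward network adds the nonlinear transformation.
    ffn_out = tf.keras.layers.Dense(d_ff, activation="relu")(x)
    ffn_out = tf.keras.layers.Dense(d_model)(ffn_out)
    return tf.keras.layers.LayerNormalization()(x + ffn_out)  # residual + layer norm

# Example: encode a batch of 2 sequences of length 140 with 768-dimensional vectors.
tokens = tf.random.normal((2, 140, 768))
encoded = transformer_encoder_block(tokens)
print(encoded.shape)  # (2, 140, 768)
```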

3.2 BiLSTM

In LSTM, $C_t$ and $C_{t-1}$ are the memory units, $h_t$ and $h_{t-1}$ are the hidden units, $i_t$ is the input gate, $f_t$ is the forget gate, $o_t$ is the output gate, and $X_{nmt}$ is the value of the $n$ features in time period $m$ on day $t$:

(1) $C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t$,

(2) $\tilde{C}_t = \tanh(W_c[h_{t-1}, X_{nm(t-1)}] + b_c)$,

(3) $f_t = \sigma(W_f[h_{t-1}, X_{nmt}] + b_f)$,

(4) $i_t = \sigma(W_i[h_{t-1}, X_{nmt}] + b_i)$,

(5) $o_t = \sigma(W_o[h_{t-1}, X_{nmt}] + b_o)$,

(6) $h_t = o_t \times \tanh(C_t)$,

(7) $\sigma(\cdot) = \dfrac{1}{1 + e^{-(\cdot)}}$,

where $W_c$, $W_f$, $W_i$, and $W_o$ are the weights of the different units, and $b_c$, $b_f$, $b_i$, and $b_o$ are the corresponding bias coefficients.
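For clarity, the following NumPy sketch performs one LSTM step written directly from equations (1)–(7). The input is denoted simply x_t (the article’s $X_{nmt}$), and the toy dimensions and random weights are assumptions made only to keep the example self-contained.

```python
import numpy as np

def sigmoid(z):                                    # equation (7)
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following equations (1)-(7).
    W and b hold the parameters (W_f, W_i, W_o, W_c) and (b_f, b_i, b_o, b_c)."""
    z = np.concatenate([h_prev, x_t])              # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])             # forget gate, eq. (3)
    i_t = sigmoid(W["i"] @ z + b["i"])             # input gate, eq. (4)
    o_t = sigmoid(W["o"] @ z + b["o"])             # output gate, eq. (5)
    c_tilde = np.tanh(W["c"] @ z + b["c"])         # candidate memory, eq. (2)
    c_t = f_t * c_prev + i_t * c_tilde             # memory update, eq. (1)
    h_t = o_t * np.tanh(c_t)                       # hidden state, eq. (6)
    return h_t, c_t

# Toy dimensions (assumed): 4-dimensional input, 3-dimensional hidden state.
d_in, d_h = 4, 3
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(d_h, d_h + d_in)) for k in "fioc"}
b = {k: np.zeros(d_h) for k in "fioc"}
h, c = np.zeros(d_h), np.zeros(d_h)
h, c = lstm_step(rng.normal(size=d_in), h, c, W, b)
```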

The BiLSTM model consists of a forward LSTM and a backward LSTM stacked together, as shown in Figure 3.

Figure 3: BiLSTM network model.

The calculation process of BiLSTM is as follows:

(8) $\overrightarrow{h}_t = f(\overrightarrow{W} x_t + \overrightarrow{V}\, \overrightarrow{h}_{t-1} + \overrightarrow{b})$,

(9) $\overleftarrow{h}_t = f(\overleftarrow{W} x_t + \overleftarrow{V}\, \overleftarrow{h}_{t+1} + \overleftarrow{b})$,

(10) $y_t = g(U[\overrightarrow{h}_t; \overleftarrow{h}_t] + c)$,

where $\overrightarrow{W}$, $\overrightarrow{V}$ and $\overleftarrow{W}$, $\overleftarrow{V}$ are the hidden-layer parameters of the forward and backward passes, $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ are the forward and backward hidden states, $\overrightarrow{b}$ and $\overleftarrow{b}$ are the bias terms, and $y_t$ is the output.
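In practice, the two directions are usually obtained with a bidirectional wrapper rather than hand-written recursions. The tf.keras sketch below mirrors equations (8)–(10) by concatenating the forward and backward hidden states; the 128 hidden units per direction follow the setting reported later in Section 4.2, while the sequence length and input dimension are illustrative assumptions.

```python
import tensorflow as tf

# BiLSTM over a sequence of word vectors; the forward and backward outputs are
# concatenated, corresponding to [h_forward; h_backward] in equation (10).
inputs = tf.keras.Input(shape=(140, 768))                    # (sequence length, word vector dim)
h = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(128, return_sequences=True),    # 128 hidden units per direction
        merge_mode="concat")(inputs)                          # output dimension = 2 * 128
model = tf.keras.Model(inputs, h)
print(model.output_shape)  # (None, 140, 256)
```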

3.3 Attention mechanism

The attention mechanism simulates the brain’s attention resource allocation mechanism, calculates the weights of different feature vectors, and achieves higher-quality feature extraction, as shown in Figure 4.

Figure 4: Basic structure of the attention mechanism.

The attention mechanism is calculated as follows:

(11) $\alpha_i = \dfrac{\exp(V_i \tanh(W_i h_i + b_i))}{\sum_{j=1}^{n} \exp(e_j)}$,

(12) $y = \sum_{i=1}^{n} \alpha_i h_i$,

where $h_i$ is the initial state; $e_i = V_i \tanh(W_i h_i + b_i)$ is the energy value of $h_i$; $V_i$ and $W_i$ are weight coefficient matrices; $b_i$ is a bias vector; and $\alpha_i$ is the weight corresponding to $h_i$.
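A minimal NumPy sketch of equations (11) and (12) is given below: each hidden state is scored, the scores are softmax-normalized into weights $\alpha_i$, and the weighted sum forms the output. The shapes and the single shared parameter set (instead of per-state $V_i$, $W_i$, $b_i$) are illustrative assumptions.

```python
import numpy as np

def attention(H, V, W, b):
    """H: (n, d) hidden states h_i; returns the weighted representation of eq. (12)."""
    e = np.array([V @ np.tanh(W @ h + b) for h in H])   # energy values, eq. (11) numerator
    alpha = np.exp(e) / np.exp(e).sum()                  # softmax weights alpha_i
    return alpha @ H                                      # y = sum_i alpha_i * h_i, eq. (12)

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))          # 5 hidden states of dimension 8 (assumed)
V = rng.normal(size=8)
W = rng.normal(size=(8, 8))
b = np.zeros(8)
print(attention(H, V, W, b).shape)   # (8,)
```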

3.4 Model structure

This study proposes a consumer comment text sentiment analysis model combining BERT and hierarchical attention, as shown in Figure 5.

Figure 5: Proposed model structure.

(1) Input layer

In the aspect-level emotion classification task, the position of the target word determines the emotional tendency being expressed; even a small change of position can lead to a large error in the classification result. The BERT pre-training model alleviates this problem. The input representation of the model is a vector $e$ formed as the sum of the word vector $e_{\mathrm{emb}}$, the segment vector $e_s$, and the position vector $e_p$ of each word. Taking a computer review as an example, “the computer is good, but the appearance is bad,” the specific representation of each embedding layer is shown in Figure 6.

Figure 6: Diagram of the input sequence representation.

The word vector generated by BERT contains position coding information, which enables the model to learn word order when processing text. The word vector dimension in this study is $d_{\mathrm{emb}}$, and the position vector dimension and segment vector dimension are $d_p$ and $d_s$, respectively; the three dimensions are the same size. The position vector is obtained by encoding the position of each word with sin and cos functions of different frequencies, as shown in formulas (13) and (14):

(13) $\mathrm{PE}_{(\mathrm{pos},\, 2i)} = \sin(\mathrm{pos} / 10{,}000^{2i/d_p})$,

(14) $\mathrm{PE}_{(\mathrm{pos},\, 2i+1)} = \cos(\mathrm{pos} / 10{,}000^{2i/d_p})$,

where pos denotes the position of the word and $i$ denotes the $i$-th dimension of the embedding. The period of the position function varies within $[2\pi, 10{,}000 \times 2\pi]$, and each position code is composed of the values of sin and cos functions of different periods, which yields a unique representation for every position.
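The sketch below constructs the sinusoidal position vectors of equations (13) and (14) and sums them with token and segment vectors to obtain the input representation $e = e_{\mathrm{emb}} + e_s + e_p$. The embedding tables are random placeholders over a toy vocabulary rather than the pre-trained BERT tables.

```python
import numpy as np

def positional_encoding(seq_len, d_p):
    """Sinusoidal position vectors per equations (13) and (14)."""
    pos = np.arange(seq_len)[:, None]                     # position index
    i = np.arange(d_p)[None, :]                           # dimension index
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_p)
    pe = np.zeros((seq_len, d_p))
    pe[:, 0::2] = np.sin(angle[:, 0::2])                  # even dimensions, eq. (13)
    pe[:, 1::2] = np.cos(angle[:, 1::2])                  # odd dimensions, eq. (14)
    return pe

seq_len, d_emb, vocab_size = 12, 768, 1000                # illustrative sizes, toy vocabulary
rng = np.random.default_rng(0)
token_ids = rng.integers(0, vocab_size, size=seq_len)     # toy token ids
segment_ids = np.zeros(seq_len, dtype=int)                # single-sentence input
word_table = rng.normal(size=(vocab_size, d_emb))         # placeholder embedding tables
seg_table = rng.normal(size=(2, d_emb))

e = word_table[token_ids] + seg_table[segment_ids] + positional_encoding(seq_len, d_emb)
print(e.shape)  # (12, 768): e_emb + e_s + e_p for each position
```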

(2) Feature extraction layer

The calculation of the feature extraction layer is completed in four steps (a minimal code sketch of steps 2–4 follows the list):

  1. The vectorized sentence sequence $x_{i1}, x_{i2}, \ldots, x_{it}$ is used as the input of BiLSTM to extract deep-level sentence features. BiLSTM is composed of two parts, a forward LSTM and a backward LSTM, which encode the sentence semantics more comprehensively through contextual information. The calculation of the vector representation $h_{it}$ of each word is shown in formulas (15)–(17):

     (15) $\overrightarrow{h}_{it} = \overrightarrow{\mathrm{LSTM}}(x_{it})$,

     (16) $\overleftarrow{h}_{it} = \overleftarrow{\mathrm{LSTM}}(x_{it})$,

     (17) $h_{it} = [\overrightarrow{h}_{it}, \overleftarrow{h}_{it}]$.

  2. The local attention mechanism is used to capture the words in each sentence that contribute more to the emotional semantics. First, a sliding window $[p_i - D, p_i + D]$ of length $L = 2D + 1$ is selected, where $p_i$ is the center word position and $D$ is the preset context window size. The weight $\alpha_{it}$ of each word is obtained by calculating the similarity between the center word and the other words in the window:

     (18) $p_i = s \cdot \mathrm{sigmoid}(v_p^{\mathrm{T}} \tanh(W_p h_{it}))$,

     (19) $\alpha_{it} = \mathrm{ali}(h_{it}, \bar{h}_L)\, \exp\!\left(-\dfrac{(s - p_i)^2}{2\sigma^2}\right)$,

     (20) $\mathrm{ali}(h_{it}, \bar{h}_L) = \dfrac{\exp(h_{it} \cdot \bar{h}_L)}{\sum_{L} \exp(h_{it} \cdot \bar{h}_L)}$,

     where $s$ is the length of the sentence, $v_p^{\mathrm{T}}$ and $W_p$ are the model parameters used to predict the position, $\sigma = D/2$, and $\bar{h}_L$ denotes the hidden states in the context window other than the center word. Finally, the word vector representations $h_{it}$ are weighted by the corresponding weights $\alpha_{it}$:

     (21) $S_i = \sum_{t} h_{it} \alpha_{it}$.

  3. The text features are constructed from the sentence vectors. The sentence vector $S_i$ is input into the BiLSTM network to mine the logical associations between sentences and extract features of the whole text. The calculation of the vector representation $G_i$ of the entire text is shown in formulas (22)–(24):

     (22) $\overrightarrow{G}_i = \overrightarrow{\mathrm{LSTM}}(S_i)$,

     (23) $\overleftarrow{G}_i = \overleftarrow{\mathrm{LSTM}}(S_i)$,

     (24) $G_i = [\overrightarrow{G}_i, \overleftarrow{G}_i]$.

  4. The global attention mechanism is applied to the output of the previous layer to highlight the important sentences in the whole text. First, the hidden representation $H_i$ of $G_i$ is obtained by a single-layer perceptron; then, the weight $\alpha_i$ of each sentence is obtained by calculating the similarity between $H_i$ and the context vector $U_s$. The context vector $U_s$ is randomly initialized and trained as a parameter of the model.

(25) $H_i = \tanh(W_s G_i + b_s)$,

(26) $\alpha_i = \dfrac{\exp(H_i^{\mathrm{T}} U_s)}{\sum_{i} \exp(H_i^{\mathrm{T}} U_s)}$,

(27) $D = \sum_{i} S_i \alpha_i$,

where $W_s$ is the weight matrix and $b_s$ is the bias.
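To make the four steps concrete, the following NumPy sketch strings them together: word-level BiLSTM outputs are assumed to be given, the local attention of equations (18)–(21) weights the words inside a Gaussian window around a predicted center position (scored here from the mean hidden state, with the Gaussian term applied to each word’s distance from the center, as in Luong-style local attention), and the global attention of equations (25)–(27) weights the resulting sentence vectors. It is a toy reconstruction under assumed shapes, not the authors’ implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
softmax = lambda z: np.exp(z - z.max()) / np.exp(z - z.max()).sum()

d = 256                                    # BiLSTM output dimension (2 x 128 hidden units)
n_words, n_sents, D_win = 20, 6, 5         # sentence length, number of sentences, window size D
sigma = D_win / 2.0                        # sigma = D / 2, as in eq. (19)

# ---- Step 2: local attention inside one sentence (eqs. 18-21) ----
def local_attention(H_words, v_p, W_p):
    s = len(H_words)
    # Predicted center position for the sentence, eq. (18) (scored from the mean state here).
    p = s * sigmoid(v_p @ np.tanh(W_p @ H_words.mean(axis=0)))
    lo, hi = max(0, int(p) - D_win), min(s, int(p) + D_win + 1)
    window = H_words[lo:hi]
    h_bar = window.mean(axis=0)                        # context summary of the window
    ali = softmax(window @ h_bar)                      # alignment scores, eq. (20)
    gauss = np.exp(-((np.arange(lo, hi) - p) ** 2) / (2 * sigma ** 2))  # Gaussian term, eq. (19)
    alpha = ali * gauss
    return alpha @ window                              # sentence vector S_i, eq. (21)

v_p, W_p = rng.normal(size=d), rng.normal(size=(d, d))
S = np.stack([local_attention(rng.normal(size=(n_words, d)), v_p, W_p)
              for _ in range(n_sents)])                # one vector per sentence

# ---- Step 3: sentence-level BiLSTM outputs G_i are stood in for here (eqs. 22-24) ----
G = S + 0.1 * rng.normal(size=S.shape)

# ---- Step 4: global attention over sentences (eqs. 25-27) ----
W_s, b_s, U_s = rng.normal(size=(d, d)), np.zeros(d), rng.normal(size=d)
H_sent = np.tanh(G @ W_s.T + b_s)                      # eq. (25)
alpha_sent = softmax(H_sent @ U_s)                     # eq. (26)
doc_vector = alpha_sent @ S                            # text representation D, eq. (27)
print(doc_vector.shape)                                # (256,)
```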

(3) Output layer

Aspect-level emotion classification is a classification problem, so this study uses a fully connected layer as the output network. The output $O$ of the sentence attention layer is taken as the input of the fully connected layer, and the vector output by the fully connected layer is then normalized by the softmax function to give the prediction $\hat{y}$:

(28) $\hat{y} = \mathrm{softmax}(W O + b)$.

This study trains the model in a supervised learning framework with back-propagation. The Adam method is used to optimize the parameters, and the cross-entropy loss function with a regularization term is minimized:

(29) $\mathrm{loss} = -\sum_{i \in D} \sum_{j \in C} \hat{y}_{ij} \log y_{ij} + \lambda \lVert \theta \rVert^2$,

where $C$ is the number of categories (3 in this study), $\hat{y}$ is the predicted category, $y$ is the actual category of the data, and $\lambda \lVert \theta \rVert^2$ is the regularization term.
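A brief tf.keras sketch of the output layer and training objective of equations (28) and (29) is given below: a fully connected softmax layer over the three sentiment classes, compiled with the Adam optimizer, categorical cross-entropy, and an L2 penalty. The input dimension and regularization strength are assumptions made for illustration.

```python
import tensorflow as tf

doc_vec = tf.keras.Input(shape=(256,))                            # text representation D (assumed dim)
probs = tf.keras.layers.Dense(
    3, activation="softmax",                                      # eq. (28): softmax over 3 classes
    kernel_regularizer=tf.keras.regularizers.l2(1e-4))(doc_vec)   # L2 term of eq. (29)
clf = tf.keras.Model(doc_vec, probs)
clf.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
            loss="categorical_crossentropy",                      # cross-entropy of eq. (29)
            metrics=["accuracy"])
```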

4 Experiment and analysis

4.1 Experimental setup

The hardware and software environment of this experiment is as follows: a 64-bit Windows 7 operating system, an Intel Core i7-5500U 2.40 GHz dual-core CPU, 8 GB of memory, the Keras development environment, the JetBrains PyCharm development tool, and Python as the development language.

In this experiment, the two public data sets of SemEval-2014 Task 4, namely, the Laptop and Restaurant data sets, were used for aspect-level emotion analysis. The training and test samples are labeled with three categories: positive, negative, and neutral, so the task is a multi-class classification task. The statistics of the data sets are shown in Table 1.

Table 1

Statistical information of experimental data set

Data set     Subset        Positive   Negative   Neutral
Laptop       Training set  1,000      426        98
Laptop       Test set      365        144        779
Restaurant   Training set  1,169      568        700
Restaurant   Test set      478        246        790

4.2 Parameter setting

The input layer adopts the pre-trained Chinese model “BERT-Base, Chinese” published by Google. The feature extraction layer is mainly composed of BiLSTM and attention; the number of hidden-layer nodes of the two BiLSTMs is 128, and the context window size D of the local attention mechanism is 5. For model training, the batch size is set to 32, the learning rate to 0.0001, the maximum sequence length to 140, and the optimizer is Adam. Different dropout values affect the output of the model; to set a reasonable dropout value, several groups of experiments were conducted on the Laptop data set, with the results shown in Figure 7. As Figure 7 shows, all indicators are highest when the dropout value is 0.6, so the dropout parameter is set to 0.6.

Figure 7: Experimental results for different dropout values on the Laptop data set.
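For reference, the hyper-parameters listed in this subsection map onto a training configuration roughly as in the snippet below; the model and data pipeline themselves are not shown, so the objects created here would still have to be wired into the network of Section 3.4.

```python
import tensorflow as tf

# Hyper-parameters as reported in Section 4.2.
BATCH_SIZE = 32
LEARNING_RATE = 1e-4
MAX_SEQ_LEN = 140
LSTM_UNITS = 128
LOCAL_WINDOW_D = 5
DROPOUT_RATE = 0.6          # chosen from the dropout sweep shown in Figure 7

optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)
dropout = tf.keras.layers.Dropout(DROPOUT_RATE)
# These objects would be plugged into the full model and its training loop,
# e.g. model.fit(x, y, batch_size=BATCH_SIZE, ...) once the data pipeline is defined.
```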

4.3 Evaluation criteria

Four evaluation indicators are used to evaluate the algorithm: precision, recall, F1 value, and accuracy:

(30) $\mathrm{Precision} = \dfrac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}$,

(31) $\mathrm{Recall} = \dfrac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$,

(32) $F1 = \dfrac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$,

(33) $\mathrm{Accuracy} = \dfrac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}}$.
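The four indicators of equations (30)–(33) can be computed from the confusion-matrix counts as in the short sketch below, shown here for a single (binary) class; how the article averages them over the three classes is not stated, so macro- or weighted averaging would be an additional choice.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Precision, recall, F1, and accuracy from equations (30)-(33) for one class."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    precision = tp / (tp + fp)                            # eq. (30)
    recall = tp / (tp + fn)                               # eq. (31)
    f1 = 2 * precision * recall / (precision + recall)    # eq. (32)
    accuracy = (tp + tn) / (tp + tn + fp + fn)            # eq. (33)
    return precision, recall, f1, accuracy

y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])
print(binary_metrics(y_true, y_pred))  # (0.75, 0.75, 0.75, 0.666...)
```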

4.4 Comparative experiment of different word embedding models

This article compares the BERT, Word2vec, and GloVe embedding models on the sentiment analysis task; the final classification results are shown in Table 2. As the table shows, the BERT model achieves better classification performance than Word2vec and GloVe, with improvements in both F1 value and accuracy. The BERT pre-training model can use information from both directions simultaneously, better solving the problem of a single, static semantic representation for each word.

Table 2

Comparison of experimental results of different word embedding models (%)

Word embedding model   Laptop F1   Laptop Accuracy   Restaurant F1   Restaurant Accuracy
Word2vec               87.13       89.65             88.46           90.75
GloVe                  86.25       90.72             87.54           91.07
BERT                   92.88       93.54             93.10           91.66

4.5 Comparative experiment on different text emotion analysis methods

The comparison results between the proposed method and the methods of previous studies [24] and [28] are shown in Tables 3 and 4 and Figure 8. The precision, recall, and F1 of the proposed method are 93.01, 92.76, and 92.88%, respectively, on the Laptop data set and 92.45, 93.75, and 93.10%, respectively, on the Restaurant data set, all higher than those of the comparison methods. In the emotion analysis task, the proposed method uses the BERT pre-training model to fuse left and right contextual information and uses BiLSTM to obtain the sequence feature matrix, which effectively improves the utilization rate of the text data.

Table 3

Emotional analysis results of laptop data (%)

Model             Precision   Recall   F1
Ref. [24]         83.45       84.50    83.97
Ref. [28]         91.63       90.41    91.02
Proposed method   93.01       92.76    92.88
Table 4

Emotional analysis results of restaurant data (%)

Model             Precision   Recall   F1
Ref. [24]         84.20       83.15    83.67
Ref. [28]         90.36       91.14    90.75
Proposed method   92.45       93.75    93.10
Figure 8: Comparison of F1 values of different methods.

4.6 Comparison of training time for different models to complete one iteration

The time performance of different neural network models under the same conditions was analyzed. Comparative experiments were conducted with the same word vector matrix in the same environment, recording the time required for each model to complete one iteration on the two data sets; the results are shown in Table 5. From Table 5, the proposed model completes one iteration more slowly than the model of Wang et al. [24] and faster than that of Jamal et al. [28]. The ATAE-LSTM model of Wang et al. [24] is relatively simple compared with the proposed model, resulting in a shorter training time. The model of Jamal et al. [28] uses a hybrid of term frequency-inverse document frequency and a deep learning model for sentiment analysis, which is relatively complex. By using the pre-trained BERT model to fuse the left and right contextual information, the proposed model effectively reduces the training time.

Table 5

Training time for completing one iteration of different network models

Model             Laptop data   Restaurant data
Ref. [24]         26.2          26.6
Ref. [28]         31.6          32.1
Proposed method   27.4          28.0

5 Conclusion

In this study, we propose a sentiment analysis method for consumer comments based on BERT and hierarchical attention. The BERT pre-training model is used to generate contextual word vectors, BiLSTM is used to obtain the sequence feature matrix, and the sentence representation and text representation are then obtained with the two-layer LSTM. The attention mechanism is introduced to focus on important information and dilute unimportant information. Experiments show that this method is superior to the comparison methods.

In addition, this article does not consider the attribute factors of the commodities mentioned in the comments, although some commodity attributes may have considerable analytical value. Analyzing emotional tendency in combination with commodity attributes is therefore one direction for future research. Future work can also continue to improve and optimize the proposed model, or find a more appropriate method to locate phrases in the text.

  1. Funding information: This work was supported by the Training Program for Young Key Teachers in Colleges and Universities of Henan Province (2020GGJS263).

  2. Author contributions: Wanjun Chang: Conceived and designed the study, collected and analyzed the data, and wrote the manuscript. Mingdong Zhu: Assisted in the data analysis, conducted literature review, and contributed to the manuscript preparation. All authors have read and approved the final version of the manuscript.

  3. Conflict of interest: The authors declare that they have no conflicts of interest to report regarding this study.

  4. Data availability statement: The data included in this study are available without any restriction.

References

[1] Chaffar S, Inkpen D. Using a heterogeneous dataset for emotion analysis in text. In: Canadian Conference on Artificial Intelligence. Berlin, Heidelberg: Springer; 2011. p. 62–7. doi:10.1007/978-3-642-21043-3_8.

[2] Aman S, Szpakowicz S. Identifying expressions of emotion in text. In: International Conference on Text, Speech and Dialogue. Berlin, Heidelberg: Springer; 2007. p. 196–205. doi:10.1007/978-3-540-74628-7_27.

[3] Yadollahi A, Shahraki AG, Zaiane OR. Current state of text sentiment analysis from opinion to emotion mining. ACM Comput Surv (CSUR). 2017;50(2):1–33. doi:10.1145/3057270.

[4] Sailunaz K, Alhajj R. Emotion and sentiment analysis from Twitter text. J Comput Sci. 2019;36(2):101–10. doi:10.1016/j.jocs.2019.05.009.

[5] Acheampong FA, Wenyu C, Nunoo-Mensah H. Text-based emotion detection: Advances, challenges, and opportunities. Eng Rep. 2020;2(7):12189–97. doi:10.1002/eng2.12189.

[6] Tripathi V, Joshi A, Bhattacharyya P. Emotion analysis from text: A survey. Cent Indian Lang Technol Surv. 2016;11(8):66–9.

[7] Nandwani P, Verma R. A review on sentiment analysis and emotion detection from text. Soc Netw Anal Min. 2021;11(1):1–19. doi:10.1007/s13278-021-00776-6.

[8] Rout JK, Choo KKR, Dash AK, Bakshi S, Jena SK, Williams KL. A model for sentiment and emotion analysis of unstructured social media text. Electron Commer Res. 2018;18(1):181–99. doi:10.1007/s10660-017-9257-8.

[9] Balahur A, Hermida JM, Montoyo A. Detecting implicit expressions of emotion in text: A comparative analysis. Decis Support Syst. 2012;53(4):742–53. doi:10.1016/j.dss.2012.05.024.

[10] Li W, Xu H. Text-based emotion classification using emotion cause extraction. Expert Syst Appl. 2014;41(4):1742–9. doi:10.1016/j.eswa.2013.08.073.

[11] Strapparava C, Mihalcea R. Learning to identify emotions in text. In: Proceedings of the 2008 ACM Symposium on Applied Computing; 2008. p. 1556–60. doi:10.1145/1363686.1364052.

[12] Bhowmick PK, Basu A, Mitra P. Reader perspective emotion analysis in text through ensemble based multi-label classification framework. Comput Inf Sci. 2009;2(4):64–74. doi:10.5539/cis.v2n4p64.

[13] Zhao H, Ning YE, Wang R. Improved cross-corpus speech emotion recognition using deep local domain adaptation. Chin J Electron. 2023;32(3):1–7. doi:10.23919/cje.2021.00.196.

[14] Hakak N, Mohd M, Kirmani M, Mohd M. Emotion analysis: A survey. In: 2017 International Conference on Computer, Communications and Electronics (COMPTELIX). IEEE; 2017. p. 397–402. doi:10.1109/COMPTELIX.2017.8004002.

[15] Park SH, Bae BC, Cheong YG. Emotion recognition from text stories using an emotion embedding model. In: 2020 IEEE International Conference on Big Data and Smart Computing (BigComp). IEEE; 2020. p. 579–83. doi:10.1109/BigComp48618.2020.00014.

[16] Mohsen AM, Idrees AM, Hassan HA. Emotion analysis for opinion mining from text: a comparative study. Int J e-Collab. 2019;15(1):38–58. doi:10.4018/IJeC.2019010103.

[17] Kumar Y, Mahata D, Aggarwal S, Chugh A, Maheshwari R, Shah R. Bhaav: a text corpus for emotion analysis from Hindi stories. arXiv preprint. 2019;2(3):123–31.

[18] Manoharan S. Geospatial and social media analytics for emotion analysis of theme park visitors using text mining and GIS. J Inf Technol. 2020;2(2):100–7. doi:10.36548/jitdw.2020.2.003.

[19] Zhang Y, Fu J, She D, Zhang Y, Wang S, Yang J. Text emotion distribution learning via multi-task convolutional neural network. In: Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI); 2018. p. 4595–601. doi:10.24963/ijcai.2018/639.

[20] Segura Navarrete A, Martinez-Araneda C, Vidal-Castro C, Rubio-Manzano C. A novel approach to the creation of a labelling lexicon for improving emotion analysis in text. Electron Libr. 2021;6(12):1123–31. doi:10.1108/EL-04-2020-0110.

[21] Zhou Y, Xu R, Gui L. A sequence level latent topic modeling method for sentiment analysis via CNN based diversified restrict Boltzmann machine. In: 2016 International Conference on Machine Learning and Cybernetics (ICMLC). Vol. 1. IEEE; 2016. p. 356–61. doi:10.1109/ICMLC.2016.7860927.

[22] Li Q, Jin Z, Wang C, Zeng DD. Mining opinion summarizations using convolutional neural networks in Chinese microblogging systems. Knowl Syst. 2016;10(7):289–300. doi:10.1016/j.knosys.2016.06.017.

[23] Wang W, Pan SJ, Dahlmeier D, Xiao X. Recursive neural conditional random fields for aspect-based sentiment analysis. arXiv preprint. 2016;4(9):2311–21. doi:10.18653/v1/D16-1059.

[24] Wang Y, Huang M, Zhu X, Zhao L. Attention-based LSTM for aspect-level sentiment classification. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing; 2016. p. 606–15. doi:10.18653/v1/D16-1058.

[25] Ma D, Li S, Zhang X, Wang H. Interactive attention networks for aspect-level sentiment classification. arXiv preprint. 2017;4(1):4068–74. doi:10.24963/ijcai.2017/568.

[26] Zhang M, Zhang Y, Vo DT. Gated neural networks for targeted sentiment analysis. In: Thirtieth AAAI Conference on Artificial Intelligence; 2016. p. 3087–93. doi:10.1609/aaai.v30i1.10380.

[27] Chen P, Sun Z, Bing L, Yang W. Recurrent attention network on memory for aspect sentiment analysis. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing; 2017. p. 452–61. doi:10.18653/v1/D17-1047.

[28] Jamal N, Xianqiao C, Al-Turjman F, Ullah F. A deep learning-based approach for emotions classification in big corpus of imbalanced tweets. Trans Asian Low-Resour Lang Inf Process. 2021;20(3):1–16. doi:10.1145/3410570.

[29] Liu S, Wang Z, An Y. EEG emotion recognition based on the attention mechanism and pre-trained convolution capsule network. Knowl Syst. 2023;265(8):110372. doi:10.1016/j.knosys.2023.110372.

[30] Tang D, Qin B, Liu T. Aspect level sentiment classification with deep memory network. arXiv preprint. 2016;7(3):1068–74. doi:10.18653/v1/D16-1021.

Received: 2023-02-23
Revised: 2023-08-04
Accepted: 2023-10-11
Published Online: 2023-12-31

© 2023 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
