Are Your Comments Positive? A Self-Distillation Contrastive Learning Method for Analyzing Online Public Opinion
Abstract
1. Introduction
- We have constructed a multi-scenario sentiment analysis dataset to facilitate subsequent research on public opinion analysis.
- We propose a simple self-distillation framework to alleviate model confusion, enhancing the training and generalization capabilities of the model.
- We introduce the prototypical supervised contrastive learning module to mitigate the heterogeneity introduced by multiple scenarios in the dataset.
- Section 2 reviews related work, providing an overview of the existing literature and identifying the research gaps this study aims to address.
- Section 3 explains the process of constructing the public opinion dataset.
- Section 4 outlines the proposed framework and methodology, detailing the theoretical foundations, design considerations, and implementation specifics.
- Section 5 presents the experimental results of our methodology, including performance evaluations, comparative analyses against baseline methods, and further in-depth analyses.
- Section 6 concludes the paper, summarizing the key findings, discussing the implications of the results, and suggesting directions for future research.
2. Related Work
2.1. Sentiment Analysis
2.2. Self-Knowledge Distillation
2.3. Prototype Contrastive Learning
3. Public Opinion Dataset Construction
4. Method
4.1. The Framework
4.2. Self-Distillation from the Dual EMA Model
4.2.1. Revisit of Knowledge Distillation and Self-Knowledge Distillation
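For reference, the standard knowledge-distillation objective of Hinton et al. [26] mixes a hard-label cross-entropy term with a temperature-scaled KL term on the teacher's soft labels. The generic form below is a reminder of that recipe; the paper's own notation and weighting may differ:

```latex
\mathcal{L}_{\mathrm{KD}}
  = (1-\lambda)\,\mathcal{L}_{\mathrm{CE}}\big(y,\ \sigma(z_s)\big)
  + \lambda\, T^{2}\, \mathrm{KL}\!\big(\sigma(z_t/T)\,\big\Vert\,\sigma(z_s/T)\big)
```

Here, \(z_s\) and \(z_t\) are the student and teacher logits, \(\sigma\) is the softmax, \(T\) the distillation temperature, and \(\lambda\) a mixing weight; self-knowledge distillation replaces the separate teacher with a copy or earlier state of the student itself.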
4.2.2. Self-Distillation from the Dual EMA Model
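The following is a minimal PyTorch sketch of the dual-EMA idea as described in the contributions and the ablation (Section 5.4): two exponential-moving-average copies of the student, maintained with different decay rates, each supply soft labels for a KL distillation term. The decay values, the loss weight `alpha`, and the training-loop details are illustrative assumptions, not the paper's exact configuration.

```python
import copy
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(ema_model, model, decay):
    """theta_ema <- decay * theta_ema + (1 - decay) * theta."""
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)

def dual_ema_distill_loss(student_logits, fast_logits, slow_logits, T=2.0):
    """Average KL(teacher || student) over the two EMA teachers, temperature-scaled."""
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    loss = 0.0
    for teacher_logits in (fast_logits, slow_logits):
        p_t = F.softmax(teacher_logits / T, dim=-1)
        loss = loss + F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T
    return loss / 2

# Illustrative training step (model: any classifier that returns logits):
# ema_fast, ema_slow = copy.deepcopy(model), copy.deepcopy(model)
# logits = model(x)
# with torch.no_grad():
#     t_fast, t_slow = ema_fast(x), ema_slow(x)
# loss = F.cross_entropy(logits, y) + alpha * dual_ema_distill_loss(logits, t_fast, t_slow)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
# ema_update(ema_fast, model, decay=0.99)   # faster-moving teacher
# ema_update(ema_slow, model, decay=0.999)  # slower-moving teacher
```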
4.3. Prototypical Supervised Contrastive Learning
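Below is a minimal sketch of a prototypical supervised contrastive term in the spirit of prototypical contrastive learning [34]: each class keeps one prototype vector, and every sample is pulled toward its class prototype and pushed away from the others. How the prototypes are maintained (here, an EMA of batch class means) and the temperature `tau` are assumptions for illustration, not the paper's exact rule.

```python
import torch
import torch.nn.functional as F

def prototypical_supcon_loss(features, labels, prototypes, tau=0.1):
    """
    features:   (B, D) sample representations
    labels:     (B,)   class indices in [0, C)
    prototypes: (C, D) one prototype per class
    Cross-entropy over temperature-scaled cosine similarities to the prototypes.
    """
    z = F.normalize(features, dim=-1)
    c = F.normalize(prototypes, dim=-1)
    logits = z @ c.t() / tau  # (B, C)
    return F.cross_entropy(logits, labels)

@torch.no_grad()
def update_prototypes(prototypes, features, labels, momentum=0.9):
    """EMA update of each class prototype from the batch mean (one possible rule)."""
    z = F.normalize(features, dim=-1)
    for cls in labels.unique():
        prototypes[cls] = (momentum * prototypes[cls]
                           + (1 - momentum) * z[labels == cls].mean(dim=0))
    return prototypes
```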
4.4. Training Objectives
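Presumably the overall objective combines the classification loss with the two auxiliary terms above; the weights \(\alpha\) and \(\beta\) in this sketch are placeholders rather than the values used in the paper:

```latex
\mathcal{L}_{\mathrm{total}}
  = \mathcal{L}_{\mathrm{CE}}
  + \alpha\,\mathcal{L}_{\mathrm{SD}}
  + \beta\,\mathcal{L}_{\mathrm{PSCL}}
```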
5. Experiments
5.1. Evaluation Indexes
5.2. Experimental Settings
5.3. Comparison with Other Methods
- Word2Vec-BiLSTM [40]: This method uses Word2Vec for word embeddings, followed by a BiLSTM for further feature extraction and classification.
- Word2Vec-TextCNN [41]: This method uses Word2Vec for word embeddings, followed by a TextCNN for further feature extraction and classification.
- BERT-BiLSTM: This method uses BERT for word embeddings, followed by a BiLSTM for further sequence feature aggregation and classification.
- BERT-RCNN: This method uses BERT for word embeddings, followed by a Recurrent Convolutional Neural Network (RCNN) [43] for further sequence feature aggregation and classification.
- BERT-TextCNN: This method uses BERT for word embeddings, followed by a TextCNN for further sequence feature aggregation and classification (see the sketch after this list).
- BERT-DPCNN: This method uses BERT for word embeddings, followed by a Deep Pyramid Convolutional Neural Network (DPCNN) [44] for further sequence feature aggregation and classification.
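As a concrete example of these baselines, here is a compact PyTorch sketch of the BERT-TextCNN pipeline: BERT token embeddings feed parallel 1-D convolutions whose max-pooled outputs are concatenated for classification. The backbone name, kernel sizes, and channel count are illustrative choices, not the settings used in the experiments.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class BertTextCNN(nn.Module):
    """BERT embeddings -> parallel Conv1d branches -> max-pool -> linear classifier."""
    def __init__(self, backbone="bert-base-chinese", num_classes=2,
                 kernel_sizes=(2, 3, 4), channels=128):
        super().__init__()
        self.bert = AutoModel.from_pretrained(backbone)
        dim = self.bert.config.hidden_size
        self.convs = nn.ModuleList(nn.Conv1d(dim, channels, k) for k in kernel_sizes)
        self.fc = nn.Linear(channels * len(kernel_sizes), num_classes)

    def forward(self, input_ids, attention_mask):
        h = self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state  # (B, L, D)
        h = h.transpose(1, 2)                                           # (B, D, L)
        pooled = [conv(h).relu().max(dim=-1).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=-1))
```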
5.4. Ablation Study
- Each of our modules yields a performance improvement, which supports the effectiveness of every component of our approach.
- The prototypical supervised contrastive learning module improves performance only marginally (+0.10% single-branch, +0.23% dual-branch), suggesting limited gains from the representation side alone.
- The self-distillation module provides a larger boost (+0.70% for a single EMA model and a further +0.17% for the dual-EMA model) than the contrastive learning module. This demonstrates the importance of soft labels for model learning and underscores the motivation behind this paper. In addition, the dual-EMA model outperforms the single EMA model by 0.17%, indicating that the two EMA models provide complementary soft-label knowledge during training.
5.5. Further Analysis
5.5.1. Case Study
5.5.2. Error Analysis
- Negative adverbs + positive adjectives. Identifying double-negation cases is challenging for the current method, because the representational capacity of the baseline is limited by its parameter count. We will try to solve this problem by using models with more parameters, such as Llama [46].
- Double emotional expression. In some cases, a sentence may contain multiple emotions. We address this with a soft labeling method; however, the model still makes occasional errors that require further exploration to resolve.
- Out-of-vocabulary words. Some words in our dataset are absent from the encoder's vocabulary, which can confuse model training. Subsequent research will update the vocabulary to address such issues.
- Mislabeled data. As illustrated in the fourth example of Figure 4, the instance is intended to convey a negative emotion but is labeled as positive. This inconsistency may arise from differences between users' words and actions during the evaluation process. Fortunately, our model still functions effectively despite the labeling error.
5.5.3. Visualization
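The visualization in this subsection can be reproduced with the standard t-SNE [47] workflow; the snippet below is our assumption of a typical recipe (projecting the encoder's sentence representations to 2-D, colored by sentiment label), not the paper's exact script.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(features: np.ndarray, labels: np.ndarray, path: str = "tsne.png"):
    """Project (N, D) sentence representations to 2-D and color by sentiment label."""
    coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
    plt.figure(figsize=(6, 6))
    for cls, name in [(0, "negative"), (1, "positive")]:
        mask = labels == cls
        plt.scatter(coords[mask, 0], coords[mask, 1], s=4, label=name)
    plt.legend()
    plt.savefig(path, dpi=200)
```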
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Turney, P.D. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. arXiv 2002, arXiv:cs/0212032. [Google Scholar]
- Nasukawa, T.; Yi, J. Sentiment analysis: Capturing favorability using natural language processing. In Proceedings of the 2nd International Conference on Knowledge Capture, Sanibel Island, FL, USA, 23–25 October 2003; pp. 70–77. [Google Scholar]
- Taboada, M.; Brooke, J.; Tofiloski, M.; Voll, K.; Stede, M. Lexicon-based methods for sentiment analysis. Comput. Linguist. 2011, 37, 267–307. [Google Scholar] [CrossRef]
- Feldman, R. Techniques and applications for sentiment analysis. Commun. ACM 2013, 56, 82–89. [Google Scholar] [CrossRef]
- Pang, B.; Lee, L.; Vaithyanathan, S. Thumbs up? Sentiment classification using machine learning techniques. arXiv 2002, arXiv:cs/0205070. [Google Scholar]
- Barbosa, L.; Feng, J. Robust sentiment detection on twitter from biased and noisy data. In Proceedings of the Coling 2010: Posters, Beijing, China, 23–27 August 2010; pp. 36–44. [Google Scholar]
- Zhao, W.; Guan, Z.; Chen, L.; He, X.; Cai, D.; Wang, B.; Wang, Q. Weakly-supervised deep embedding for product review sentiment analysis. IEEE Trans. Knowl. Data Eng. 2017, 30, 185–197. [Google Scholar] [CrossRef]
- Vateekul, P.; Koomsubha, T. A study of sentiment analysis using deep learning techniques on Thai Twitter data. In Proceedings of the 2016 13th International Joint Conference on Computer Science and Software Engineering (JCSSE), Khon Kaen, Thailand, 13–15 July 2016; pp. 1–6. [Google Scholar]
- Gao, Z.; Feng, A.; Song, X.; Wu, X. Target-dependent sentiment classification with BERT. IEEE Access 2019, 7, 154290–154299. [Google Scholar] [CrossRef]
- Singh, M.; Jakhar, A.K.; Pandey, S. Sentiment analysis on the impact of coronavirus in social life using the BERT model. Soc. Netw. Anal. Min. 2021, 11, 33. [Google Scholar] [CrossRef] [PubMed]
- Pang, B.; Lee, L. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. arXiv 2004, arXiv:cs/0409058. [Google Scholar]
- Turney, P.D.; Littman, M.L. Measuring praise and criticism: Inference of semantic orientation from association. ACM Trans. Inf. Syst. (Tois) 2003, 21, 315–346. [Google Scholar] [CrossRef]
- Kang, H.; Yoo, S.J.; Han, D. Senti-lexicon and improved Naïve Bayes algorithms for sentiment analysis of restaurant reviews. Expert Syst. Appl. 2012, 39, 6000–6010. [Google Scholar] [CrossRef]
- Brueckner, R.; Schuller, B. Social signal classification using deep BLSTM recurrent neural networks. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014. [Google Scholar] [CrossRef]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Cheng, J.; Li, P.; Ding, Z.; Zhang, S.; Wang, H. Sentiment classification of Chinese microblogging texts with global RNN. In Proceedings of the 2016 IEEE First International Conference on Data Science in Cyberspace (DSC), Changsha, China, 13–16 June 2016; pp. 653–657. [Google Scholar]
- Cao, D.; Huang, Y.; Li, H.; Zhao, X.; Zhao, Q.; Fu, Y. Text Sentiment Classification Based on LSTM-TCN Hybrid Model and Attention Mechanism. In Proceedings of the 4th International Conference on Computer Science and Application Engineering, Sanya, China, 20–22 October 2020. [Google Scholar] [CrossRef]
- Basiri, M.E.; Nemati, S.; Abdar, M.; Cambria, E.; Acharya, U.R. ABCDM: An attention-based bidirectional CNN-RNN deep model for sentiment analysis. Future Gener. Comput. Syst. 2021, 115, 279–294. [Google Scholar] [CrossRef]
- Cheng, Y.; Yao, L.; Xiang, G.; Zhang, G.; Tang, T.; Zhong, L. Text Sentiment Orientation Analysis Based on Multi-Channel CNN and Bidirectional GRU with Attention Mechanism. IEEE Access 2020, 8, 134964–134975. [Google Scholar] [CrossRef]
- Wadawadagi, R.; Pagi, V. Sentiment analysis with deep neural networks: Comparative study and performance assessment. Artif. Intell. Rev. 2020, 53, 6155–6195. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
- Hoang, M.; Bihorac, O.A.; Rouces, J. Aspect-based sentiment analysis using bert. In Proceedings of the 22nd Nordic Conference on Computational Linguistics, Turku, Finland, 30 September–2 October 2019; pp. 187–196. [Google Scholar]
- Li, X.; Bing, L.; Zhang, W.; Lam, W. Exploiting BERT for end-to-end aspect-based sentiment analysis. arXiv 2019, arXiv:1910.00883. [Google Scholar]
- Yan, C.; Liu, J.; Liu, W.; Liu, X. Research on public opinion sentiment classification based on attention parallel dual-channel deep learning hybrid model. Eng. Appl. Artif. Intell. 2022, 116, 105448. [Google Scholar] [CrossRef]
- Qin, Y.; Shi, Y.; Hao, X.; Liu, J. Microblog Text Emotion Classification Algorithm Based on TCN-BiGRU and Dual Attention. Information 2023, 14, 90. [Google Scholar] [CrossRef]
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
- Kim, K.; Ji, B.; Yoon, D.; Hwang, S. Self-knowledge distillation with progressive refinement of targets. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 6567–6576. [Google Scholar]
- Liang, J.; Li, L.; Bing, Z.; Zhao, B.; Tang, Y.; Lin, B.; Fan, H. Efficient one pass self-distillation with Zipf's label smoothing. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 104–119. [Google Scholar]
- Shen, Y.; Xu, L.; Yang, Y.; Li, Y.; Guo, Y. Self-distillation from the last mini-batch for consistency regularization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11943–11952. [Google Scholar]
- Hahn, S.; Choi, H. Self-knowledge distillation in natural language processing. arXiv 2019, arXiv:1908.01851. [Google Scholar]
- Liu, Y.; Shen, S.; Lapata, M. Noisy self-knowledge distillation for text summarization. arXiv 2020, arXiv:2009.07032. [Google Scholar]
- Zhao, Q.; Yu, C.; Huang, J.; Lian, J.; An, D. Sentiment analysis based on heterogeneous multi-relation signed network. Mathematics 2024, 12, 331. [Google Scholar] [CrossRef]
- Rozado, D.; Hughes, R.; Halberstadt, J. Longitudinal analysis of sentiment and emotion in news media headlines using automated labelling with Transformer language models. PLoS ONE 2022, 17, e0276367. [Google Scholar] [CrossRef]
- Li, J.; Zhou, P.; Xiong, C.; Hoi, S.C. Prototypical contrastive learning of unsupervised representations. arXiv 2020, arXiv:2005.04966. [Google Scholar]
- Zhang, Y.; Lai, G.; Zhang, M.; Zhang, Y.; Liu, Y.; Ma, S. Explicit factor models for explainable recommendation based on phrase-level sentiment analysis. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, Gold Coast, QLD, Australia, 6–11 July 2014; pp. 83–92. [Google Scholar]
- Wang, P.; Han, K.; Wei, X.S.; Zhang, L.; Wang, L. Contrastive learning based hybrid networks for long-tailed image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 943–952. [Google Scholar]
- Loshchilov, I.; Hutter, F. Fixing Weight Decay Regularization in Adam. 2018. Available online: https://openreview.net/forum?id=rk6qdGgCZ (accessed on 22 June 2024).
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026–8037. [Google Scholar]
- Wolf, T.; Chaumond, J.; Debut, L.; Sanh, V.; Delangue, C.; Moi, A.; Cistac, P.; Funtowicz, M.; Davison, J.; Shleifer, S.; et al. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, 16–20 November 2020; pp. 38–45. [Google Scholar]
- Lestari, V.B.; Utami, E. Combining Bi-LSTM and Word2vec Embedding for Sentiment Analysis Models of Application User Reviews. Indones. J. Comput. Sci. 2024, 13. [Google Scholar] [CrossRef]
- Zhang, Y.; Wallace, B. A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification. arXiv 2015, arXiv:1510.03820. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
- Lai, S.; Xu, L.; Liu, K.; Zhao, J. Recurrent convolutional neural networks for text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; Volume 29. [Google Scholar]
- Johnson, R.; Zhang, T. Deep pyramid convolutional neural networks for text categorization. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July–4 August 2017; pp. 562–570. [Google Scholar]
- Kokhlikyan, N.; Miglani, V.; Martin, M.; Wang, E.; Alsallakh, B.; Reynolds, J.; Melnikov, A.; Kliushkina, N.; Araya, C.; Yan, S.; et al. Captum: A unified and generic model interpretability library for pytorch. arXiv 2020, arXiv:2009.07896. [Google Scholar]
- Hoffmann, J.; Borgeaud, S.; Mensch, A.; Buchatskaya, E.; Cai, T.; Rutherford, E.; Casas, D.d.L.; Hendricks, L.A.; Welbl, J.; Clark, A.; et al. Training compute-optimal large language models. arXiv 2022, arXiv:2203.15556. [Google Scholar]
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
| E-Commerce | Group Buying | Social Media | Movies | Overall |
|---|---|---|---|---|
| 108,140 | 69,310 | 14,289 | 50,000 | 241,739 |
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Ground Truth: Positive | TP | FN |
| Ground Truth: Negative | FP | TN |
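From the four cells of the confusion matrix above, the evaluation indexes of Section 5.1 follow in the standard way:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \quad
\mathrm{Recall} = \frac{TP}{TP + FN}, \quad
\mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \quad
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
```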
| Method | Precision (%) | Recall (%) | F1 (%) | Accuracy (%) |
|---|---|---|---|---|
| Word2Vec-BiLSTM | 81.45 ± 1.38 | 80.73 ± 1.82 | 79.94 ± 2.43 | 80.79 ± 3.12 |
| Word2Vec-TextCNN | 79.13 ± 2.49 | 79.48 ± 2.32 | 79.64 ± 1.95 | 79.04 ± 3.06 |
| BERT | 85.15 ± 1.07 | 87.35 ± 1.14 | 86.23 ± 0.42 | 85.15 ± 1.39 |
| BERT-BiLSTM | 83.43 ± 1.73 | 88.28 ± 1.35 | 86.37 ± 0.39 | 85.38 ± 0.87 |
| BERT-TextCNN | 86.74 ± 0.76 | 85.62 ± 3.65 | 86.04 ± 0.68 | 86.14 ± 0.76 |
| BERT-RCNN | 83.89 ± 1.24 | 87.93 ± 0.57 | 86.17 ± 0.14 | 85.96 ± 0.21 |
| BERT-DPCNN | 86.07 ± 1.04 | 86.01 ± 2.34 | 86.06 ± 0.23 | 85.89 ± 0.84 |
| Ours | 86.99 ± 0.79 | 87.63 ± 1.02 | 87.44 ± 0.27 | 87.14 ± 0.32 |
| Method | F1 (%) | ΔF1 |
|---|---|---|
| BERT | 86.11 | – |
| Single-branch | 86.21 | +0.10 |
| Dual-branch | 86.34 | +0.23 |
| Single-EMA | 87.04 | +0.70 |
| Dual-EMA | 87.21 | +0.17 |