Abstract
With the growing availability and popularity of online reviews, sentiment analysis has emerged in response to the need to organize useful information quickly. Feature selection directly affects how online reviews are represented and poses many challenges for sentiment analysis. However, little attention has so far been paid to feature selection for Chinese online reviews. We are therefore motivated to explore the effects of feature selection on sentiment analysis of Chinese online reviews. First, N-char-grams and N-POS-grams are selected as candidate sentiment features. Then, an improved Document Frequency method is used to select feature subsets, and the Boolean Weighting method is adopted to compute feature weights. Finally, experiments are conducted on online reviews of mobile phones, and a Chi-square test is carried out to assess the significance of the experimental results. The results suggest that sentiment analysis of Chinese online reviews achieves higher accuracy when 4-POS-grams are used as features. In addition, when N-char-grams are used as features, low-order N-char-grams perform better than high-order ones. Furthermore, the improved Document Frequency method yields a significant improvement in sentiment analysis of Chinese online reviews.
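The pipeline summarized in the abstract (n-gram extraction, document-frequency selection, Boolean weighting) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the improved Document Frequency method, so plain document frequency with a hypothetical `min_df` threshold stands in for it. The function names, the toy reviews, and the POS tag sequence are all assumptions made for illustration, and no real POS tagger is used.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all contiguous character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def pos_ngrams(tags, n):
    """Return contiguous n-grams over a POS tag sequence, joined by '_'."""
    return ["_".join(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def select_by_document_frequency(docs_features, min_df):
    """Keep features that occur in at least min_df documents.

    Assumption: this is plain document frequency, used as a stand-in
    for the improved variant that the abstract mentions but does not define.
    """
    df = Counter()
    for feats in docs_features:
        df.update(set(feats))  # count each feature once per document
    return sorted(f for f, count in df.items() if count >= min_df)


def boolean_vector(feats, vocabulary):
    """Boolean weighting: 1 if the feature occurs in the document, else 0."""
    present = set(feats)
    return [1 if f in present else 0 for f in vocabulary]


# Toy reviews (hypothetical data, not taken from the paper's corpus).
reviews = ["屏幕很好", "屏幕很差", "电池很好"]

# Represent each review by its 2-char-grams.
docs = [char_ngrams(r, 2) for r in reviews]

# Select the feature subset, then build one Boolean vector per review.
vocab = select_by_document_frequency(docs, min_df=2)
vectors = [boolean_vector(d, vocab) for d in docs]

# A 4-POS-gram from a hypothetical tag sequence (tags are placeholders).
example_pos_feature = pos_ngrams(["n", "d", "a", "u"], 4)
```

The resulting vectors could be passed to any standard classifier. The same two selection and weighting functions apply unchanged to N-POS-gram features once each review has been converted to a tag sequence.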
Acknowledgments
This work is partially supported by NSFC Grants 70971099 and 71371144, the Fundamental Research Funds for the Central Universities (1200219198), and the Shanghai Philosophy and Social Science Planning Project (2013BGL004).
Cite this article
Zheng, L., Wang, H. & Gao, S. Sentimental feature selection for sentiment analysis of Chinese online reviews. Int. J. Mach. Learn. & Cyber. 9, 75–84 (2018). https://doi.org/10.1007/s13042-015-0347-4
DOI: https://doi.org/10.1007/s13042-015-0347-4