Abstract
Supervised machine learning and opinion lexicons are the most frequent approaches to opinion mining, but they require considerable effort to prepare the training data and to build the opinion lexicon, respectively. In this paper, a novel unsupervised clustering approach is proposed for opinion mining. Three swarm algorithms based on Particle Swarm Optimization are evaluated using three corpora with different levels of complexity with respect to size, number of opinions, domains, languages, and class balancing. The K-means and Agglomerative clustering algorithms, as well as the Artificial Bee Colony and Cuckoo Search swarm-based algorithms, were selected for comparison. The proposed swarm-based algorithms achieved better accuracy when using the word bigram feature model as the pre-processing technique and the Global Silhouette as the optimization function, and on datasets with two classes (positive and negative). Although the swarm-based algorithms obtained lower results for datasets with three classes, they are still competitive, considering that neither labeled data nor opinion lexicons are required for the opinion clustering approach.
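To make the optimization function concrete, the following is a minimal sketch of how a candidate partition of opinions could be scored with a Global Silhouette over word bigram features. It is not the implementation evaluated in the paper. It assumes that the Global Silhouette is the mean, over clusters, of the average silhouette of each cluster's members; the Jaccard distance between bigram sets, the function names, and the toy sentences are illustrative assumptions as well.

```python
def word_bigrams(text):
    """Return the set of adjacent word pairs in a lowercased text."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| (assumed distance; two empty sets count as identical)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def global_silhouette(features, labels):
    """Mean over clusters of the average silhouette of their members (assumed definition)."""
    n = len(features)
    dist = [[jaccard_distance(features[i], features[j]) for j in range(n)]
            for i in range(n)]

    # Group document indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        return -1.0  # assumption: a single-cluster partition gets the worst score

    cluster_scores = []
    for lab, members in clusters.items():
        scores = []
        for i in members:
            if len(members) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster.
            a = sum(dist[i][j] for j in members if j != i) / (len(members) - 1)
            # b: smallest mean distance to the members of any other cluster.
            b = min(sum(dist[i][j] for j in other) / len(other)
                    for other_lab, other in clusters.items() if other_lab != lab)
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
        cluster_scores.append(sum(scores) / len(scores))
    return sum(cluster_scores) / len(cluster_scores)


# Hypothetical toy opinions, used only to exercise the functions.
docs = [
    "the battery life is great",
    "battery life is great overall",
    "the screen is too dim",
    "screen is too dim indoors",
]
features = [word_bigrams(d) for d in docs]
good = global_silhouette(features, [0, 0, 1, 1])  # groups similar opinions
bad = global_silhouette(features, [0, 1, 0, 1])   # mixes dissimilar opinions
print(good, bad)
```

In a swarm-based setting, each particle would encode one label assignment such as `[0, 0, 1, 1]`, and a score of this kind would serve as its fitness, so that partitions grouping similar opinions are preferred without any labeled data.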
Acknowledgements
Ellen Souza and Alisson Silva are supported by FACEPE (Pernambuco Research Foundation). Adriano L. I. Oliveira, Diego Santos, and Gustavo Oliveira are supported by CNPq (Brazilian Council for Scientific and Technological Development).
Cite this article
Souza, E., Santos, D., Oliveira, G. et al. Swarm optimization clustering methods for opinion mining. Nat Comput 19, 547–575 (2020). https://doi.org/10.1007/s11047-018-9681-2
DOI: https://doi.org/10.1007/s11047-018-9681-2