Abstract
Politics is a common domain for Opinion Mining applications, and one in which opinions may change over time. However, the usual Opinion Mining approaches cannot cope with the characteristics and challenges of continuous data streams. One alternative is Active Learning, which labels only selected instances rather than the entire data set. Active Learning requires choosing a sampling strategy to select the most valuable instances, yet no study has analyzed which strategies work best for Opinion Mining. We therefore evaluated eight Active Learning sampling strategies, of which Entropy achieved the best results. In addition, given the lack of publicly available stream data sets written in Portuguese, we created and evaluated corpora from Twitter and Facebook about the 2018 Brazilian presidential elections.
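To make the idea of an Entropy sampling strategy concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes a classifier has already produced class probabilities for each unlabeled post, and it ranks a fixed pool rather than a live stream; the function names, the three-class layout, and the example probabilities are all hypothetical.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    # Zero-probability classes contribute nothing, so they are skipped.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_sampling(class_probs: Sequence[Sequence[float]],
                     n_queries: int) -> List[int]:
    """Return the indices of the n_queries most uncertain instances."""
    ranked = sorted(range(len(class_probs)),
                    key=lambda i: entropy(class_probs[i]),
                    reverse=True)
    return ranked[:n_queries]


# Hypothetical classifier outputs for four posts over
# (negative, neutral, positive); these values are illustrative only.
pool_probs = [
    [0.90, 0.05, 0.05],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # near-uniform, highest entropy
    [0.60, 0.30, 0.10],
    [0.50, 0.50, 0.00],
]

# Indices of the two posts an annotator would be asked to label first.
to_label = entropy_sampling(pool_probs, n_queries=2)
print(to_label)
```

In a stream setting the same score would typically be compared against a threshold per incoming instance instead of ranking a pool; that variant is not shown here.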
Acknowledgment
Douglas Vitório and Adriano L. I. Oliveira are supported by CNPq (Brazilian Council for Scientific and Technological Development).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Vitório, D., Souza, E., Oliveira, A.L.I. (2019). Evaluating Active Learning Sampling Strategies for Opinion Mining in Brazilian Politics Corpora. In: Moura Oliveira, P., Novais, P., Reis, L. (eds) Progress in Artificial Intelligence. EPIA 2019. Lecture Notes in Computer Science, vol 11805. Springer, Cham. https://doi.org/10.1007/978-3-030-30244-3_57
DOI: https://doi.org/10.1007/978-3-030-30244-3_57
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30243-6
Online ISBN: 978-3-030-30244-3
eBook Packages: Computer Science (R0)