Evaluating Active Learning Sampling Strategies for Opinion Mining in Brazilian Politics Corpora

  • Conference paper
Progress in Artificial Intelligence (EPIA 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11805)

Abstract

Politics is a common domain for Opinion Mining applications, and one in which opinions may change over time. However, the usual Opinion Mining approaches cannot cope with the characteristics and challenges of continuous data streams. An alternative is to use techniques such as Active Learning, which labels selected data rather than the entire data set. Active Learning requires choosing a sampling strategy to select the most valuable instances, yet no study has analyzed which strategies work best for Opinion Mining. We therefore evaluated eight Active Learning sampling strategies, of which Entropy achieved the best results. In addition, because publicly available stream data sets written in Portuguese are lacking, we created and evaluated corpora from Twitter and Facebook about the 2018 Brazilian presidential elections.
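To make the idea of an entropy-based sampling strategy concrete, the following is a minimal sketch and not the authors' implementation. It assumes a classifier has already produced class probabilities for each unlabeled post; the function names `entropy` and `entropy_sampling`, the three-class layout, and the probability values are all hypothetical. The strategy ranks instances by the Shannon entropy of their predicted distribution and queries labels for the most uncertain ones.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    # Zero-probability classes contribute nothing, so they are skipped.
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_sampling(prob_rows, n_queries=1):
    """Return the indices of the n_queries instances with the highest entropy."""
    scores = [entropy(row) for row in prob_rows]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_queries]


# Hypothetical class probabilities (assumed order: negative, neutral, positive)
# for four unlabeled posts; these values are illustrative, not from the paper.
pool = [
    [0.90, 0.05, 0.05],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # nearly uniform, highest entropy
    [0.60, 0.30, 0.10],
    [0.50, 0.50, 0.00],
]

selected = entropy_sampling(pool, n_queries=2)
print(selected)  # [1, 2]
```

In a stream setting, the selected instances would be sent to an annotator and the classifier updated with the new labels; that loop is omitted here.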

Notes

  1. http://miningbrgroup.com.br/index.php/resources/

  2. https://www.nltk.org

  3. https://scikit-learn.org

  4. docs.scipy.org/doc/scipy-0.14.0/reference/index.html

  5. http://docs.orange.biolab.si/3/data-mining-library/index.html

Acknowledgment

Douglas Vitório and Adriano L. I. Oliveira are supported by CNPq (Brazilian Council for Scientific and Technological Development).

Author information

Corresponding author

Correspondence to Douglas Vitório.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Vitório, D., Souza, E., Oliveira, A.L.I. (2019). Evaluating Active Learning Sampling Strategies for Opinion Mining in Brazilian Politics Corpora. In: Moura Oliveira, P., Novais, P., Reis, L. (eds) Progress in Artificial Intelligence. EPIA 2019. Lecture Notes in Computer Science, vol 11805. Springer, Cham. https://doi.org/10.1007/978-3-030-30244-3_57

  • DOI: https://doi.org/10.1007/978-3-030-30244-3_57

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30243-6

  • Online ISBN: 978-3-030-30244-3

  • eBook Packages: Computer Science (R0)
