Text Mining and Visualization of Political Party Programs Using Keyword Extraction Methods: The Case of Portuguese Legislative Elections

  • Conference paper
Information for a Better World: Normality, Virtuality, Physicality, Inclusivity (iConference 2023)

Abstract

Extracting keywords from textual data is a crucial step in text analysis, and one that can take a considerable amount of time when done manually. In this paper, we show how keyword extraction techniques can be used to tap into texts of a political nature. To this end, we conduct a case study on 16 Portuguese (PT) political party programs made available in the context of the legislative elections held on 30 January 2022. Our contributions are twofold. At the level of resources, we make available a curated dataset and a Python notebook that systematizes the process of transforming text into quantitative data and visualizations. At the methodological level, we propose to extend the keyword extraction algorithm used in this study so that it extracts the most relevant keywords not only from individual political party programs, but also across the entire collection of documents. A further contribution is the case study itself, which shows that such solutions may be of interest not only to the general public, but also to journalists and politicians. More broadly, we demonstrate how the discussion and analysis stemming from the results may foster political science research by enabling large-scale processing of documents at marginal cost.
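The idea of ranking keywords per program and then across the whole collection can be illustrated with a minimal sketch. It is not the algorithm used in the paper, nor the proposed extension: scoring here is plain relative frequency, the aggregation is a simple sum of per-document scores, and the function names, the tiny stopword list, and the two toy "programs" are all assumptions made for illustration. The cap of three terms per candidate mirrors the keyword length mentioned in the notes.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real run would use a full list).
STOPWORDS = {"a", "o", "e", "de", "da", "do", "para", "the", "of", "and", "to", "in", "for"}

def candidate_ngrams(text, max_n=3):
    """Yield 1..max_n-grams that neither start nor end with a stopword."""
    tokens = re.findall(r"\w+", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)

def document_keywords(text, top_k=5, max_n=3):
    """Rank candidates by relative frequency within one document.

    This is a frequency-based stand-in, not the scoring of any real extractor.
    """
    counts = Counter(candidate_ngrams(text, max_n))
    total = sum(counts.values()) or 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(kw, c / total) for kw, c in ranked[:top_k]]

def collection_keywords(programs, top_k=5, max_n=3):
    """Sum per-document scores so keywords shared by several programs rise."""
    agg = Counter()
    for text in programs.values():
        for kw, score in document_keywords(text, top_k=top_k, max_n=max_n):
            agg[kw] += score
    return sorted(agg.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

# Toy stand-ins for party programs (hypothetical text, not from the dataset).
programs = {
    "party_a": "public health and public health funding for hospitals",
    "party_b": "tax reform and public health",
}
per_party = {name: document_keywords(text, top_k=3) for name, text in programs.items()}
overall = collection_keywords(programs, top_k=3)
```

Here `per_party` holds the ranked keywords of each program and `overall` the ranking across the collection; swapping `document_keywords` for a stronger extractor would leave the aggregation step unchanged.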

Notes

  1. https://www.publico.pt/2022/01/18/politica/noticia/peso-empresas-justica-programas-ps-psd-1992182.

  2. https://github.com/rncampos/PT-LegislativeElections2022.

  3. https://github.com/LIAAD/yake.

  4. https://archive.org/details/GeneralIndex.

  5. https://voyant-tools.org.

  6. https://github.com/JasonKessler/scattertext.

  7. https://drive.google.com/file/d/1X3UEGoPl9WZb2I2JXAoDhihE6l79pkFn/view?usp=sharing.

  8. For ease of comprehension, we have translated the keywords from Portuguese to English, so some keywords may end up consisting of more than 3 terms.

Acknowledgments

Ricardo Campos and Alípio Jorge were financed by the ERDF - European Regional Development Fund through the North Portugal Regional Operational Programme (NORTE 2020), under PORTUGAL 2020, and by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project PTDC/CCI-COM/31857/2017 (NORTE-01-0145-FEDER-03185). This funding falls under the research line of the Text2Story project.

Author information

Corresponding author

Correspondence to Ricardo Campos.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Campos, R., Jatowt, A., Jorge, A. (2023). Text Mining and Visualization of Political Party Programs Using Keyword Extraction Methods: The Case of Portuguese Legislative Elections. In: Sserwanga, I., et al. Information for a Better World: Normality, Virtuality, Physicality, Inclusivity. iConference 2023. Lecture Notes in Computer Science, vol 13971. Springer, Cham. https://doi.org/10.1007/978-3-031-28035-1_24

  • DOI: https://doi.org/10.1007/978-3-031-28035-1_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-28034-4

  • Online ISBN: 978-3-031-28035-1

  • eBook Packages: Computer Science, Computer Science (R0)
