Abstract
Extracting keywords from textual data is a crucial step in text analysis, yet doing so manually can take a considerable amount of time. In this paper, we show how keyword extraction techniques can be used to tap into texts of a political nature. To this end, we conduct a case study on 16 Portuguese (PT) political party programs made available in the context of the legislative elections held on 30 January 2022. Our contributions are two-fold. At the level of resources, we make available a curated dataset and a Python notebook that systematizes the process of transforming text into quantitative data and visualizations. At the methodological level, we propose extending the keyword extraction algorithm used in this study to extract the most relevant keywords not only from individual political party programs, but also across the entire collection of documents. A further contribution is the case study itself, which shows that such solutions may be of interest not only to the general public, but also to journalists and politicians. More broadly, we demonstrate how the discussion and analysis stemming from the results may foster political science research by enabling large-scale processing of documents at marginal cost.
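To make the two levels of extraction concrete, the following is a minimal sketch of extracting keywords from each program and then ranking them across the whole collection. It is an illustration under stated assumptions, not the keyword extraction algorithm used in the study nor the proposed extension: scoring is plain n-gram frequency, the stopword list is a tiny placeholder, and the function names (`document_keywords`, `collection_keywords`) and the aggregation rule (number of programs containing a keyword, then summed frequency) are hypothetical.

```python
import re
from collections import Counter

# Tiny placeholder stopword list (assumption); a real run would need a full Portuguese list.
STOPWORDS = {"a", "o", "e", "de", "da", "do", "para", "em", "que", "os", "as", "um", "uma"}


def candidate_ngrams(text, max_n=3):
    """Yield all 1..max_n-grams that neither start nor end with a stopword."""
    tokens = re.findall(r"\w+", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)


def document_keywords(text, top_k=5, max_n=3):
    """Return the top_k (keyword, frequency) pairs of one document.

    Frequency is a stand-in score (assumption), with ties broken alphabetically.
    """
    counts = Counter(candidate_ngrams(text, max_n))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


def collection_keywords(programs, top_k=5, per_doc_k=20):
    """Rank keywords across a collection of documents.

    Hypothetical aggregation: keywords found in more documents rank first,
    then keywords with a higher summed frequency, then alphabetical order.
    """
    doc_freq = Counter()
    total = Counter()
    for text in programs.values():
        for keyword, score in document_keywords(text, per_doc_k):
            doc_freq[keyword] += 1
            total[keyword] += score
    ranked = sorted(doc_freq, key=lambda kw: (-doc_freq[kw], -total[kw], kw))
    return ranked[:top_k]


# Toy stand-ins for party programs (invented text, not taken from the dataset).
programs = {
    "A": "saúde pública e educação. saúde pública para todos",
    "B": "educação e habitação. educação gratuita",
    "C": "saúde pública e habitação",
}

if __name__ == "__main__":
    for party, text in programs.items():
        print(party, document_keywords(text, top_k=3))
    print("collection:", collection_keywords(programs))
```

Keeping the per-document step separate from the collection step means the scoring function could be swapped for a proper keyword extractor without touching the aggregation logic.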
Notes
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8. For the sake of comprehensibility, we have translated the keywords from Portuguese to English, meaning that some of the keywords may end up being formed by more than 3 terms.
Acknowledgments
Ricardo Campos and Alípio Jorge were financed by the ERDF - European Regional Development Fund through the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 and by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia within project PTDC/CCI-COM/31857/2017 (NORTE-01–0145-FEDER-03185). This funding fits under the research line of the Text2Story project.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Campos, R., Jatowt, A., Jorge, A. (2023). Text Mining and Visualization of Political Party Programs Using Keyword Extraction Methods: The Case of Portuguese Legislative Elections. In: Sserwanga, I., et al. Information for a Better World: Normality, Virtuality, Physicality, Inclusivity. iConference 2023. Lecture Notes in Computer Science, vol 13971. Springer, Cham. https://doi.org/10.1007/978-3-031-28035-1_24
DOI: https://doi.org/10.1007/978-3-031-28035-1_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28034-4
Online ISBN: 978-3-031-28035-1
eBook Packages: Computer Science, Computer Science (R0)