Abstract
Renewed efforts at empirically distinguishing between different forms of political regimes leave out the cultural dimension. In this article, we demonstrate how modern computational tools can be used to fill this gap. We employ web-scraping techniques to generate a data set of speeches by heads of government in European democracies and autocratic regimes around the globe. Our data set includes 4740 speeches delivered between 1999 and 2019 by 40 political leaders from 27 countries. By scaling the results of a dictionary application, we show how liberal or illiberal, in comparative terms, these leaders present themselves to their national and international audiences. To gauge whether our liberalness scale reveals meaningful distinctions, we perform a series of validity tests: criterion validity, qualitative hand-coding, unsupervised topic modeling, and network analysis. All tests suggest that our liberalness scale does capture meaningful differences between political regimes despite the large heterogeneity of our data.
Notes
Boese (2019) provides a thorough comparison of V-Dem with Freedom House and Polity and highlights these and other advantages of the most recent V-Dem data set.
Likewise, if public communication by an autocrat continuously emphasizes liberal political norms and values, then, we argue, it undermines the persistence of the illiberal autocratic regime.
By this, we obviously do not claim that political leadership (and its rhetoric) is identical to the political regime (and its institutional practices). Instead, we follow those in the literature who argue that the former is prone to have an impact on the longer-term viability of the latter (Diamond 1999; Linz and Stepan 1996; Merkel 1998; Higley and Burton 2006).
We use a rather broad understanding of political speeches here, which occasionally also includes political statements, e.g. from press conferences, and other more spontaneously produced documents.
This is demonstrated by the fact that among liberal democracies economic models with varying economic liberties can be found.
See “Appendix” section for the complete list of our dictionary terms.
Another constraint of the model is that it does not account for differences in rhetoric over time. Yet, this is rather a problem of data availability since we do not have enough speeches in each case for estimates per year.
The replication files, including robustness tests for different pre-processing strategies, are available at: https://dataverse.harvard.edu/dataverse/sfm.
For this formal distinction between democracy and autocracy we rely on the most recent data of the V-Dem Project (Coppedge et al. 2018; Lührmann et al. 2018a), which classifies autocracies as regimes that lack de facto multiparty or free and fair elections, or in which Dahl’s institutional prerequisites are not minimally fulfilled (Lührmann et al. 2018b).
Concerning the current Polish prime minister Mateusz Morawiecki and Moldova’s Pavel Filip, we cannot make clear classifications because the confidence intervals of their point estimates cross the zero line (cf. Fig. 1). The same is true for Edi Rama from Albania. In his case, however, the confidence interval crosses zero only by a very small margin, suggesting that he is better seen as an illiberal than as a liberal speaker.
For an exploratory study on different language styles among autocrats, see Maerz (2019).
We heavily preprocessed our corpus before applying the unsupervised techniques (lowercasing, stemming, and removal of stop words, infrequently used terms, punctuation, and numbers) to remove irrelevant words and treat words with similar properties as identical. As recommended by Denny and Spirling (2018) and illustrated in our “Appendix” section, we conducted detailed robustness tests to make sure that none of these preprocessing steps uncontrollably alters the STM results. All operations were done in R (2019, v. 3.5.2) with the stm package (Roberts et al. 2015, v. 1.3.3). The replication files are available here: https://dataverse.harvard.edu/dataverse/sfm.
Since the model measures relative topic proportions, the rather isolated position of Orbán in this regard is not a consequence of his comparatively large share of speeches in the corpus. Other speakers have similarly isolated positions despite their rather small number of speeches (e.g. Emmanuel Macron on ‘Collective Memory’ in Fig. 5, “Appendix” section, or Kim Jong Un on ‘Juche, Military’ in Fig. 4).
The order of the speakers in these plots is based on their scores on our dictionary scale. The horizontal lines around the point estimates show the 95% confidence interval for the relative proportions of each speaker.
The way Orbán’s government attacks George Soros, a financier and philanthropist known for his pro-migration and liberal opinions, is a case sui generis in the European Union. Orbán’s most recent election campaign has been broadly understood as an anti-Jewish and anti-Muslim manifesto, for example by Cohen (2018) in the Guardian.
Lucas et al. (2015, Appendix E) provide more details about the graph estimation procedure which we have adopted here.
Details can be found in the material available at https://dataverse.harvard.edu/dataverse/sfm.
Further information also here: http://www.mjdenny.com/getting_started_with_preText.html.
References
Adcock, R., Collier, D.: Measurement validity: a shared standard for qualitative and quantitative research. Am. Polit. Sci. Rev. 95(3), 529–546 (2001)
Arat, Z.F.: Democracy and Human Rights in Developing Countries. Lynne Rienner, Boulder (1991)
Benoit, K., Nulty, P., Obeng, A., Wang, H., Lauderdale, B., Lowe, W.: Quanteda package (2019). Retrieved from https://cran.r-project.org/web/packages/quanteda/quanteda.pdf
Bocskor, Á.: Anti-immigration discourses in Hungary during the ‘Crisis’ year: The Orbán Government’s ‘National Consultation’ Campaign of 2015. Sociology 52(3), 551–568 (2018)
Boese, V.A.: How (not) to measure democracy. Int. Area Stud. Rev. (2019)
Bogaards, M.: De-democratization in Hungary: diffusely defective democracy. Democratization (2018). https://doi.org/10.1080/13510347.2018.1485015
Bollen, K.A.: Issues in the comparative measurement of political democracy. Am. Sociol. Rev. 45(3), 370–390 (1980)
Bozóki, A., Hegedűs, D.: An externally constrained hybrid regime: Hungary in the European Union. Democratization (2018). https://doi.org/10.1080/13510347.2018.1455664
Chang, J., Gerrish, S., Wang, C., Blei, D.M.: Reading tea leaves: how humans interpret topic models. Adv. Neural Inf. Process. Syst. 22, 288–296 (2009)
Cianetti, L., Dawson, J., Hanley, S.: Rethinking “democratic backsliding” in Central and Eastern Europe—looking beyond Hungary and Poland. East Eur. Polit. 34(3), 243–256 (2018). https://doi.org/10.1080/21599165.2018.1491401
Cohen, N.: In Hungary, the exploitation of a mythical enemy is poisoning politics. The Guardian, 31 Mar. 2018. (2018) Retrieved from https://www.theguardian.com/commentisfree/2018/mar/31/in-hungary-exploitation-of-mythical-enemy-is-poisoning-politics
Coppedge, et al.: V-dem country-year dataset v8, Varieties of Democracy (V-Dem) Project (2018). https://doi.org/10.23696/vdemcy18
Dahl, R.A.: Polyarchy: participation and opposition. Yale University Press, New Haven (1971)
Dahl, R.A.: Democracy and its critics. Yale University Press, New Haven (1989)
Denny, M.J., Spirling, A.: Text preprocessing for unsupervised learning: why it matters, when it misleads, and what to do about it. Polit. Anal. 26, 168–189 (2018)
de Vries, E., Schoonvelde, M., Schumacher, G.: No longer lost in translation: evidence that Google Translate works for comparative bag-of-words text applications. Polit. Anal. (2018). https://doi.org/10.1017/pan.2018.26
Diamond, L.J.: Developing Democracy—Toward Consolidation. Johns Hopkins University Press, Baltimore (1999)
Dowell, N.M., Windsor, L.C., Graesser, A.C.: Computational linguistics analysis of leaders during crises in authoritarian regimes. Dyn. Asymmetric Confl. 9(01–03), 1–12 (2015). https://doi.org/10.1080/17467586.2015.1038286
Dukalskis, A.: The Authoritarian Public Sphere—Legitimation and Autocratic Power in North Korea, Burma, and China. Routledge, London (2017)
Dukalskis, A., Gerschewski, J.: What autocracies say (and what citizens hear): proposing four mechanisms of autocratic legitimation. Contemp. Polit. 23, 251–268 (2017)
Easton, D.: A Systems Analysis of Political Life. Wiley, New York (1965)
Fauve, A.: Global Astana: nation branding as a legitimization tool for authoritarian regimes. Cent. Asian Surv. 34, 110–124 (2015). https://doi.org/10.1080/02634937.2015.1016799
Freedom House: Freedom House on Hungary in 2019 (2019). Retrieved from https://freedomhouse.org/report/freedom-world/2019/hungary
Geddes, B., Frantz, E., Wright, J.: Autocratic breakdown and regime transitions: a new data set. Perspect. Polit. 12(02), 313–331 (2014)
Gerschewski, J.: The three pillars of stability: legitimation, repression, and cooptation in autocratic regimes. Democratization 20(1), 13–38 (2013)
Greene, D., O’Callaghan, D., Cunningham, P.: How many topics? Stability analysis for topic models. Lecture Notes in Computer Science, 8724 LNAI (PART 1), pp. 498–513 (2014)
Grimmer, J., Stewart, B.M.: Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Polit. Anal. 21(03), 267–297 (2013)
Hanley, S., Vachudova, M.A.: Understanding the illiberal turn: democratic backsliding in the Czech Republic. East Eur. Polit. 34(3), 276–296 (2018). https://doi.org/10.1080/21599165.2018.1493457
Hayes, A.F., Krippendorff, K.: Answering the call for a standard reliability measure for coding data. Commun. Methods Meas. 1(1), 77–89 (2007). https://doi.org/10.1080/19312450709336664
Higley, J., Burton, M.: Elite Foundations of Liberal Democracy. Rowman and Littlefield Publishers, Oxford (2006)
Inglehart, R., Welzel, C.: Modernization, Cultural Change, and Democracy: The Human Development Sequence. Cambridge University Press, Cambridge (2005)
Kirsch, H., Welzel, C.: Democracy misunderstood: authoritarian notions of democracy around the globe. Soc. Forces (2018). https://doi.org/10.1093/sf/soy114
Krekó, P., Enyedi, Z.: Orbán’s laboratory of illiberalism. J. Democr. 29(3), 39–51 (2018)
Langer, A.I., Sagarzazu, I.: Are all policy decisions equal? Explaining the variation in media coverage of the UK budget. Policy Stud. J. 45(2), 337–358 (2015). https://doi.org/10.1111/psj.12119
Laver, M., Benoit, K., Garry, J.: Extracting policy positions from political texts using words as data. Am. Polit. Sci. Rev. 97(2), 311–331 (2003)
Linz, J.J., Stepan, A.C.: Problems of Democratic Transition and Consolidation: Southern Europe, South America, and Post-communist Europe. Johns Hopkins University Press, Baltimore (1996)
Lowe, W.: Yoshikoder: cross-platform multilingual content analysis. Java Software Version 0.6.5. (2015). Retrieved from http://yoshikoder.sourceforge.net/index.html
Lowe, W., Benoit, K., Mikhaylov, S., Laver, M.: Scaling policy preferences from coded political texts. Legis. Stud. Q. 36, 123–155 (2011)
Lucas, C., Nielsen, R.A., Roberts, M.E., Stewart, B.M., Storer, A., Tingley, D.: Computer-assisted text analysis for comparative politics. Polit. Anal. 23, 254–277 (2015)
Lührmann, A., Mechkova, V., Dahlum, S., Maxwell, L., Petrarca, C.S., Sigman, R., Lindberg, S.I.: State of the world 2017: Autocratization and exclusion? Democratization (2018a). https://doi.org/10.1080/13510347.2018.1479693
Lührmann, A., Tannenberg, M., Lindberg, S.I.: Regimes of the world (RoW): opening new avenues for the comparative study of political regimes. Polit. Gov. 6(1), 60 (2018b)
Maerz, S.F.: Ma’naviyat in Uzbekistan: an ideological extrication from its Soviet past? J. Polit. Ideol. 23, 205–222 (2018a). https://doi.org/10.1080/13569317.2018.1419448
Maerz, S.F.: The many faces of authoritarian persistence. A set-theory perspective on the survival strategies of authoritarian regimes. Gov. Oppos. (2018b). https://doi.org/10.7910/DVN/DZ7JLC
Maerz, S.F.: Simulating pluralism: the language of democracy in hegemonic authoritarianism. Polit. Res. Exch. (2019). https://doi.org/10.1080/2474736X.2019.1605834
Maerz, S.F., Puschmann, C.: Text as data and automated content analysis for conflict research: a literature survey. In: Computational Conflict Research (Springer Nature as Part of Their Computational Social Sciences Series) (forthcoming) (2019)
Makarychev, A., Yatsyk, A.: Entertain and govern: from Sochi 2014 to FIFA 2018. Probl. Post Communism 65(2), 115–128 (2018)
Megoran, N.: Framing Andijon, narrating the nation: Islam Karimov’s account of the events of 13 May 2005. Cent. Asian Surv. 27(1), 15–31 (2008). https://doi.org/10.1080/02634930802213965
Merkel, W.: The consolidation of post-autocratic democracies: a multi-level model. Democratization 5(3), 33–67 (1998). https://doi.org/10.1080/13510349808403572
Munzert, S., Rubba, C., Meissner, P., Nyhuis, D.: Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. Wiley, London (2015)
Nelson, L.K., Burk, D., Knudsen, M., McCall, L.: The future of coding: a comparison of hand-coding and three types of computer-assisted text analysis methods. Sociol. Methods Res. (2018). https://doi.org/10.1177/0049124118769114
Omelicheva, M.Y.: Authoritarian legitimation: assessing discourses of legitimacy in Kazakhstan and Uzbekistan. Cent. Asian Surv. 35(4), 481–500 (2016). https://doi.org/10.1080/02634937.2016.1245181
R Core Team: R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna (2019). Retrieved from http://www.r-project.org
Roberts, M.E., Stewart, B.M., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S.K., Rand, D.G.: Structural topic models for open-ended survey responses. Am. J. Polit. Sci. 58(4), 1064–1082 (2014)
Roberts, M.E., Stewart, B.M., Tingley, D., Benoit, K.: Package ‘stm’ (2015). Retrieved from https://cran.r-project.org/web/packages/stm/stm.pdf
Roberts, M.E., Stewart, B.M., Tingley, D.: stm: R package for structural topic models. J. Stat. Softw. 1, 12 (2018)
Scheppele, K.L.: The rule of law and the Frankenstate: why governance checklists do not work. Governance 26(4), 559–562 (2013). https://doi.org/10.1111/gove.12049
Schumpeter, J.A.: Capitalism, Socialism and Democracy, 1st edn. Harper Brothers, New York (1942)
Stanley, B.: Confrontation by default and confrontation by design: strategic and institutional responses to Poland’s populist coalition government. Democratization 23(2), 263–282 (2016)
Vanhanen, T.: Global trends of democratization in the 1990s: a statistical analysis, Berlin (1994)
Welzel, C.: Freedom Rising: Human Empowerment and the Quest for Emancipation. Cambridge University Press, Cambridge (2013)
Windsor, L., Dowell, N., Graesser, A.: The language of autocrats: leaders’ language in natural disaster crises. Risk Hazards Crisis Public Policy 5(4), 446–467 (2015)
Windsor, L., Dowell, N., Windsor, A., Kaltner, J.: Leader language and political survival strategies. Int. Interact. 44(2), 321–336 (2017). https://doi.org/10.1080/03050629.2017.1345737
Zakaria, F.: The rise of illiberal democracy. Foreign Aff. 76(6), 22–43 (1997)
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 Dictionary terms
1.2 Testing the effects of text cleaning procedures
Typically, text-as-data approaches based on the bag-of-words assumption rely on heavy preprocessing of the texts. Different options for preprocessing exist: capital letters can be converted to lowercase, stopwords removed, words lemmatized or stemmed, and so on. Preprocessing decisions can, and often do, influence the results of text analysis. It is therefore paramount to test how robust the results are against equally plausible preprocessing strategies.
Based on our theoretical conceptualization of illiberal and liberal rhetoric in the speeches of heads of government, we chose the following combination of preprocessing steps: removing punctuation (P), removing numbers (N), converting to lowercase (L), stemming (S), removing stopwords (W), and removing infrequently used terms (I). We do not expect lowercasing to have large effects on the results since it is a rather basic procedure. Because all our 4740 speeches were collected automatically from the Internet using web-scraping techniques, we expect most texts to include numbers and other (foreign) characters unrelated to the original text of the speech (e.g. hyperlinks, frames in foreign languages, etc.). This is why we deemed it necessary to remove punctuation and numbers. We also removed infrequently used terms and an individually compiled list of stopwords (e.g. foreign-language letters and stray words that we considered irrelevant for the analysis) (footnote 23).
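The six steps can be illustrated with a minimal Python sketch. It is not the authors' R pipeline: the stopword list, the `crude_stem` helper, and the `min_doc_freq` threshold are hypothetical stand-ins chosen only to make the order of operations (P, N, L, S, W, I) concrete.

```python
import re
from collections import Counter

# Hypothetical stopword list; the authors used an individually compiled list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "we", "our"}


def crude_stem(token):
    """Very rough suffix stripping; an assumed stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(docs, min_doc_freq=2):
    """Apply P-N-L-S-W-I to a list of raw strings; returns token lists."""
    tokenised = []
    for text in docs:
        text = re.sub(r"[^\w\s]", " ", text)                 # P: punctuation
        text = re.sub(r"\d+", " ", text)                     # N: numbers
        text = text.lower()                                  # L: lowercase
        tokens = [crude_stem(t) for t in text.split()]       # S: stemming
        tokens = [t for t in tokens if t not in STOPWORDS]   # W: stopwords
        tokenised.append(tokens)

    # I: drop terms that occur in fewer than min_doc_freq documents.
    doc_freq = Counter(t for tokens in tokenised for t in set(tokens))
    return [[t for t in tokens if doc_freq[t] >= min_doc_freq]
            for tokens in tokenised]
```

Each step is a separate line, so a single step can be switched off to produce an alternative preprocessing specification for a robustness comparison.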
In the following, we investigate whether our preprocessing strategy (P–N–L–S–W–I) has the potential to significantly affect our results. In essence, the diagnostic procedure suggested by Denny and Spirling (2018) tests how much documents ‘move’ depending on the preprocessing steps applied. It does so by calculating the so-called preText score, which results from comparing pairwise distances between documents across a number of preprocessing specifications (cf. Denny and Spirling 2018). For this, we draw a random sample of 500 unprocessed documents from our corpus (as recommended by Denny and Spirling 2018, p. 185) and run several diagnostics to measure the potential effects of different (combinations of) preprocessing steps. All operations are done in R with Denny and Spirling’s package preText (footnote 24).
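The idea of documents ‘moving’ can be made concrete with a deliberately simplified Python sketch. It is not the preText algorithm and does not reproduce the package's scores. It only assumes that each document is a token list, uses cosine distance on raw term counts, and reports how much the ranking of pairwise distances shifts between an unprocessed baseline and one preprocessing specification. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_distance(a, b):
    """Cosine distance between two token lists, based on raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    if na == 0 or nb == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (na * nb)


def pair_ranks(tokenised_docs):
    """Rank all document pairs from closest (0) to most distant.

    Ties are broken by pair order, because Python's sort is stable.
    """
    pairs = list(combinations(range(len(tokenised_docs)), 2))
    dists = [cosine_distance(tokenised_docs[i], tokenised_docs[j])
             for i, j in pairs]
    order = sorted(range(len(pairs)), key=lambda k: dists[k])
    ranks = [0] * len(pairs)
    for rank, k in enumerate(order):
        ranks[k] = rank
    return ranks


def movement_score(baseline_docs, processed_docs):
    """Mean absolute rank shift of pairwise distances, scaled to [0, 1]."""
    base = pair_ranks(baseline_docs)
    proc = pair_ranks(processed_docs)
    n = len(base)
    if n < 2:
        return 0.0
    mean_shift = sum(abs(b - p) for b, p in zip(base, proc)) / n
    return mean_shift / (n - 1)
```

A score of 0 means the specification leaves the ordering of document pairs unchanged. Larger values mean the preprocessing reshuffles which documents look similar to each other, which is the kind of sensitivity the diagnostic is designed to flag.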
The two core functions of this package produce two plots. The first, Fig. 6, displays the distance scores (preText scores) for a number of combinations of preprocessing steps. Higher scores indicate stronger effects on the results. The plot shows that our chosen combination of preprocessing steps (P–N–L–S–W–I, marked with a green line) lies in the medium range compared to other preprocessing specifications. This indicates that our chosen preprocessing can be expected to have a comparatively moderate effect on the results.
The second plot, Fig. 7, shows regression coefficients for each individual preprocessing step. Negative coefficients indicate that a step tends to reduce the unusualness of the results; positive coefficients indicate that applying the step is likely to produce more unusual results for our corpus. The plot shows that removing stopwords has a particularly high potential to reduce the unusualness of our results, whereas removing punctuation has a high potential to increase it. Yet, as explained above, we assume that this unusualness in the unprocessed documents is caused by the web-scraping procedure and thus consists of frequent foreign-language letters, word fragments, and punctuation that are not relevant for our analysis of illiberal and liberal language styles. That is why we can safely remove such stopwords and punctuation during preprocessing despite the expected effect.
Overall, based on these diagnostics and robustness tests, we conclude that our combined text cleaning procedures have relatively low effects on the results of our corpus analysis and that, given our theoretical rationale for preprocessing, we can accept the expectedly high effects of removing stopwords for our corpus.
1.3 Additional material for a 14-topic STM
Cite this article
Maerz, S.F., Schneider, C.Q. Comparing public communication in democracies and autocracies: automated text analyses of speeches by heads of government. Qual Quant 54, 517–545 (2020). https://doi.org/10.1007/s11135-019-00885-7
DOI: https://doi.org/10.1007/s11135-019-00885-7