Comparing public communication in democracies and autocracies: automated text analyses of speeches by heads of government

Maerz, Seraphine F.; Schneider, Carsten Q.

doi:10.1007/s11135-019-00885-7

Published: 11 May 2019

Volume 54, pages 517–545, (2020)

Quality & Quantity

Abstract

Renewed efforts at empirically distinguishing between different forms of political regimes leave out the cultural dimension. In this article, we demonstrate how modern computational tools can be used to fill this gap. We employ web-scraping techniques to generate a data set of speeches by heads of government in European democracies and autocratic regimes around the globe. Our data set includes 4740 speeches delivered between 1999 and 2019 by 40 political leaders of 27 countries. By scaling the results of a dictionary application, we show how liberal or illiberal, in comparative terms, the leaders present themselves to their national and international audiences. To gauge whether our liberalness scale reveals meaningful distinctions, we perform a series of validity tests: criterion validity, qualitative hand-coding, unsupervised topic modeling, and network analysis. All tests suggest that our liberalness scale does capture meaningful differences between political regimes despite the large heterogeneity of our data.

Notes

Boese (2019) provides a thorough comparison of V-Dem with Freedom House and Polity and highlights these and other advantages of the most recent V-Dem data set.
Likewise, if public communication by an autocrat continuously emphasizes liberal political norms and values, then, we argue, it undermines the persistence of the illiberal autocratic regime.
With this, we, obviously, do not claim that political leadership (and its rhetoric) is identical to the political regime (and its institutional practices). Instead, we follow those in the literature who argue that the former is prone to have an impact on the longer-term viability of the latter (Diamond 1999; Linz and Stepan 1996; Merkel 1998; Higley and Burton 2006).
We rely on a rather broad understanding of political speeches here, which occasionally also includes political statements, e.g. from press conferences, and other more spontaneously produced documents.
To formally distinguish between both regime types, we rely on the most recent V-Dem data and their regime typology (Coppedge et al. 2018; Lührmann et al. 2018b, see also Sect. 4.1).
de Vries et al. (2018) illustrate in various tests that for bag-of-words text models findings generated from human-translated and machine-translated texts highly overlap. Lucas et al. (2015) provide a showcase of how to preprocess and manage multilingual texts in R.
This is demonstrated by the fact that among liberal democracies economic models with varying economic liberties can be found.
See “Appendix” section for the complete list of our dictionary terms.
Another constraint of the model is that it does not account for differences in rhetoric over time. Yet, this is rather a problem of data availability since we do not have enough speeches in each case for estimates per year.
The replication files, including robustness tests for different pre-processing strategies, are available at: https://dataverse.harvard.edu/dataverse/sfm.
For this formal distinction between democracy and autocracy we rely on the most recent data of the V-Dem Project (Coppedge et al. 2018; Lührmann et al. 2018a), which classifies autocracies as regimes that lack de facto multiparty or free and fair elections, or in which Dahl’s institutional prerequisites are not minimally fulfilled (Lührmann et al. 2018b).
Concerning the current Polish prime minister Mateusz Morawiecki and Moldova’s Pavel Filip, we cannot make clear classifications because the confidence intervals of their point estimates cross the zero line (cf. Fig. 1). The same is true for Edi Rama from Albania; in his case, however, the confidence interval overlaps zero only by a very small margin, suggesting that he is better seen as an illiberal than as a liberal speaker.
Available here: http://www.kormany.hu/en/the-prime-minister/the-prime-minister-s-speeches/prime-minister-viktor-orban-s-speech-at-the-final-fidesz-election-campaign-event.
Available here: https://www.bundeskanzlerin.de/bkin-en/news/towards-a-cosmopolitan-and-diverse-germany-1141998.
For an exploratory study on different language styles among autocrats, see Maerz (2019).
We heavily pre-processed our corpus before applying the unsupervised techniques (removal of stop words, infrequently used terms, punctuation, and numbers; stemming; lowercasing) to remove irrelevant words and treat words with similar properties as identical. As recommended by Denny and Spirling (2018) and illustrated in our “Appendix” section, we conducted detailed robustness tests to make sure that none of these preprocessing steps uncontrollably alters the STM results. All operations were done in R (2019, v. 3.5.2) with the STM package (Roberts et al. 2015, v. 1.3.3). The replication files are available here: https://dataverse.harvard.edu/dataverse/sfm.
Vladimir Putin has the largest share and smallest 95% confidence interval in this topic, cf. Fig. 4 in the “Appendix” section.
Since the model measures relative topic proportions, the rather isolated position of Orbán in this regard is not a consequence of his comparatively large share of speeches in the corpus. Other speakers have similarly isolated positions despite their rather small number of speeches (e.g. Emmanuel Macron on ‘Collective Memory’ in Fig. 5, “Appendix” section, or Kim Jong Un on ‘Juche, Military’ in Fig. 4).
The order of the speakers in these plots is based on their scores on our dictionary scale; the horizontal lines around the point estimates show the 95% confidence interval for the relative proportions of each speaker.
The way Orbán’s government attacks George Soros, a financier and philanthropist known for his pro-migration and liberal opinions, is a case sui generis in the European Union. Orbán’s most recent election campaign has been broadly understood as an anti-Jewish and anti-Muslim manifesto, for example by Cohen (2018) in the Guardian.
For the specifics of illiberal and autocratic language styles, see Dowell et al. (2015), Windsor et al. (2015, 2017) and Maerz (2019).
Lucas et al. (2015, Appendix E) provide more details about the graph estimation procedure which we have adopted here.
Details can be found in the material available at https://dataverse.harvard.edu/dataverse/sfm.
Further information also here: http://www.mjdenny.com/getting_started_with_preText.html.
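
Several of the notes above refer to point estimates on the dictionary scale and to confidence intervals that may cross the zero line. The excerpt does not spell out the scaling formula, so the following Python sketch rests on an assumption: it uses a log-ratio (logit) scale with a 0.5 smoothing constant in the spirit of Lowe et al. (2011), which the reference list cites. The function names, the sign convention (positive means more liberal), and the term counts are hypothetical and serve only as illustration.

```python
import math


def logit_scale(liberal_count, illiberal_count, z=1.96):
    """Log-ratio score with an approximate 95% interval.

    Assumption: a logit scale with 0.5 smoothing, as in Lowe et al. (2011).
    Positive values are read as 'more liberal' (hypothetical convention).
    """
    lib = liberal_count + 0.5
    ill = illiberal_count + 0.5
    estimate = math.log(lib / ill)
    # Approximate standard error of a log ratio of two counts.
    std_err = math.sqrt(1.0 / lib + 1.0 / ill)
    return estimate, estimate - z * std_err, estimate + z * std_err


def crosses_zero(lower, upper):
    """True if the interval includes zero, i.e. no clear classification."""
    return lower <= 0.0 <= upper


# Hypothetical dictionary hit counts for two invented speakers.
for name, lib, ill in [("speaker_a", 200, 50), ("speaker_b", 10, 10)]:
    est, lo, hi = logit_scale(lib, ill)
    print(name, round(est, 3), crosses_zero(lo, hi))
```

Under this reading, a speaker whose interval includes zero cannot be placed clearly on either side of the scale, which matches the way the notes treat such cases.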

References

Adcock, R., Collier, D.: Measurement validity: a shared standard for qualitative and quantitative research. Am. Polit. Sci. Rev. 95(3), 529–546 (2001)
Arat, Z.F.: Democracy and Human Rights in Developing Countries. Lynne Rienner, Boulder (1991)
Benoit, K., Nulty, P., Obeng, A., Wang, H., Lauderdale, B., Lowe, W.: Quanteda package (2019). Retrieved from https://cran.r-project.org/web/packages/quanteda/quanteda.pdf
Bocskor, Á.: Anti-immigration discourses in Hungary during the ‘Crisis’ year: The Orbán Government’s ‘National Consultation’ Campaign of 2015. Sociology 52(3), 551–568 (2018)
Boese, V.A.: How (not) to measure democracy. Int. Area Stud. Rev. (2019)
Bogaards, M.: De-democratization in Hungary: diffusely defective democracy. Democratization (2018). https://doi.org/10.1080/13510347.2018.1485015
Bollen, K.A.: Issues in the comparative measurement of political democracy. Am. Sociol. Rev. 45(3), 370–390 (1980)
Bozóki, A., Hegedűs, D.: An externally constrained hybrid regime: Hungary in the European Union. Democratization (2018). https://doi.org/10.1080/13510347.2018.1455664
Chang, J., Gerrish, S., Wang, C., Blei, D.M.: Reading tea leaves: how humans interpret topic models. Adv. Neural Inf. Process. Syst. 22, 288–296 (2009)
Cianetti, L., Dawson, J., Hanley, S.: Rethinking “democratic backsliding” in Central and Eastern Europe—looking beyond Hungary and Poland. East Eur. Polit. 34(3), 243–256 (2018). https://doi.org/10.1080/21599165.2018.1491401
Cohen, N.: In Hungary, the exploitation of a mythical enemy is poisoning politics. The Guardian, 31 Mar. 2018. (2018) Retrieved from https://www.theguardian.com/commentisfree/2018/mar/31/in-hungary-exploitation-of-mythical-enemy-is-poisoning-politics
Coppedge, et al.: V-dem country-year dataset v8, Varieties of Democracy (V-Dem) Project (2018). https://doi.org/10.23696/vdemcy18
Dahl, R.A.: Polyarchy: participation and opposition. Yale University Press, New Haven (1971)
Dahl, R.A.: Democracy and its critics. Yale University Press, New Haven (1989)
Denny, M.J., Spirling, A.: Text preprocessing for unsupervised learning: why it matters, when it misleads, and what to do about it. Polit. Anal. 26, 168–189 (2018)
de Vries, E., Schoonvelde, M., Schumacher, G.: No longer lost in translation. Evidence that Google Translate Works for Comparative Bag-of-Words Text Applications, Political Analysis, Online First (2018). Retrieved from https://doi.org/10.1017/pan.2018.26
Diamond, L.J.: Developing Democracy—Toward Consolidation. The Johns Hopkins University Press, Baltimore (1999)
Dowell, N.M., Windsor, L.C., Graesser, A.C.: Computational linguistics analysis of leaders during crises in authoritarian regimes. Dyn. Asymmetric Confl. 9(01–03), 1–12 (2015). https://doi.org/10.1080/17467586.2015.1038286
Dukalskis, A.: The Authoritarian Public Sphere—Legitimation and Autocratic Power in North Korea, Burma, and China. Routledge, London (2017)
Dukalskis, A., Gerschewski, J.: What autocracies say (and what citizens hear): proposing four mechanisms of autocratic legitimation. Contemp. Polit. 23, 251–268 (2017)
Easton, D.: A Systems Analysis of Political Life. Wiley, New York (1965)
Fauve, A.: Global Astana: nation branding as a legitimization tool for authoritarian regimes. Cent. Asian Surv. 34, 110–124 (2015). https://doi.org/10.1080/02634937.2015.1016799
FreedomHouse: Freedom house on Hungary in 2019 (2019). Retrieved from https://freedomhouse.org/report/freedom-world/2019/hungary
Geddes, B., Frantz, E., Wright, J.: Autocratic breakdown and regime transitions: a new data set. Perspect. Polit. 12(02), 313–331 (2014)
Gerschewski, J.: The three pillars of stability: legitimation, repression, and cooptation in autocratic regimes. Democratization 20(1), 13–38 (2013)
Greene, D., O’Callaghan, D., Cunningham, P.: How many topics? Stability analysis for topic models. Lecture Notes in Computer Science, 8724 LNAI (PART 1), pp. 498–513 (2014)
Grimmer, J., Stewart, B.M.: Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Polit. Anal. 21(03), 267–297 (2013)
Hanley, S., Vachudova, M.A.: Understanding the illiberal turn: democratic backsliding in the Czech Republic*. East Eur. Polit. 34(3), 276–296 (2018). https://doi.org/10.1080/21599165.2018.1493457
Hayes, A.F., Krippendorff, K.: Answering the call for a standard reliability measure for coding data. Commun. Methods Meas. 1(1), 77–89 (2007). https://doi.org/10.1080/19312450709336664
Higley, J., Burton, M.: Elite Foundations of Liberal Democracy. Rowman and Littlefield Publishers, Oxford (2006)
Inglehart, R., Welzel, C.: Modernization, Cultural Change, and Democracy: The Human Development Sequence. Cambridge University Press, Cambridge (2005)
Kirsch, H., Welzel, C.: Democracy misunderstood: authoritarian notions of democracy around the globe. Soc. Forces (2018). https://doi.org/10.1093/sf/soy114
Krekó, P., Enyedi, Z.: Orbán’s laboratory of illiberalism. J. Democr. 29(3), 39–51 (2018)
Langer, A.I., Sagarzazu, I.: Are all policy decisions equal? Explaining the variation in media coverage of the UK budget. Policy Stud. J. 45(2), 337–358 (2015). https://doi.org/10.1111/psj.12119
Laver, M., Benoit, K., Garry, J.: Extracting policy positions from political texts using words as data. Am. Polit. Sci. Rev. 97(2), 311–331 (2003)
Linz, J.J., Stepan, A.C.: Problems of Democratic Transition and Consolidation: Southern Europe, South America, and Post-communist Europe. Johns Hopkins University Press, Baltimore (1996)
Lowe, W.: Yoshikoder: cross-platform multilingual content analysis. Java Software Version 0.6.5. (2015). Retrieved from http://yoshikoder.sourceforge.net/index.html
Lowe, W., Benoit, K., Mikhaylov, S., Laver, M.: Scaling policy preferences from coded political texts. Legis. Stud. Q. 36, 123–155 (2011)
Lucas, C., Nielsen, R.A., Roberts, M.E., Stewart, B.M., Storer, A., Tingley, D.: Computer-assisted text analysis for comparative politics. Polit. Anal. 23, 254–277 (2015)
Lührmann, A., Mechkova, V., Dahlum, S., Maxwell, L., Petrarca, C.S., Sigman, R., Lindberg, S.I.: State of the world 2017: Autocratization and exclusion? Democratization (2018a). https://doi.org/10.1080/13510347.2018.1479693
Lührmann, A., Tannenberg, M., Lindberg, S.I.: Regimes of the world (RoW): opening new avenues for the comparative study of political regimes. Polit. Gov. 6(1), 60 (2018b)
Maerz, S.F.: Ma’naviyat in Uzbekistan: an ideological extrication from its Soviet past? J. Polit. Ideol. 23, 205–222 (2018a). https://doi.org/10.1080/13569317.2018.1419448
Maerz, S.F.: The many faces of authoritarian persistence. A set-theory perspective on the survival strategies of authoritarian regimes. Gov. Oppos. (2018b). https://doi.org/10.7910/DVN/DZ7JLC
Maerz, S.F.: Simulating pluralism: the language of democracy in hegemonic authoritarianism. Polit. Res. Exch. (2019). https://doi.org/10.1080/2474736X.2019.1605834
Maerz, S.F., Puschmann, C.: Text as data and automated content analysis for conflict research: a literature survey. In: Computational Conflict Research (Springer Nature as Part of Their Computational Social Sciences Series) (forthcoming) (2019)
Makarychev, A., Yatsyk, A.: Entertain and govern: from Sochi 2014 to FIFA 2018. Probl. Post Communism 65(2), 115–128 (2018)
Megoran, N.: Framing Andijon, narrating the nation: Islam Karimov’s account of the events of 13 May 2005. Cent. Asian Surv. 27(1), 15–31 (2008). https://doi.org/10.1080/02634930802213965
Merkel, W.: The consolidation of post-autocratic democracies: a multi-level model. Democratization 5(3), 33–67 (1998). https://doi.org/10.1080/13510349808403572
Munzert, S., Rubba, C., Meissner, P., Nyhuis, D.: Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. Wiley, London (2015)
Nelson, L.K., Burk, D., Knudsen, M., McCall, L.: The future of coding: a comparison of hand-coding and three types of computer-assisted text analysis methods. Sociol. Methods Res. (2018). https://doi.org/10.1177/0049124118769114
Omelicheva, M.Y.: Authoritarian legitimation: assessing discourses of legitimacy in Kazakhstan and Uzbekistan. Cent. Asian Surv. 35(4), 481–500 (2016). https://doi.org/10.1080/02634937.2016.1245181
R Core Team: R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna (2019). Retrieved from http://www.r-project.org
Roberts, M.E., Stewart, B.M., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S.K., Rand, D.G.: Structural topic models for open-ended survey responses. Am. J. Polit. Sci. 58(4), 1064–1082 (2014)
Roberts, M.E., Stewart, B.M., Tingley, D., Benoit, K.: Package ‘stm’ (2015). Retrieved from https://cran.r-project.org/web/packages/stm/stm.pdf
Roberts, M.E., Stewart, B.M., Tingley, D.: stm: R package for structural topic models. J. Stat. Softw. 1, 12 (2018)
Scheppele, K.L.: The rule of law and the Frankenstate: why governance checklists do not work. Governance 26(4), 559–562 (2013). https://doi.org/10.1111/gove.12049
Schumpeter, J.A.: Capitalism, Socialism and Democracy, 1st edn. Harper Brothers, New York (1942)
Stanley, B.: Confrontation by default and confrontation by design: strategic and institutional responses to Poland’s populist coalition government. Democratization 23(2), 263–282 (2016)
Vanhanen, T.: Global trends of democratization in the 1990s: a statistical analysis, Berlin (1994)
Welzel, C.: Freedom Rising: Human Empowerment and the Quest for Emancipation. Cambridge University Press, Cambridge (2013)
Windsor, L., Dowell, N., Graesser, A.: The language of autocrats: leaders’ language in natural disaster crises. Risk Hazards Crisis Public Policy 5(4), 446–467 (2015)
Windsor, L., Dowell, N., Windsor, A., Kaltner, J.: Leader language and political survival strategies. Int. Interact. 44(2), 321–336 (2017). https://doi.org/10.1080/03050629.2017.1345737
Zakaria, F.: The rise of illiberal democracy. Foreign Aff. 76(6), 22–43 (1997)

Author information

Authors and Affiliations

Bard College, Berlin, Germany
Seraphine F. Maerz
Department of Political Science, Central European University (CEU), Budapest, Hungary
Carsten Q. Schneider

Corresponding author

Correspondence to Seraphine F. Maerz.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Dictionary terms

1.2 Testing the effects of text cleaning procedures

Typically, text-as-data approaches based on the bag-of-words assumption rely on heavy preprocessing of the texts. Different options for preprocessing exist: capital letters can be converted to lowercase, stopwords removed, words lemmatized or stemmed, etc. Preprocessing decisions can, and often do, influence the results of text analysis. It is therefore paramount to test how robust the results are against equally plausible preprocessing strategies.

Based on our theoretical conceptualization of illiberal and liberal rhetoric in the speeches of heads of government, we chose the following combination of preprocessing steps: removing punctuation (P), removing numbers (N), converting to lowercase (L), stemming (S), removing stopwords (W), and removing infrequently used terms (I). We do not expect lowercasing to have a large effect on the results since it is a rather basic procedure. Because all our 4740 speeches were collected automatically from the Internet using web-scraping techniques, we expect most texts to include numbers and other (foreign) characters unrelated to the original text of the speech (e.g. hyperlinks, frames in foreign languages, etc.). This is why we deemed it necessary to remove punctuation and numbers. We also removed infrequently used terms and an individually compiled list of stopwords (e.g. foreign-language letters and stray words which we considered irrelevant for the analysis).^{Footnote 23}
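
The six steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' R pipeline: the stopword list is a hypothetical placeholder (the individually compiled list is not reproduced in the text), `crude_stem` is a toy suffix stripper standing in for a proper stemmer, and the document-frequency threshold `min_doc_freq` is an invented parameter.

```python
import re
from collections import Counter

# Hypothetical stopword list; the authors' compiled list is not given here.
STOPWORDS = {"the", "and", "of", "to", "a", "in", "we", "is"}


def crude_stem(token):
    """Toy suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(docs, min_doc_freq=2):
    """Apply the steps in the order P-N-L-S-W-I to a list of raw texts."""
    tokenized = []
    for text in docs:
        text = re.sub(r"[^\w\s]", " ", text)                 # P: punctuation
        text = re.sub(r"\d+", " ", text)                     # N: numbers
        text = text.lower()                                  # L: lowercase
        tokens = [crude_stem(t) for t in text.split()]       # S: stemming
        tokens = [t for t in tokens if t not in STOPWORDS]   # W: stopwords
        tokenized.append(tokens)
    # I: drop terms that occur in fewer than min_doc_freq documents.
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    return [[t for t in tokens if doc_freq[t] >= min_doc_freq]
            for tokens in tokenized]


# Invented example texts, including scraping residue such as numbers.
example = [
    "Freedom, rights and elections in 2019!",
    "We defended freedom and the rights of voters.",
    "Security threats: 3 borders closed.",
]
print(preprocess(example))
```

Because infrequent terms are removed last, the result depends on the whole corpus rather than on each document alone, which is one reason the order and combination of steps can matter.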

In the following, we investigate whether our preprocessing strategy (P–N–L–S–W–I) has the potential to significantly affect our results. In essence, the diagnostic procedure suggested by Denny and Spirling tests how much documents ‘move’ depending on the applied preprocessing features. It does so by calculating the so-called preText Score, which results from comparing pairwise distances between documents across a number of preprocessing specifications (cf. Denny and Spirling 2018). For this, we draw a random sample of unprocessed documents from our corpus (500, as recommended by Denny and Spirling 2018, p. 185) and perform several diagnostics to measure the potential effects of different (combinations of) preprocessing steps. All operations are done in R with Denny and Spirling’s package preText.^{Footnote 24}
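
To make the idea of documents 'moving' concrete, the following Python sketch compares how document pairs are ranked by cosine distance before and after preprocessing. It is a deliberately simplified stand-in and not the preText Score itself, whose exact computation is described in Denny and Spirling (2018); the function names and the normalisation are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_distance(tokens_a, tokens_b):
    """Cosine distance between two bag-of-words token lists."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat empty documents as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def pair_ranks(tokenized_docs):
    """Rank every document pair by distance (0 = closest pair)."""
    pairs = list(combinations(range(len(tokenized_docs)), 2))
    dists = [cosine_distance(tokenized_docs[i], tokenized_docs[j])
             for i, j in pairs]
    # Ties are broken by pair order because sorted() is stable.
    order = sorted(range(len(pairs)), key=lambda k: dists[k])
    ranks = [0] * len(pairs)
    for rank, k in enumerate(order):
        ranks[k] = rank
    return ranks


def movement_score(baseline_docs, processed_docs):
    """Total rank shift of document pairs, scaled to the range [0, 1]."""
    base, proc = pair_ranks(baseline_docs), pair_ranks(processed_docs)
    max_shift = (len(base) ** 2) // 2  # largest possible total rank shift
    if max_shift == 0:
        return 0.0
    return sum(abs(b - p) for b, p in zip(base, proc)) / max_shift
```

A score near 0 means a preprocessing specification leaves the relative distances between documents largely intact, while a score near 1 means the ordering of pairs is close to reversed.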

The first of the package’s two core functions produces the plot shown in Fig. 6. It displays the distance scores (preText Scores) for a number of combinations of preprocessing steps. Higher scores mean stronger effects on the results. The plot shows that our chosen combination of preprocessing features (P–N–L–S–W–I, marked with a green line) falls in the middle range compared with other preprocessing specifications. This indicates that our chosen text preprocessing can be expected to have a comparatively moderate effect on the results.

A second plot, Fig. 7, shows regression coefficients for each individual text cleaning feature. Negative coefficients indicate that a step tends to reduce the unusualness of the results; positive coefficients indicate that applying the step is likely to produce more unusual results for our corpus. The plot shows that removing stopwords has a particularly high potential to reduce the unusualness of our results, whereas removing punctuation has a high potential to increase it. Yet, as explained above, we assume that this unusualness in the unprocessed documents is caused by the web-scraping procedure and thus consists of frequent foreign-language letters, word fragments and punctuation that are not relevant for our analysis of illiberal and liberal language styles. That is why we can safely remove such stopwords and punctuation during preprocessing despite the expected effect.
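
How per-step coefficients of this kind can be read is illustrated by the Python sketch below. It assumes a full factorial design over the six binary steps (64 specifications); in such a balanced design, the main-effect OLS coefficient of a step equals the difference in mean score between specifications with and without that step, so no regression library is needed. The scores are synthetic and invented for illustration; they are not the values behind Fig. 7, and the function names are hypothetical.

```python
from itertools import product

STEPS = ["P", "N", "L", "S", "W", "I"]


def all_specifications(steps=STEPS):
    """Every on/off combination of the preprocessing steps (2**6 = 64)."""
    return [dict(zip(steps, flags))
            for flags in product([False, True], repeat=len(steps))]


def step_effects(specs, scores, steps=STEPS):
    """Mean score with a step applied minus mean score without it.

    In a balanced full factorial design this equals the main-effect OLS
    coefficient of the corresponding step indicator.
    """
    effects = {}
    for step in steps:
        on = [s for spec, s in zip(specs, scores) if spec[step]]
        off = [s for spec, s in zip(specs, scores) if not spec[step]]
        effects[step] = sum(on) / len(on) - sum(off) / len(off)
    return effects


specs = all_specifications()
# Synthetic scores: punctuation removal raises the score, stopword removal
# lowers it. These numbers are made up and carry no empirical meaning.
scores = [0.10 + 0.05 * spec["P"] - 0.03 * spec["W"] for spec in specs]
for step, effect in step_effects(specs, scores).items():
    print(step, round(effect, 3))
```

In this toy setup the sign pattern mirrors the interpretation given in the text: a negative value for a step means that applying it tends to lower the score, a positive value that it tends to raise it.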

Overall, based on these diagnostics and robustness tests, we conclude that our combined text cleaning procedures have a relatively low effect on the results of our corpus analysis and that, given our theoretical reasoning about preprocessing, we can accept the expectedly high effect of removing stopwords for our corpus.

1.3 Additional material for a 14-topic STM

See Figs. 8, 9 and 10.

About this article

Cite this article

Maerz, S.F., Schneider, C.Q. Comparing public communication in democracies and autocracies: automated text analyses of speeches by heads of government. Qual Quant 54, 517–545 (2020). https://doi.org/10.1007/s11135-019-00885-7

Issue Date: April 2020
DOI: https://doi.org/10.1007/s11135-019-00885-7
