SSRN Id3800218

How Does Government Expenditure Impact Sustainable
Development?
Studying the Multidimensional Link between Budgets and
Development Gaps
Omar A. Guerrero¹,² and Gonzalo Castañeda³
¹ Department of Economics, UCL, United Kingdom
² The Alan Turing Institute, United Kingdom
³ Centro de Investigación y Docencia Económica (CIDE), Mexico
Abstract
We develop a bottom-up causal framework to study the impact of public spending on high-
dimensional and inter-dependent policy spaces in the context of socioeconomic and environmen-
tal development. Using data across 140 countries, we estimate the indicator-country-specific
development gaps that will remain open in 2030. We find large heterogeneity in development
gaps, and non-linear responses to changes in the total amount of government expenditure. Im-
portantly, our method identifies bounds to how much a gap can be reduced by 2030 through sheer
increments in public spending. We show that these structural bottlenecks cannot be addressed
through expenditure on the existing government programs, but require novel micro-policies in-
tended to affect behaviors, technologies, and organizational practices. One particular set of
bottlenecks that stands out relates to the environmental issues contained in the Sustainable
Development Goals 14 and 15.
1 Introduction
In recent years, a vast literature on the Sustainable Development Goals (SDGs) and the possibility
of reaching them by 2030 has emerged. Some of these studies analyze specific SDGs and explore
projections of indicators for different micro-policy interventions (e.g., González-Pier et al. (2016);
Porciello et al. (2020); Boeren (2019); Sobczak et al. (2021); Mensi and Udenigwe (2021)), while
others focus on identifying synergies and trade-offs between different SDGs (indicators or targets)
(e.g., Fuso Nerini et al. (2019); Lusseau and Mancini (2019); McGowan et al. (2019); Pedercini
et al. (2019); Asadikia et al. (2021)). This latter approach provides a more holistic evaluation of
policy measures attempting to improve the performance of specific SDGs. A third variant of studies
explores how the nature of the relationships between SDGs has changed over time and how likely
it is that trade-offs can successfully transform into synergies in the coming years (e.g., Machingura
and Lally (2017); Fader et al. (2018); Kroll et al. (2019); Amos and Lydgate (2020); Philippidis
et al. (2020)). Finally, a fourth set of studies makes use of expert advice or indicator trends to
decipher the extent to which the SDGs might achieve the goals set for 2030 (e.g. Luken et al.
(2020); Moyer and Hedden (2020); Pradhan et al. (2021); Benedek et al. (2021); Ionescu et al.
(2020)).
Two major points stand out from this succinct overview: (1) that a systemic perspective–
emphasizing interactions among SDGs–is critical for policy evaluation; and (2) that a comprehensive
quantitative understanding of how budgetary allocations impact SDG performance is almost
entirely absent. This paper focuses on the latter point and tries to fill this knowledge gap by
developing a modeling framework to study policy prioritization in the context of the SDGs. Akenroye
et al. (2018) mention the importance of addressing the problem of policy prioritization and of lever-
aging existing budget resources for meeting these goals. Such funding frameworks are necessary
to analyze pressing questions related to the effectiveness of public funding on existing government
programs, for example: Do changes in the size and distribution of the budget (on existing pro-
grams) help, effectively, to close development gaps? What are the most and least sensitive SDGs
to such budgetary rearrangements? Can the commitments to the 2030 Agenda be met when there
is enough government spending? To what extent do structural factors hinder the effectiveness of
existing programs? From the perspective of governments, understanding how their expenditure
actions translate, at a systemic level, into effective policies is critical to guarantee the success of
any international development agenda.
In this paper, we develop a bottom-up computational model in which public expenditure gen-
erates development advancement (with various degrees of effectiveness). A bottom-up approach
to budgetary prioritization is necessary to properly account for political-economy factors that are
present in a multidimensional and interdependent policy space (Guerrero and Castañeda, 2020a;
Castañeda et al., 2018). One of the analytic benefits of this agent-based model (ABM) is that
it can be calibrated using coarse-grained data1 of individual countries without needing to pool
cross-national data.2 We exploit this feature to study the sensitivity of country-specific indicators
to changes in public expenditure.
We study the feasibility of the SDGs across 140 countries using data from the 2020 edition of
the Sustainable Development Report (SDR) (Sachs et al., 2020).3 Our three main results are the
following. First, we provide estimates of the SDG gaps that might remain by 2030 if government
programs were to be kept unaltered.4 Second, we demonstrate that the sensitivity of these gaps
varies–in diverse and non-linear ways across countries and indicators–according to the amount of per
capita government expenditure. Third, we identify the maximum reduction that can be achieved
for the SDG gaps by 2030 through sheer expenditure increments. That is to say, there are strin-
gent ‘budgetary frontiers’ that cannot be overcome without addressing long-term structural factors
(redesigning the government programs). Altogether, our results provide quantitative and theoret-
ically sound insights into what makes the SDGs unfeasible from the perspective of government
expenditure and existing development strategies.5

1
In the ABM literature, this type of indicator data are considered coarse-grained since they do not provide
disaggregated information about the individual behaviors of the agents; something typically needed to calibrate
ABMs (e.g., microdata or administrative records).
2
While some studies use non-pooled country-level data to describe the structure of trade-offs and synergies (see
Pradhan et al. (2017) and references), the capacity to produce quantitative prospective analysis may be limited
because proper statistical power can only be achieved–under traditional statistical tools–with a large number of
observations (which can only be obtained by pooling cross-country data).
3
The SDR is produced by the Sustainable Development Solutions Network and the Bertelsmann Stiftung.
4
A government program is the set of policies that a government has in place to affect a specific development issue.
Funding or defunding these programs is a short/medium-term decision, while redesigning them is a long-term one
(which needs to address structural factors). Our model focuses on the former–short/medium-term decisions–so it is
assumed that the specific policies in place remain unchanged.
5
Although the third type of literature mentioned above argues for the need for structural changes to achieve SDGs
and to break away from trade-offs, the meaning of structural bottleneck remains broad and often ambiguous in
terms of policy instruments. Our paper sheds new light by introducing a more nuanced concept of bottlenecks, one
with a direct link to government programs that can be directly affected through budgetary readjustments.
We structure the remainder of the paper in five more sections and an appendix. In Section 2,
we present the methods employed: model description, network estimation, calibration procedure,
goodness of fit, and definition of SDG gaps. In Section 3, we describe the sources of our database,
which includes time series for development indicators and government expenditure, and explain
how we geographically cluster the information for producing visualizations. Then, in Section 4,
we show different figures describing the main results from our simulations. Section 5 compares
alternative methodologies (data-fitting and aggregated models) with our bottom-up computational
approach in the context of systemic policy analysis and budgetary allocations. Finally, in Section
6, we conclude the paper with a brief summary of the model’s purpose and assumptions, and with a
suggestion on how to use the simulation results for country-specific policy guidelines.
2 Methods
Essentially, the proposed model is designed to study how different budgetary allocations affect the–
simultaneous and interdependent–evolution of a large set of development indicators. The model
takes as inputs a vector with initial conditions for the indicators, a network with their interdepen-
dencies, a budget size, the fraction of positive changes in the indicators (as a measure of variation),
and the final values they achieved in the last period of the sample. With this information, the
parameters are calibrated to (1) match the simulated and empirical indicators in their final ob-
servations, and (2) match the fraction of positive changes. Due to the interdependent dynamics
produced by the model, calibrating its parameters is not trivial. Nonetheless, we devise an efficient
method that yields a goodness of fit above 90% for most countries. Our model is a variant of
Castañeda et al. (2018); Guerrero and Castañeda (2020a), with the improvement of accounting
for the size of a government’s budget. Similar models have been successfully applied to study
ex-ante policy evaluation (Castañeda and Guerrero, 2019a), policy resilience (Castañeda and Guer-
rero, 2018), policy coherence (Guerrero and Castañeda, 2020b), public governance (Guerrero and
Castañeda, 2021), and sub-national development (Guerrero et al., 2021). Some of them have also
been used in the provision of policy advice (Castañeda and Guerrero, 2019b,d,c; Sulmont et al.,
2021; Gobierno del Estado de México, 2020). While the full details of the model are provided in
Appendix A, here we explain the mechanisms that are most salient for this study and
elaborate on the new calibration procedure.
2.1 Model description
The model consists of an agent representing the government or central authority in charge of
deciding how to spend a budget of size B. There are N policy issues, each one with a level of
development measured by an indicator. From these policy issues, n ≤ N can be directly impacted
through existing government programs, and we assume that there is one program for each one of
them. We call these types of policy issues instrumental, while the remaining N − n are considered
collateral. An issue may be collateral because there does not exist a policy instrument to intervene
in it directly, and this may occur because the issue is too aggregate (e.g., GDP growth).6
In addition to the government agent, there are n policymaking agents (functionaries), one in
charge of each instrumental indicator7 . Thus, the problem of the central authority is to allocate
B resources across n policymakers in order to improve the N indicators. Policymakers, however,
may have different goals from those of the central authority or may just be inefficient. Therefore,
some of the allocated resources might end up diverted or wasted. Let us denote the allocation to
instrumental policy issue i as Pi , and the amount of resources that the policymaker uses effectively
as Ci ; we say that the latter is the contribution of the policymaker.
In an iterative process, the government agent reallocates its resources, prioritizing the most
laggard8 and the most efficient policy issues. In parallel, the policymakers try to maximize their
benefit by determining a level of Ci that shows proficiency to the central authority (for political
reputation) but that also benefits them through the wasted resources Pi − Ci . The determination
of Ci happens through a behavioral model of reinforcement learning (which has extensive empirical
validation), subjected to the monitoring of the government and to the corresponding penalties in
case it spots inefficiencies.
The quality of the procurement mechanisms aimed at minimizing inefficiencies varies across
countries according to empirical data on public governance, which we use as an input. With each
6
Other reasons include lack of capacity (e.g., cybersecurity), advanced level of development (e.g., extreme poverty
in some advanced economies), or lack of awareness (e.g., pollution and over-exploitation of natural resources in several
poor countries).
7
The model is flexible to accommodate multiple agents per indicator or indicators per agent. This, however,
requires detailed contextual information that we leave for country-specific studies.
8
Prioritizing laggard issues has been a promoted practice since the Millennium Development Project under the
assumption that laggard indicators reveal potential bottlenecks.
step, the contributions and the total incoming spillovers Si (which could be positive or negative)
determine the success probability γi of the policy aimed at issue i. If the policy succeeds, the amount
of improvement reflected in the indicator is proportional to the existing long-term structural factors,
which we capture explicitly in a parameter αi . Altogether, the model runs for T periods that can
be mapped into calendar time. Parameter B corresponds to the empirical budget that a given
country spent during the sampling period. Thus, in the calibration, the budget runs out after T
simulation periods, reflecting different spending capabilities across countries, and enabling the test
of potential effects from budgetary increments and reductions. We perform Monte Carlo simulations
to generate stable measures of the indicators and other variables of interest. The reader should
be aware that the model is calibrated and implemented for each country independently, so this
approach overcomes concerns about biases from grouping countries or indicators.
In the interest of clarity and space, we summarize the model in Algorithm 1 and Figure 1. In
this section, we focus on the two equations that drive the dynamics of the indicators, and provide
the details of the remaining equations in Appendix A. These equations connect the outcomes of
the behavioral components with the spillover effects shaped by the network of interdependencies,
and establish a clear differentiation between short/mid-term and long-term dynamics.
Algorithm 1: Model pseudocode

foreach period t do
    foreach public servant i do
        receive public funds P_{i,t};
        evaluate the benefits from the previous contribution C_{i,t−1};
        establish new contribution level C_{i,t};
    foreach indicator i do
        if the indicator is instrumental then
            implement public policy using the resources C_{i,t};
        receive the incoming spillovers S_{i,t};
        determine the probability of success γ_{i,t} according to C_{i,t} and S_{i,t};
        if the public policy is successful (with probability γ_{i,t}) then
            improve the indicator according to the long-term structural factors α_i;
    the government monitors the policymakers through imperfect mechanisms;
    the government penalizes those who are found being inefficient;
    the policymakers receive the benefit from their chosen contributions;
    the government updates the allocation profile P_{1,t}, ..., P_{n,t};
Figure 1: Structure of the model
[Diagram with three columns. INTERVENTIONS: institutional reforms (monitoring of corruption, strength of the
rule of law); government actions (development goals, fiscal rigidity); structural reforms (network interventions,
growth factor change). POLITICAL ECONOMY: at the macro level, a network of conditional dependencies among
collateral and instrumental nodes that generates spillovers; at the micro level, the central authority (adaptation,
resource allocation), the public servants (learning, increasing benefits), and a social norm of inefficiency; the two
levels are linked by functionaries’ contributions and by gaps and signals. OUTCOMES: development-indicator
dynamics (empirically observable); policy priorities (empirically unobservable); sector-level inefficiencies
(empirically unobservable).]
Notes: The left panel shows examples of policy interventions that could be implemented by manipulating some
of the model exogenous variables. All the interventions take place at the micro-level and exert a direct impact on
budgetary decisions. The panel at the center shows that the model establishes linkages between the micro and the
macro. At the micro-level, the central authority allocates budgetary resources, while policymakers implement the
government programs. At the macro level, the network of interdependencies produces spillover effects that condition
the evolution of the development indicators. In the upward causation component (right-vertical arrow), functionaries
make an effective use of some of the resources that they receive from the central government. In the downward
causation (left-vertical arrow), the overall dynamic produces reductions in the development gaps of the 2030 Agenda.
This channel also transmits signals reflecting certain misuse of resources, which causes the government to penalize
inefficient functionaries and reallocate resources. Moreover, the three circling arrows in the middle of the bottom
panel describe a horizontal causation mechanism responsible for the social norms of inefficiency guiding functionaries’
behavior. Finally, the right panel presents some of the outcomes that can be obtained from the model: the evolution
of the indicators, policy priorities (allocation profiles), and sectoral inefficiencies.
Sources: Subsection 3.1 of Guerrero and Castañeda (2020a).
Now, let us define the evolution of indicator i as
$$I_{i,t+1} = I_{i,t} + \alpha_i\,\xi(\gamma_{i,t}), \tag{1}$$
where parameter αi > 0 captures long-term structural factors.9 Parameter αi imposes a limit to
the growth that could be achieved in the short-term through sheer spending. For instance, let
9
Note that, if the indicator exceeds its theoretical maximum (if provided by the user), the model will assign zero
growth.
ξ(γi,t ) in equation 1 denote the outcome of a Bernoulli trial that can take values 1 (successful) or
0 (unsuccessful). This means that, if a positive event materializes, the indicator grows according
to αi . As previously mentioned, the probability of a successful trial is γi,t . Note that γi,t is an
endogenous variable of the model, so we proceed to explain how it is formed.
Recall that the total budget size across periods is B. This stock can be turned into flows
by defining a disbursement schedule $B_1, \dots, B_T$, such that $\sum_t^T B_t = B$. For simplicity, let us
assume that the disbursement schedule is homogeneous, so $B_t = B \;\forall\, t$. Next, consider the
allocation profile $P_{1,t}, \dots, P_{n,t}$ that the central authority defines in period t. Under the homogeneous
disbursement schedule assumption, $\sum_i^n P_{i,t} = B$ holds, so the contributions of the policymakers are
in the same units as the budget. In order to map $C_{i,t}$ into the success probability $\gamma_{i,t}$, we define
$$\gamma_{i,t} = \beta\,\frac{C_{i,t} + \frac{1}{n}\sum_j C_{j,t}}{1 + e^{-S_{i,t}}}, \tag{2}$$
where β is a normalizing constant10 and Si,t is the total amount of spillovers received by indicator
i in period t (this could be positive or negative).11 The spillovers are computed every simulation
period according to $S_{i,t} = \sum_j \mathbb{1}_{j,t} A_{j,i}$, where $\mathbb{1}$ is an indicator function that returns 1 if indicator j
grew in the previous period and 0 otherwise. The adjacency matrix A corresponds to the empirical
network of interlinkages, with each entry representing a conditional dependence from indicator j
to i. Importantly, these conditional dependencies do not represent causal links, but rather an em-
pirical regularity that the model takes into account (see Ospina-Forero et al. (2020) for a detailed
discussion on estimating SDG networks and the impossibility of interpreting them as causal net-
works). While the structure of the network represented by A is considered a long-term feature, the
actual realization of the spillovers is a short/mid-term phenomenon because it is the result of the
dynamics of the other indicators in the previous period.
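The dynamics in equations 1 and 2 can be sketched for a single period as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation: the function name `step_indicators`, the boolean mask `instrumental`, the clipping of γ to [0, 1], and the toy numbers are all assumptions, and the contributions C are taken as given instead of being produced by the behavioral model of Appendix A.

```python
import numpy as np

def step_indicators(I, C, instrumental, A, grew_prev, alpha, beta, rng):
    """One period of equations 1 and 2 (hypothetical helper).

    I            : (N,) indicator levels I_{i,t}
    C            : (N,) effective contributions C_{i,t} (zero for collateral issues)
    instrumental : (N,) boolean mask of instrumental issues
    A            : (N, N) adjacency matrix, A[j, i] = dependence from j to i
    grew_prev    : (N,) True if indicator j grew in the previous period
    alpha        : (N,) long-term structural factors, alpha_i > 0
    beta         : normalizing constant
    """
    n = instrumental.sum()
    # Spillovers: S_i = sum_j 1_j * A[j, i]
    S = grew_prev.astype(float) @ A
    # Equation 2: own contribution plus average contribution, scaled by spillovers
    gamma = beta * (C + C.sum() / n) / (1.0 + np.exp(-S))
    # Assumption: clip so that gamma is always a valid probability
    gamma = np.clip(gamma, 0.0, 1.0)
    # Equation 1: Bernoulli trial xi(gamma); success adds alpha_i to the indicator
    success = rng.random(I.size) < gamma
    return I + alpha * success, success, gamma

# Toy inputs (illustrative only): two instrumental issues and one collateral issue
rng = np.random.default_rng(0)
I = np.array([0.2, 0.5, 0.4])
C = np.array([0.3, 0.1, 0.0])
instrumental = np.array([True, True, False])
A = np.array([[0.0, 0.5, 0.0],
              [0.0, 0.0, -0.4],
              [0.2, 0.0, 0.0]])
grew_prev = np.array([True, False, True])
alpha = np.array([0.01, 0.02, 0.015])
I_next, success, gamma = step_indicators(I, C, instrumental, A, grew_prev,
                                         alpha, 1.0, rng)
```

In this sketch the collateral issue (third entry) has no contribution of its own, so its success probability depends only on the average contribution and on its incoming spillovers, as footnote 11 describes.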
Equation 2 represents the short/mid-term component of the model, while parameter αi from
equation 1 captures the long-term factors limiting the impact of public expenditure on the indi-
10
Importantly, if expenditure data at the level of each indicator were available, it could be used as an input for Pi ,
in which case βi could be indicator-specific and more intuitive in terms of returns to expenditure in specific policy
issues. Hence, while we use aggregate expenditure data in this paper, the model is flexible to allow various types of
disaggregated data.
11
The term Ci,t accounts for the expenditure contribution to an instrumental policy issue. For a collateral issue,
Ci,t equals zero, so its success depends on the overall ‘financial health’ of the government, $\frac{1}{n}\sum_j C_{j,t}$, and on the
spillovers Si,t . Therefore, we assume that public funding is a necessary but not sufficient condition for development.
cators. For example, a government may increase the funds allocated to train quantum-computing
engineers with the aim of strengthening this strategic area. While the number of engineers in this
field may indeed increase due to the availability of scholarships, they may leave for another country
or end up in unrelated jobs due to a lack of employment opportunities in the domestic labor market.
A labor-market-related structural factor, the demand for quantum-computing engineers, limits the
speed with which this sector can develop; such speed will be reflected in modest improvements of
the relevant indicators. Naturally, a structural reform could be seen as a change in αi , but its
interpretation proves difficult due to the multiple variables that are absorbed in this parameter;
this is a challenge that we leave for future work. Nevertheless, αi is informative about the limits
of sheer spending at the level of each indicator, something lacking in all other approaches. For
this reason, the model is consistent with the idea of analyzing budgetary changes over existing
government programs.
2.2 Networks
As we have previously explained, the structure of the interdependencies between indicators is
assumed to be a long-term feature, so the networks are exogenous inputs. As such, adjacency
matrices can be built for each country by following any preferred criteria. A popular approach
among development scholars is the qualitative approach of eliciting expert opinions. Unfortunately,
this strategy is not scalable for a large set of countries and indicators (and is difficult to use
when governments face severe time constraints). Ospina-Forero et al. (2020) provide a
comprehensive review of quantitative methods that may be suitable for estimating SDG networks.12
With this information in hand, our method of choice is the Bayesian approach of Sparse Gaussian
Bayesian Networks developed by Aragam et al. (2019) (and known as sparsebn). This procedure
has the distinctive advantages of working well with high-dimensional datasets, even if they have
short series, and producing adjacency matrices that try to minimize the number of links that may
be false positives (hence the “sparse” term in the name).13
Recall that the resulting networks should not be interpreted as causal relations, but as con-
ditional probabilities, which means that a link A → B does not imply that ∆A guarantees ∆B.
12
For instance, correlation thresholding is one of the methods commonly used in the literature (e.g., Warchold et al.
(2021); Putra et al. (2020)).
13
For more details on the network and its estimation procedure see the Appendix D.
This is the reason why spillovers affect the probability of success γi , and not the magnitude of the
outcome. Of course, like with any statistical method, sparsebn makes certain assumptions such
as a linear Gaussian structural equation model and no temporal dependence between observations.
The former is a standard assumption in causal Bayesian models. Temporal dependencies can be
partially removed by computing first differences of the series. Overall, we consider that these as-
sumptions are more reasonable than those made by alternative methods, and further arguments
are provided by Ospina-Forero et al. (2020). Finally, the networks are estimated for each country
individually, an important improvement over the existing literature on SDG synergies and trade-offs
which tends to use pooled data.14
2.3 Calibration
The aim of the calibration method is two-fold: to ensure (1) that the simulated dynamics of the
indicators start and end at the empirical levels, and (2) that the model’s average success probability
corresponds to the empirical fraction of positive first differences of the indicators.15 To achieve this,
we need to find the parameter vector α1 , . . . , αN , β that minimizes an error measure.
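The second calibration target, the empirical fraction of positive first differences, can be computed directly from the indicator time series. The snippet below is a minimal sketch in Python with NumPy; the function name `fraction_positive_changes` and the toy series are assumptions, and it presumes complete series with no missing observations.

```python
import numpy as np

def fraction_positive_changes(series):
    """Fraction of positive first differences across all indicators.

    series : (N, T_obs) array, one row per indicator, one column per year.
    """
    diffs = np.diff(series, axis=1)            # first differences along time
    return np.count_nonzero(diffs > 0) / diffs.size

# Toy data (illustrative only): two indicators observed over four years
series = np.array([[0.1, 0.2, 0.2, 0.3],
                   [0.5, 0.4, 0.6, 0.7]])
Gamma = fraction_positive_changes(series)      # 4 positive changes out of 6
```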
There are two features that characterize this calibration problem. First, the dynamics of the
indicators are interdependent. This means that if αi changes, the ‘speed’ of another indicator j may
be altered as well. Furthermore, these interdependencies are not simple enough for the model
to be written as a system of equations to be simultaneously solved (as one may think by looking
at equation 1). For instance, the fact that γi is endogenous renders homogeneous Markov chains
ineffective. The second feature is the computational cost of each evaluation. Since each simulation
may yield a different trajectory for the same indicator, stable metrics have to be obtained from
Monte Carlo simulations. This means that evaluating a given set of parameters involves several
independent runs.16
14
Naturally, the network plays a role in the model, so different topologies may influence some of the model’s
variables. In fact, Castañeda et al. (2018); Guerrero and Castañeda (2020a) show that removing the spillovers
alters the incentive structure of the policymaking agents, resulting in lower variation of inefficiency across policy issues.
Nevertheless, for the variables of interest of this study (the SDG gaps) we find that our results are robust to different
networks. Appendix I provides detailed evidence.
15
Appendix E discusses how to deal with indicators that show final values that are lower than their initial conditions.
16
Heuristic optimization algorithms that can handle dynamic landscapes, such as simulated annealing and particle
swarm optimization, fail, arguably due to the sensitivity of the landscape and to the cost of each evaluation. Evolutionary approaches
such as differential evolution have also been ineffective due to similar reasons. Finally, Bayesian methods, such as the
Tree-structured Parzen Estimator, which perform well with expensive-evaluation models, do not work in this context
due to the high dimensionality of the solution space (and the sensitivity of the fitness landscape).
We develop a multi-objective gradient descent method that exploits the fact that each parameter
can be associated to a specific error. Let us define an indicator-specific error as $e_{\alpha_i} = I_{i,-1} - \bar{I}_{i,T}$,
where $I_{i,-1}$ is the final empirical value of indicator i, and $\bar{I}_{i,T}$ is the average final simulated value
of the same indicator across M independent Monte Carlo simulations. The corresponding error for
β is
$$e_\beta = \Gamma - \frac{\sum_{i,t,m} \gamma_{i,t,m}}{M \times T \times N},$$
where Γ is the fraction of positive first differences across all indicators.
The calibration algorithm tries to minimize the average absolute error
$$e = \frac{1}{N+1}\left(\sum_i^N |e_{\alpha_i}| + |e_\beta|\right).$$
To minimize the error, first, we start with a proposed vector α1 , . . . , αN , β. Next, we perform a
set of M Monte Carlo simulations and compute the error vector eα1 , . . . , eαN , eβ . For each indicator
i, if eαi < 0 (meaning that the indicator grew too fast), then we multiply αi by a factor 1 − δαi .
If eαi > 0 (the indicator was too slow), then we multiply αi by 1 + δαi . The same logic applies to
β, which has a corresponding factor δβ . Ideally, we want the mean error to converge to zero as
we search the parameter space. We can generate such behavior by setting factors δα1 , . . . , δαN , δβ
that change in proportion to the errors. As it turns out, a factor that achieves this for indicator
i is δαi = |eαi |/(Ii,−1 − Ii,0 ), where Ii,0 and Ii,−1 are the empirical initial and final values of the
indicator, while δβ = |eβ | for β. Our simulations suggest zero-error convergence for a large enough
M .17 Thus, the algorithm can be run for several iterations until a certain threshold for the average error is
achieved. The calibration procedure for the model parameters is described in Algorithm 2. As
the reader will notice, we bound the step factor (1 ± δ) to the interval [1/2, 3/2], as we have found that this
accelerates the convergence rate significantly.
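The multiplicative update rule described above can be sketched as follows. This is an illustration under strong assumptions, not the authors' code: `toy_simulate` is a hypothetical stand-in for the full model in which every indicator succeeds with the same probability min(β, 1) in every period, all indicators are assumed to have grown over the sample (so that the final value exceeds the initial one), and the names, tolerance, and toy data are invented for the example.

```python
import numpy as np

def toy_simulate(alpha, beta, I0, T, M, rng):
    """Hypothetical stand-in for the model: constant success probability."""
    gamma = min(beta, 1.0)
    success = rng.random((M, T, I0.size)) < gamma   # Bernoulli trials
    I_final = I0 + alpha * success.sum(axis=1)      # shape (M, N)
    return I_final.mean(axis=0), gamma              # mean final level, mean gamma

def calibrate(I0, I_end, Gamma, T, M=500, tol=5e-3, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    N = I0.size
    alpha = 1e-3 + 0.1 * rng.random(N)              # random positive start
    beta = 0.1 + 0.9 * rng.random()
    for _ in range(max_iter):
        I_bar, gamma_bar = toy_simulate(alpha, beta, I0, T, M, rng)
        e_alpha = I_end - I_bar                     # indicator-specific errors
        e_beta = Gamma - gamma_bar                  # error for beta
        err = (np.abs(e_alpha).sum() + abs(e_beta)) / (N + 1)
        if err < tol:
            break
        # Step sizes proportional to the errors
        d_alpha = np.abs(e_alpha) / (I_end - I0)
        d_beta = abs(e_beta)
        # Shrink if too fast (e < 0), grow if too slow; factor kept in [1/2, 3/2]
        factor = np.where(e_alpha < 0,
                          np.maximum(1.0 - d_alpha, 0.5),
                          np.minimum(1.0 + d_alpha, 1.5))
        alpha = alpha * factor
        beta *= max(1.0 - d_beta, 0.5) if e_beta < 0 else min(1.0 + d_beta, 1.5)
    return alpha, beta, err

# Toy targets (illustrative only)
I0 = np.array([0.2, 0.5, 0.1])
I_end = np.array([0.5, 0.7, 0.5])
Gamma = 0.6
alpha, beta, err = calibrate(I0, I_end, Gamma, T=50)
```

Because each parameter is moved only by its own error, one batch of Monte Carlo runs updates all N + 1 parameters at once, which is what keeps the number of expensive model evaluations low.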
2.3.1 Goodness of fit
For a single indicator i, the goodness of fit of its corresponding parameter αi is
$$\mathrm{GoF}_{\alpha_i} = 1 - \frac{e_{\alpha_i}}{I_{i,-1} - I_{i,0}}, \tag{3}$$
which takes values in the interval (−η, 1], where η is the lower bound induced by the theoretical
maximum of the indicator. If no theoretical maximum exists, then the lower bound is −∞.
The basic idea behind GoFαi is that, in a good fit, the error eαi should represent a small fraction
of the historical gap that needs to be closed in a simulation (Ii,−1 −Ii,0 ). Errors where the simulated
17
The resulting parameter vector is robust across different calibrations using random initial parameters.
Algorithm 2: Calibration pseudocode
initialize vector α_1, ..., α_N, β with random values;
while the average error e exceeds the tolerance threshold do
    run M Monte Carlo simulations;
    compute the error vector e_{α_1}, ..., e_{α_N}, e_β;
    foreach indicator i do
        if e_{α_i} < 0 then
            update α_i to α_i × max(1 − |δ_{α_i}|, 1/2);
        else
            update α_i to α_i × min(1 + |δ_{α_i}|, 3/2);
    if e_β < 0 then
        update β to β × max(1 − |δ_β|, 1/2);
    else
        update β to β × min(1 + |δ_β|, 3/2);
average indicator ends below the empirical value are bounded by Ii,0 because the model only allows
non-negative growth. However, an error where the simulated average indicator ends above the
empirical value may represent multiple times the size of the historical gap. Therefore, this metric
not only takes into account accuracy with respect to the final value, but it also penalizes extreme
errors with negative contributions when computing the mean goodness-of-fit across all indicators.
Importantly, when testing alternative calibration methods, several indicators display a negative
GoFαi . This is not the case for our algorithm.
The metric for the goodness of fit of parameter β follows the same logic, but the target feature
is the rate of positive first-differences. Formally, the goodness of fit of β is
$$\mathrm{GoF}_\beta = 1 - \frac{e_\beta}{\Gamma}, \tag{4}$$
where Γ is the number of positive first differences in the empirical data as a fraction of all first
differences.
The overall goodness of fit for a country is the average
$$\mathrm{GoF} = \frac{1}{N+1}\left(\sum_i^N \mathrm{GoF}_{\alpha_i} + \mathrm{GoF}_\beta\right). \tag{5}$$
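The goodness-of-fit metrics in equations 3 to 5 can be computed as in the sketch below. One assumption is worth flagging: the errors are taken in absolute value here, which is how we read the stated interval (−η, 1] and the remark that large errors contribute negatively; the function name `goodness_of_fit` and the toy numbers are likewise hypothetical.

```python
import numpy as np

def goodness_of_fit(I0, I_end, I_bar, Gamma, gamma_bar):
    """Indicator-level, beta-level, and country-level goodness of fit.

    I0, I_end : (N,) empirical initial and final indicator values
    I_bar     : (N,) average simulated final values
    Gamma     : empirical fraction of positive first differences
    gamma_bar : average simulated success probability
    """
    e_alpha = I_end - I_bar
    e_beta = Gamma - gamma_bar
    # Assumption: absolute errors, so that a perfect fit gives exactly 1
    gof_alpha = 1.0 - np.abs(e_alpha) / (I_end - I0)
    gof_beta = 1.0 - abs(e_beta) / Gamma
    gof = (gof_alpha.sum() + gof_beta) / (I0.size + 1)
    return gof_alpha, gof_beta, gof

# Toy values (illustrative only)
I0 = np.array([0.2, 0.5])
I_end = np.array([0.4, 0.6])
I_bar = np.array([0.39, 0.62])
gof_alpha, gof_beta, gof = goodness_of_fit(I0, I_end, I_bar, 0.6, 0.594)
```

In this toy case the second indicator overshoots by 20% of its historical change, so its fit is 0.8, while the first one misses by 5% and scores 0.95.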
Figure 2 shows the distribution of the GoF after calibrating the model.18 More detailed results
on the goodness of fit are provided in Appendix F. Notice that, when performing this calibration
procedure, we obtain a remarkable goodness of fit at the country level. Furthermore, the large
majority of the parameters αi exhibit a fit above 0.9, while this is always the case for β.
Figure 2: Distribution of goodness of fit metrics
[Three histograms: (a) Country-level GoF; (b) Indicator-level GoF; (c) GoFβ. Horizontal axes: goodness of fit; vertical axes: frequency.]
Sources: Authors’ own calculations.
2.4 Definition of SDG gaps
The main estimates of the paper are the gaps or the distances between development goals and the
levels predicted for the indicators in 2030. If a prediction surpasses its goal, then we say that the
gap has been closed. Formally, an SDG gap is

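$$\mathrm{gap}_i = \begin{cases} 100 \times \dfrac{G_i - \bar{I}_{i,T}}{G_i} & \text{if } G_i \geq \bar{I}_{i,T} \\ 0 & \text{if } G_i < \bar{I}_{i,T} \end{cases}, \tag{6}$$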
where G1 , . . . , GN are the development goals obtained from the SDR, and I¯i,T is the expected value
of indicator i–across M independent Monte Carlo simulations–after T simulation periods that are
equivalent to 10 years. The underlying yearly budget for the 2021-30 period is assumed identical, in
per capita terms, to the (annual) average expenditure observed in the 21 years of data. We express
the gaps as a proportion of their goals and in percentage terms. Thus, we can read an SDG gap
18
The choice of the number of simulation periods T does not alter the results significantly because the calibration
of β compensates for a higher or lower frequency of the disbursement schedule. Appendix H provides evidence of
robustness under different disbursement schedules.
as: “by 2030, indicator i will still need to close x% of its goal”.19
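The gap in Equation (6) can be sketched in a few lines of Python. The function name and the input values are hypothetical; the sketch assumes a strictly positive goal and an indicator for which higher values are better, and it takes the expected value as the mean of the final levels across Monte Carlo runs.

```python
from statistics import mean


def sdg_gap(goal, simulated_final_values):
    """Equation (6): remaining gap in 2030 as a percentage of the goal."""
    if goal <= 0:
        raise ValueError("this sketch assumes a strictly positive goal")
    expected = mean(simulated_final_values)  # average over M Monte Carlo runs
    if goal < expected:
        return 0.0  # the prediction surpasses the goal: the gap is closed
    return 100.0 * (goal - expected) / goal


# Hypothetical indicator with goal 80 and three simulated final levels.
open_gap = sdg_gap(80, [58, 60, 62])   # expected level 60, so the gap is 25%
closed_gap = sdg_gap(50, [55, 57])     # expected level 56 exceeds the goal
```

In this hypothetical case the reading would be: “by 2030, the indicator will still need to close 25% of its goal”.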
3 Data
There exist different databases from which one could obtain indicators classified into the SDGs, for
example, the SDG Indicators from United Nations Statistics Division (United Nations, 2020), the
World Bank Atlas of Sustainable Development Goals (World Bank, 2020), the OECD SDG distance
indicators (OECD, 2020), and the indicators compiled by the Bertelsmann Stiftung and Sustainable
Development Solutions Network to produce the Sustainable Development Report (SDR) (Sachs
et al., 2020). In this study, we use the SDR database for three main reasons.20 First, the SDR is
the only dataset that provides quantitative values for the goals to be achieved by each indicator.
Furthermore, these goals are consistent across all the countries in the sample because the chosen
indicators are applicable to each nation. Since the aim of this paper is to assess the feasibility of
reaching the SDGs, having quantitative goals is necessary. Second, the SDR data have consistently
longer time series than alternative databases. This is helpful for the calibration of the model because
the estimation of the structural factors α1 , . . . , αN assumes that they capture long-term features of
the data. For a sub-sample of 140 countries, the SDR provides time series with a length of almost
21 years (from 2000 to 2020) in numerous indicators. Alternative datasets, while they contain more
indicators, fail to provide consistently long time series. Third, the majority of the data sources for
the SDR indicators are recognized international (and intergovernmental) organizations, while the
rest are scientifically sound products such as surveys from statistics bureaus, NGOs, and academic
institutions.
While the SDR team makes a substantial effort in gathering as much data as possible for each
country, there are countries that lack some of the indicators, or that have too few observations.
For this reason, different countries in our sample may have more or fewer indicators than others.
This is problematic for all studies that pool cross-national data, since decisions have to be made
regarding the imputation of missing observations, or the complete elimination of certain indicators.
19
Appendix C reports confidence intervals and provides a method to incorporate uncertainty about the quality of
the data into the intervals when information on the indicators’ errors is available.
20
A caveat of the chosen data is that they do not contain time series for SDG 12. The relevant indicators in SDG
12 relate to issues such as waste management, which have just recently been quantified in a handful of countries.
However, this is also an issue in most datasets.
Our approach overcomes this problem because we do not need to produce estimates on pooled
data. Thus, we allow each country to have its potentially unique set of indicators and perform
the estimations independently of other nations.21 This allows capturing as many policy dimensions
as possible for each country, which is consistent with the philosophy behind multidimensional
development. While having unbalanced panels is still not the ideal setup for ex-post cross-country
comparisons, we believe that this framework is still able to overcome some of the main hurdles
of data-fitting approaches. Appendix B provides detailed information on the 77 indicators of our
sample, and their distribution across countries.
For the purpose of visualizing some of our results, we may color or aggregate them into country
clusters. We emphasize that this is only for visualization purposes. For these country clusters,
we use the following grouping scheme: Sub-Saharan Africa (Africa), Eastern Europe and Central
Asia (E. Europe & C. Asia), East and South Asia (East & South Asia), Latin America and the
Caribbean (LAC), Middle East and North Africa (MENA), and Western Countries (West). Figure
3 provides a map of the countries covered in our sample.
For the national budgets, we use data on total government expenditure in current USD (which
can be accessed through data.worldbank.org/indicator/NE.CON.GOVT.KD). This information
is obtained from the dataset on General Government Final Consumption Expenditure which, in
turn, sources the data from the World Bank National Accounts Data and the OECD National
Accounts data files. We compute the total expenditure exercised by each country in the 2000-2020
period. Missing values are imputed with the average yearly expenditure, and the final amount
is transformed into per capita expenditure in order to remove population-size effects (we use the
population size reported by the SDR).
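The budget preparation described above can be sketched as follows. This is a minimal illustration with hypothetical names and numbers: it assumes that the “average yearly expenditure” used for imputation is the mean of the observed years of the same country, and that a single population figure is used for the per capita conversion.

```python
def total_per_capita_expenditure(yearly_expenditure, population):
    """Impute missing years with the observed yearly mean, sum, and divide by population.

    yearly_expenditure: one value per year, with None marking a missing year.
    """
    observed = [value for value in yearly_expenditure if value is not None]
    if not observed:
        raise ValueError("at least one observed year is required")
    yearly_mean = sum(observed) / len(observed)
    imputed = [yearly_mean if value is None else value for value in yearly_expenditure]
    return sum(imputed) / population


# Hypothetical country: four years of data with one missing observation.
per_capita_total = total_per_capita_expenditure([100.0, None, 140.0, 120.0], population=10)
```

With the observed mean of 120, the imputed series sums to 480, which gives a per capita total of 48 in this toy case.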
21
In Appendix D.3, we present a methodology for the imputation of missing observations that works very well when
indicators exhibit non-linear dynamics and the network is estimated with pooled country data. This is one of the
several methods available for the imputation of missing information in the SDGs. For instance, Gaussian processes
are reliable for non-linear dynamics when a database only includes time-series for one country (or region), while in
cross-section analyses heuristic approaches are more common (e.g., Warchold et al., 2021).
Figure 3: Countries and their clusters
Blue: Africa. Orange: E. Europe & C. Asia. Green: East & South Asia. Red: LAC. Purple: MENA. Brown: West.
Countries in gray were excluded from the sample due to lack of data.
4 Results
Because we only have aggregate yearly government expenditure for making worldwide comparisons,
we limit our analysis to three types of simulation exercises. Firstly, we study whether SDG gaps of
the 2030 United Nations Agenda can be closed assuming a benchmark scenario in which we project
the historical yearly average of public expenditure for the following ten years. Secondly, we analyze
the sensitivity of these gaps to different increments/decrements of the budget size. We also visualize
the response function of budgetary changes in terms of delays (or savings) in the number of years
to reach the 2030 levels obtained in the benchmark scenario. Thirdly, we study the magnitude of
structural bottlenecks that hamper the possibility of improving the indicators’ performances by just
increasing the allocated funds. These bottlenecks are made evident when, by construction, limited
and inefficient funding are ruled out in a counter-factual simulation. Although these exercises are
produced at the country level using all available indicators separately, for exposition reasons we
present several visualizations at the SDG or geographical cluster level in the main body of the
paper.
The reader should be aware that our methodology can deal with country-specific features such
as the following: the network of interdependencies between indicators; the historical context re-
flected in the database and considered for calibration purposes; the indicators’ initial conditions
for prospective analyses; the setting of the 2030 goals attending to the countries’ idiosyncrasies
and political systems. Country-specific estimations are key when using the model for providing
policy guidelines; however, this is not always technically possible with other methodologies. For
example, in regression analyses using aggregate data, information from different countries has to
be pooled to obtain enough degrees of freedom. The latter approach precludes the possibility of
making inferences for particular countries and, thus, the estimates have limitations in terms of
policy advice.
4.1 SDG gaps
We present our estimates of SDG gaps for 2030 at the level of each country in Figure 4. The bars
indicate average levels across indicators, and the colored dots correspond to the 10 indicators with
the largest estimated gaps. The latter exemplifies the gap disparities that exist within each country.
As expected, the advanced market economies of the West exhibit gaps that are substantially lower
than those estimated for the least developed countries (like those in Africa). However, there are
also relatively successful countries in other regions of the world, such as Cyprus (CYP) and Croatia
(HRV) in E. Europe & C. Asia; Japan (JPN), South Korea (KOR), and Singapore (SGP) in East
and South Asia; and the United Arab Emirates (ARE) in MENA. In contrast, the least successful
countries are Haiti (HTI) in LAC; and the Central African Republic (CAF), Eritrea (ERI), and
Chad (TCD) in Africa.
At a more aggregate level, we can observe gap disparities across clusters and across SDGs.
For example, while no country in Africa has an average gap below 18%, all the countries in the
West have an average gap below 12%. The systematic persistence of certain dot colors (such as
orange, corresponding to SDG 9) suggests that, in some SDGs, it is more difficult to close the gaps.
Some of the most persistent SDGs across the dot markers are SDG 9 (‘Industry, Innovation and
Infrastructure’) and SDG 7 (‘Affordable and Clean Energy’). This pattern is especially visible in
Africa.
Figure 5 provides a complementary visualization of the SDG gaps. Here, the gaps of the
indicators have been averaged across countries in the same cluster. These plots reveal that only
one indicator of SDG 7 is persistently close to the 100% gap in Africa, and that several indicators
of SDG 9 exhibit high gaps. Another feature revealed by this visualization is that most of the
environmental indicators in SDGs 14 and 15 present gaps above the cluster average (identified with
the solid black ring). The reader should be aware of the risks of aggregation, which are evident
when comparing the gaps estimated for the indicators in SDG 2 in Africa and West. Here, Africa
is expected to perform better than West in obesity, nitrogen emission, and human trophic levels
(which relates to dietary diversity). These problems are endemic to advanced market economies,
so our results are intuitive. However, if we were to aggregate these gaps for the whole SDG 2,
the result would suggest a similar performance between both clusters since the indicators related
to hunger and malnutrition show the opposite performance (so their gaps would cancel each
other out, at least approximately). Clearly, even with a multidimensional view of development, there
exist specific policy issues that perform in substantially different ways across countries, even if they
belong to the same dimension. This is one of the reasons why it is so important to move beyond
the common practice of pooling cross-national and SDG-level data, and to produce more granular
estimates that reflect the context and the spending capabilities of each country in each indicator.
Figure 4: Average SDG gaps for 2030 by country
[Figure omitted. Panels by cluster: Africa; E. Europe & C. Asia; East & South Asia; LAC; MENA; West. Vertical axis: countries (ISO codes). Horizontal axis: development gap estimated for 2030 as a percentage of its goal (0 to 100).]
Notes: The bars denote the average SDG gap for 2030 (across indicators) for each individual country. Bars are
colored according to the country clusters described in Figure 3. The dots correspond to the 10 indicators with the
largest estimated gaps. Each dot is colored according to the corresponding SDG of its indicator. We use the model
to estimate the indicators’ projections for 2030. For precise estimates and confidence intervals of each individual
indicator gap, see Appendix C.
Figure 5: SDG gaps for 2030 aggregated by cluster and indicator
[Figure omitted. Panels: (a) Africa; (b) E. Europe & C. Asia; (c) East & South Asia; (d) LAC; (e) MENA; (f) West. Each panel is a circular bar chart with one bar per indicator (labeled by indicator code, e.g., sdg1_oecdpov) and a radial scale up to 100.]
Notes: We use the model to estimate the indicators’ projections for 2030. The height of each bar represents the
average gap between the SDG and the indicator level predicted by 2030 computed across countries in the cluster.
Empty spaces between bars indicate that no data was available for the corresponding indicator in any country from
the cluster. The solid black ring corresponds to the average gap across countries (in the cluster) and indicators.
The dashed red ring indicates the largest average gap (between indicators in the cluster). The black lines at the top
of each bar denote the ± one standard error of the mean gaps across the countries of a cluster. For precise estimates
and confidence intervals of each individual indicator gap, see Appendix C.
4.2 Sensitivity to changes in the budget size
A key issue to be addressed when studying the feasibility of SDGs is the impact that budgetary
changes have on the evolution of social, economic, and environmental indicators. In the context of
this paper, we are interested in understanding how sensitive the different SDG gaps are to changes in
public expenditure. Thus, we estimate the country-specific sensitivity of each indicator to changes in
the overall size of the budget during the 2020-30 period. Our dataset suggests substantial variation
in the growth of public spending between the 2000-10 and the 2010-20 decades (an average of 47%).
Thus, our estimates consider prospective simulations with positive and negative changes of up to
50% with respect to the historical expenditure levels reflected in the data.22 We measure sensitivity
by calculating the difference between gaps from a benchmark scenario that maintains the historical
expenditure levels (the average yearly expenditure from the data, projected over 10 years) and a
scenario that considers changes in the size of the budget.
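The sensitivity measure can be illustrated with a short Python sketch. The function name, the sign convention (benchmark gap minus scenario gap, so that positive values denote a shrinking gap), and the gap values are assumptions made for illustration only; the indicator codes are taken from the figures.

```python
def gap_sensitivity(benchmark_gaps, scenario_gaps):
    """Difference between benchmark and scenario gaps for each shared indicator.

    Positive values mean that the gap shrinks under the alternative budget;
    negative values mean that it widens.
    """
    return {
        indicator: benchmark_gaps[indicator] - scenario_gaps[indicator]
        for indicator in benchmark_gaps
        if indicator in scenario_gaps
    }


# Hypothetical gaps (in % of the goal) for one country.
benchmark = {"sdg9_lpi": 40.0, "sdg8_unemp": 22.0}
plus_50_percent = {"sdg9_lpi": 31.0, "sdg8_unemp": 21.5}
sensitivity = gap_sensitivity(benchmark, plus_50_percent)
```

In this toy case, the SDG 9 indicator responds with a reduction of 9 percentage points, while the SDG 8 indicator barely moves (0.5 points).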
Figure 6 presents a highly disaggregated picture of the different sensitivities when the budget is
increased by 50%. Larger markers denote more sensitivity, while the gray lines indicate the absence
of an indicator in a particular country. As a reference point, the largest marker corresponds to
a reduction of 13% in an SDG gap. From this visualization we can highlight several important
results. First, there is substantial heterogeneity across countries–positioned in the vertical axis–
and indicators–positioned in the horizontal axis–with respect to the magnitude of gap reductions.
Second, the most noticeable impacts are not randomly scattered but rather concentrated in specific
SDGs (compare columns of different colors) and indicators. For instance, two gaps in SDG 9
(‘Logistics performance index’ and ‘Mobile broadband subscriptions’) have notable gap reductions
in most of the countries where data is available. Third, the gaps of economic indicators in SDG
8 are not particularly sensitive to a 50% increase in the budget size, especially when compared
with those of SDG 9 (see the size of brown markers versus that of orange markers). Fourth, with
the exception of some African cases, the gaps in SDGs 13, 14, and 15 (the environmental ones)
rarely exhibit substantial improvements. Fifth, excluding a few country-indicator cases, the SDG
16 gaps do not seem responsive to a 50% increase in the budget. In section 4.3, we show that these
diverse sensitivities are the result of long-term structural factors that impose a constraint on the
22
An expenditure growth scenario for the next 10 years may be hindered by the COVID-19 global pandemic.
effectiveness of public expenditure in government programs. Before elaborating on these structural
factors, we provide further sensitivity results related to reductions to the budget size, and to an
alternative sensitivity metric.
Figure 6: SDG gap shrinkage due to a 50% increment in per capita expenditure
[Figure omitted. Vertical axis: countries (ISO codes). Horizontal axis: indicators (indicator codes, colored by SDG).]
Notes: The size of the markers is proportional to the reduction of the SDG gap caused by an increase in government
spending. The biggest marker corresponds to the largest reduction in the sample. The gray lines indicate the absence
of an indicator in a particular country.
Figure 7 presents sensitivity results for a 50% reduction in budget sizes. In this case, the
sensitivity outcomes mean that the SDG gaps widen. As a reference point, the largest marker corresponds
to a gap increase of nearly 20% with respect to the benchmark case. This implies that, in
general, SDG gaps are more sensitive to a 50% reduction than to an increment of the same proportion
in the budget size. This sensitivity asymmetry becomes evident when contrasting the outcomes of
SDG 8 (in ‘Adults with an account at a bank or other financial institution’) presented in Figures 6
and 7. A similar asymmetric pattern can be found in environmental indicators from SDGs 14 and
15.
To have a better understanding of the asymmetric sensitivity between a 50% increment and
reduction in the budget, we would like to revisit three assumptions about our modeling approach.
First, we aim to model short-term dynamics and, hence, long-term structural factors are given
Figure 7: SDG gap growth due to a 50% reduction in per capita expenditure
[Figure omitted. Vertical axis: countries (ISO codes). Horizontal axis: indicators (indicator codes, colored by SDG).]
Notes: The size of the markers is proportional to the increase of the SDG gap caused by a reduction in government
spending. The biggest marker corresponds to the largest increase in the sample. The gray lines indicate the absence
of an indicator in a particular country.
through the exogenous parameters αi that are specific to each indicator and country. Second, the
impact of the public funds devoted to the different government programs is viewed in the context
of short/mid-term effects. This is so because we model a probability γi,t representing the chance
of indicator i to improve in the subsequent period t + 1. These two aspects are combined into the
evolution equation 1. While more public spending increases γi,t , the long-term structural factors αi
limit the speed of such growth. Therefore, government expenditure only affects γi,t , not αi . Third,
public spending is a necessary condition for development. Thus, from looking at the evolution
equation we can tell that, if less expenditure brings γi,t close to zero, then the growth trials are
almost always unsuccessful, so the indicator dynamics stagnate.
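The interplay between γ and α described above can be illustrated with a stylized Python sketch. It is not the evolution equation of the model, which is not reproduced in this section: the sketch assumes that, in every period, the indicator grows by α with probability γ and stays put otherwise, and it holds γ constant, whereas in the model γi,t changes over time and depends on public spending. All names and values are hypothetical.

```python
import random


def simulate_indicator(initial, alpha, gamma, periods, rng):
    """One simulated path: each period is a growth trial with success probability gamma."""
    level = initial
    for _ in range(periods):
        if rng.random() < gamma:  # successful trial: the indicator moves by alpha
            level += alpha
    return level


def expected_level(initial, alpha, gamma, periods):
    """Expected final level under the stylized dynamics (constant gamma)."""
    return initial + alpha * gamma * periods


rng = random.Random(0)                                    # fixed seed for reproducibility
ceiling = simulate_indicator(0.50, 0.01, 1.0, 10, rng)    # every trial succeeds
stagnant = simulate_indicator(0.50, 0.01, 0.0, 10, rng)   # no trial succeeds
```

Under these assumptions, even a success probability of one cannot lift the indicator by more than α per period, which is the sense in which the structural factor bounds the effect of additional spending, while a success probability near zero leaves the indicator at its initial level.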
From the side of budgetary increments, there seems to be a limit to how much some gaps can be
reduced in a given period while, on the side of reductions, no improvements can be expected in the
absence of public funds. Furthermore, given the law of motion of the indicators, and other micro-
foundations of the model, the response to expenditure changes may vary in non-linear ways. Thus,
to provide a full picture of these non-linear response functions, we measure sensitivity in terms
of the number of years that it would take to achieve the SDGs (for each indicator and country),
and compute them for 1% variations (positive and negative) in the budget size. We present the
results aggregated into SDGs and clusters, but provide country-level plots in
http://github.com/oguerrer/sdg_feasibility. In Figure 8, we present the aggregate response functions in the range
of budgetary changes between -50% and +50% (with marginal changes of 1%).23 We calculate the
response functions using the difference in the number of years it takes for an adjusted budget
to reach the levels of the indicators obtained in 2030 with the benchmark scenario. In the latter
calculation, the historical annual expenditure average is projected forward throughout the following
decades. If the budgetary changes produce additional years, there is a delay, while if there are some
saved years, the delay is displayed on a negative scale.
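The delay calculation can be sketched as follows. The function name, the argument layout, and the toy trajectories are hypothetical: the sketch assumes that a simulated path under the adjusted budget is available as a list of levels, one per simulation period, and that the benchmark scenario reaches its reference level after 10 years.

```python
def years_of_delay(trajectory, benchmark_level, periods_per_year, horizon_years=10):
    """Years needed to reach the benchmark 2030 level, minus the 10-year horizon.

    trajectory[k] is the indicator level after k + 1 simulation periods under
    the adjusted budget. A positive result is a delay, a negative one means
    saved years, and None means the level was never reached in the simulated
    window (a non-converging case).
    """
    for step, level in enumerate(trajectory, start=1):
        if level >= benchmark_level:
            return step / periods_per_year - horizon_years
    return None


# Hypothetical yearly paths; the benchmark scenario reaches level 62 in 2030.
smaller_budget = list(range(51, 66))      # 51, 52, ..., 65: reaches 62 in year 12
larger_budget = list(range(52, 80, 2))    # 52, 54, ...: reaches 62 in year 6
delay = years_of_delay(smaller_budget, 62, periods_per_year=1)
saving = years_of_delay(larger_budget, 62, periods_per_year=1)
```

In this toy case the smaller budget yields a delay of 2 years and the larger budget a saving of 4 years, which would be plotted as -4.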
Due to the aforementioned problems of aggregating indicators, the results presented in Figure 8 should
be considered as qualitative evidence of the non-linear responses to changes in public expenditure.24
First, note that every SDG shows a certain level of sensitivity to both positive and negative
budgetary changes. Second, confirming our previous findings, the sensitivities to positive and negative
budgetary changes are systematically asymmetrical in terms of the responses’ magnitudes. Third,
the sensitivity rankings across indicators vary between clusters and depending on the magnitude
and direction of the budgetary change. For example, for countries in West, SDG 13 is the most
sensitive to budgetary reductions, but the same ranking position is not observed in other
clusters. Nevertheless, it is important to emphasize that SDG 13 systematically exhibits important
delays in all clusters.
23
The curves in Figure 8 are composed of indicators that were able to converge during a set of Monte Carlo
simulations. Thus, because there are selection biases due to the exclusion of non-converging indicators, these curves
should be considered a qualitative result about the non-linear nature of development outcomes to budgetary changes.
24
We provide the data of the country-indicator specific responses in
http://github.com/oguerrer/sdg_feasibility.
Figure 8: Changes in convergence time as a function of the budget size
[Figure: six panels, (a) Africa, (b) E. Europe & C. Asia, (c) East & South Asia, (d) LAC, (e) MENA,
(f) West, each with one line per SDG. Horizontal axis: percentage change in per capita expenditure.
Vertical axis: average years of delay.]
Notes: These response functions are calculated for each cluster (panel) and SDG (colored lines) averaging across
indicators. The horizontal axis denotes the increment or reduction of the annual budget during the decades following
2020. A positive value in the vertical axis indicates the number of additional years that it would take to reach the
levels originally projected for 2030 (hence, when the budget change is zero, all the lines collapse at zero in the y-axis).
A negative value in the y-axis translates into years saved to reach the 2030 levels. The reference 2030 levels are
determined in the baseline scenario used for Section 4.1.
4.3 Budgetary frontiers and structural bottlenecks
Now that we have established non-linear responses of the SDG gaps to public spending, we elaborate
on their structural origins. Let us open our argument by stating the obvious: that every government
is constrained by time and resources. Thus, in a short/mid-term scenario, time is critical in order to
achieve a set of goals. While a particular policy may succeed in improving an issue, the amount of
improvement is constrained by factors such as infrastructure, organizational practices, individuals’
incentives, and technology that can only be modified through changes to the existing government
programs; changes that take place in a longer time span. Thus, in the scope of existing government
programs, these factors are considered exogenous, and we capture them through parameter αi . It
follows that the success in reaching development goals is partly determined by how much αi allows
an indicator to improve during a set amount of time. Not knowing these limits to success could lead
to ineffective policy priorities and bad planning in terms of long versus short/mid-term policies.
To unearth the limits imposed by structural factors, it is useful to think about the following
hypothetical question: How much, in the years left to reach the SDGs, could the SDG gaps be closed
if public funding were unlimited and fully efficient? This theoretical scenario removes the resource
constraints from the equation, and leaves us with the interaction between structural factors and
time. Therefore, by estimating the SDG gaps under this hypothetical setting, it is possible to
establish bounds to how small an SDG gap can become by 2030. To achieve this, we only
need to assume ξ(γi,t ) = 1 in equation 1. When ξ(γi,t ) = 1 for every indicator, we say that the
country operates at the ‘budgetary frontier’. Thus, the SDG gaps obtained at this frontier describe
the limitations of increasing expenditure in the current government programs. In other words, if
an SDG gap remains open at the budgetary frontier, it means that, regardless of how much public
expenditure increases, the strategy will be unsuccessful if the long-term structural factors (i.e.,
the bottlenecks) in that policy issue are not addressed.
Figure 9 presents the average SDG gaps at the budgetary frontiers of the different countries in
the sample. Panel (a) aggregates the gaps across indicators within each country. Note that none of
the average gaps closes entirely, even in the most advanced nations. As expected, these gaps are
wider in Africa, reaching 39% in the Central African Republic (CAF). This diagram illustrates the
relevance that local features have in the wide disparities observed across countries’ performances.
The estimated SDG gaps at the budgetary frontiers show that countries face long-term structural
hindrances of different magnitudes, even if they belong to the same cluster. Although the model
cannot distinguish the specific reasons behind these discrepancies, it makes sense to argue that
their causes lie in bottlenecks of a local nature.
The right panel in Figure 9 shows the average gaps at the budgetary frontier, aggregated into
SDGs within each cluster. The fact that SDG 13 presents a near-null gap in countries from Africa
and LAC indicates that environmental issues related to climate action could be improved, on
average, by properly funding existing government programs in those regions. However, this is not
the case for other environmental SDGs. For instance, in SDGs 14 and 15, the frontier gaps vary
between 27% and 42%.25

25
SDG 15 for West is an outlier with a gap of 15%.
Figure 9: Budgetary frontiers
[Figure: two panels. (a) Country level: countries (ISO codes) placed by their discretized average
gap. (b) SDG-cluster level: SDG numbers placed by their discretized average gap. Legend in both
panels: Africa, E. Europe & C. Asia, East & South Asia, LAC, MENA, West. Horizontal axis in both
panels: SDG gap at the budgetary frontier (%).]
Notes: A country operating at the budgetary frontier has ξ(γi,t ) = 1 for every indicator i and every period t (see
equation 2 in the methods section). At the budgetary frontier, the only frictions slowing down the indicators’ growth
are in the structural parameter αi . Panel (a): budgetary frontiers calculated by averaging gaps across indicators for
each individual country. Panel (b): budgetary frontiers calculated by averaging gaps across countries and indicators
at the level of SDG for each cluster. The average gaps have been discretized in order to produce the visualizations.
Finally, a cautious reader may consider that public spending should have structural conse-
quences, so the exogenous factors αi could also be affected in the short term. While this reasoning
is, in principle, correct, the empirical evidence suggests that this process is rather weak. For in-
stance, if the structural factors contained in αi were to change substantially in the short term,
then the SDG gaps estimated from simulations using more recent data samples should significantly
differ. To demonstrate that this is not the case, we calibrate and perform the same analysis as in
section 4.1 but, instead of using the full 21-year dataset (with 2000-2020 coverage) to calibrate the
model, we employ a 10-year (2011-2020) and a 5-year (2016-2020) sample.26
Figure 10 shows that our original estimates are robust to these alternative samples, as the six
clusters show relatively small differences in their average gaps (see Appendix I for more
disaggregated yet robust results).27 For calculating these differences, we compare the average SDG
gap for each country produced in the benchmark simulations (using 21 years of data) with the gap
estimated from shorter time series of historical data (either 10 or 5 years). Notice also that the
closer the length of these time series is to the whole historical sample, the smaller the difference
in the average gaps is. That is to say, the dark bars are smaller than the light ones. From this, we
conclude that the SDG network and the structural factors exhibit slow dynamics, validating our
conceptualization of long- versus short/mid-term effects. Accordingly, the budgetary frontiers
involve long-term considerations and demand the implementation of innovative micro-policies.
26
This involves re-estimating the network, the structural parameters, and the gaps.
27
These are only differences in the average gaps. The numbers to the right of these bars show the most sensitive
indicators to the length of the time series considered. The first column of numbers corresponds to comparisons with
10-year time series, while the second to comparisons with 5-year time series.
Figure 10: Robustness to different sampling lengths
[Figure: four columns of horizontal bar pairs, one pair per country (ISO codes), with two numbers
next to each country. Horizontal axis: average absolute difference in terms of SDG gap (%).]
Notes: The bars indicate the average absolute difference in estimated gaps (in percentage) between the benchmark
case (using 21 years of data) and one where the model was calibrated with shorter time series. The dark bars are
calculated using the model calibrated with 10-year time series. The light bars are computed using the model calibrated
with 5-year time series. The solid squares on the right of each panel denote the color of the SDG to which the
most sensitive indicator belongs in the case of differences using 10-year time series. The hollow ones correspond to
5-year time series. For a more disaggregated presentation of these results see Appendix I.
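The robustness comparison behind Figure 10 can be illustrated with a short sketch. It assumes that each calibration yields one estimated gap (in percent) per indicator for a given country; the function name, the dictionary layout, and the indicator labels are hypothetical and only serve the example.

```python
from typing import Mapping


def mean_absolute_gap_difference(
    benchmark_gaps: Mapping[str, float],
    alternative_gaps: Mapping[str, float],
) -> float:
    """Average absolute difference in estimated gaps for one country.

    Both arguments map an indicator label to its estimated SDG gap (%), one
    from the benchmark calibration (21 years of data) and one from a
    calibration with a shorter time series. Only indicators present in both
    calibrations are compared.
    """
    shared = benchmark_gaps.keys() & alternative_gaps.keys()
    if not shared:
        raise ValueError("the two calibrations share no indicators")
    total = sum(abs(benchmark_gaps[key] - alternative_gaps[key]) for key in shared)
    return total / len(shared)


# Hypothetical gaps (%) for two indicators of a single country.
gaps_21_years = {"indicator_a": 10.0, "indicator_b": 20.0}
gaps_10_years = {"indicator_a": 12.0, "indicator_b": 17.0}
difference = mean_absolute_gap_difference(gaps_21_years, gaps_10_years)
```

Computing this quantity once with the 10-year calibration and once with the 5-year calibration would give the two bars reported for each country.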
5 A discussion on the model’s strengths and limitations
Models of multidimensional development typically use composite indices such as the Human De-
velopment Index and the SDG Index. However, if analysts wish to provide more nuanced advice
with respect to specific SDGs in terms of policy prioritization and budgetary allocations, it is
necessary to model the evolution of each separate dimension without aggregating them and losing
valuable information. This task is problematic for statistical/econometric and machine learning
approaches since they cannot deal easily with a high-dimensional policy space characterized by few
observations (short time series). For example, multi-output models (such as regressions of equation
systems or neural networks) demand unrealistically large amounts of observations for each dimen-
sion/indicator. To overcome this limitation, analysts pool cross-national data to produce their
estimates. This, however, has the costly implication of removing country-specificity because any
interpretation from the estimated parameters is limited to a hypothetical country with the average
characteristics of the sample. In addition, data-pooling strategies only work with a limited number
of indicators, since there exist only so many countries.
In data-fitting approaches, the problem of few observations worsens when considering
interdependencies between indicators because the number of potential interactions (parameters to be
estimated) grows at least quadratically with the number of dimensions (e.g., Asadikia et al. (2021);
Osuji and Nwani (2020); Dhaoui (2018)). On the other hand, aggregate models like system dynamics and
integrated assessment frameworks try to overcome this limitation by imposing, ex ante, the structure
of interactions (e.g., Zelinka and Amadei (2019); Pedercini et al. (2020); Collste et al. (2017)).
This approach introduces strong assumptions and still demands large amounts of data since the
analysts tend to estimate the model’s parameters through regressions. Often, if data are not avail-
able, such parameters are directly imposed from existing estimations from other countries/regions
or, again, from pooled regressions, which brings us back to the context-specificity problem. Such
limitations to the quantitative study of the SDGs become more evident in the context of the causal
relationship between government expenditure and development indicators. In terms of this nexus,
we provide a list of more specific drawbacks.
1. Much of the empirical quantitative literature (which policymakers often use to guide their
decisions) focuses on the impact of one (or multiple) indicator(s) on another (or others).
However, indicators are not instruments that governments can directly manipulate but rather
endogenous variables resulting from spending decisions. Hence, governments often motivate
their expenditure choices using studies that do not offer evidence on how effective or viable
it would be to fund a particular SDG given the existing government programs. There are
two alternatives to remedy this analytical hurdle: the use of granular expenditure data or the
implementation of a generative model of public spending.
2. Highly granular data of public expenditure–properly linked to specific development indicators–
are practically non-existent. Under these circumstances, analysts have to rely on data ag-
gregated into a few broad sectors (e.g., education, health, poverty alleviation) and select a
‘representative’ indicator for each one–or an average index.
3. When constructing a dataset to use these methods, one must assemble a large cross-country
panel to obtain the necessary degrees of freedom for making estimations possible.
4. Most data-fitting approaches can only consider one dependent variable, which is inconsistent
with the systemic view of the SDGs.
5. Establishing a causal link between an expenditure variable and an aggregate indicator is
problematic because of confounding factors and reverse causation (because a government can
adjust its budget according to the observed performance of the indicators). This problem
also applies to studies using more sophisticated machine learning methods.
6. Due to inefficiencies during the policymaking process and spillover effects, the level of expen-
diture in a policy issue does not reflect the actual amount of resources effectively used.
In general, data-fitting and aggregate models are ill-suited due to their lack of explicit causal
mechanisms. To overcome this problem, computational approaches such as agent-based models
(ABMs) can be useful. Nonetheless, these models may also demand large amounts of data, so they
are typically employed in micro-level studies relevant to a specific SDG. However, this does not
mean that ABMs cannot be used for comprehensive analyses of SDGs but rather that substantial
efforts need to be made in this direction. For instance, Allen et al. (2016) provide an extensive
review of model types used for the assessment of SDGs. They find that ABMs account for only 1%
of the studies in their literature survey.
In this paper, we contribute to this effort by developing an ABM that is explicit about a
critical causal channel: public expenditure. In contrast with recent studies on SDGs, like those
mentioned in the paper’s introduction, our model can be calibrated for individual countries on a
large policy space, helping researchers and practitioners to get the most out of the available data.
Furthermore, it does not impose aggregate relationships between the different indicators. Rather,
it is very flexible since it allows the user to introduce any network of interdependencies that are
relevant to the context under study (Ospina-Forero et al., 2020). While our model is not explicit
about the full complexity of the system (and no other model is), it provides a rich enough yet
parsimonious specification, which facilitates counterfactual experiments (‘what if’ scenarios) and
allows estimating the impact of budgetary changes.
The proposed computational method also has some limitations that the reader should be aware
of. Our model cannot produce ex-ante evaluations of new government programs. Nor can it yield
policy prescriptions if, in the out-of-the-sample analysis, there is a drastic transformation in the
technological, political, and organizational underpinnings of a country. In this sense, the model
assumes a ‘business as usual’ setting in which the system keeps working with similar government
programs and structural features as those observed during the sampling period. This assumption
is realistic in short-term analyses (less than 6 years) and admissible for evaluating policy design in
a medium-term setting (5-15 years). Thus, our approach focuses on short/mid-term effects, and
it explicitly separates structural factors that shape long-term dynamics. Ironically, while many of
the existing methods suffer from the same limitations, their applications often tend to emphasize
long-term scenarios.
The robustness of the model can be enhanced when disaggregated expenditure data are available
at the SDG or government program levels. However, these databases only exist for very few
countries. Therefore, in this paper, we offer a worldwide application of the model and show that it
can provide insightful policy guidelines even if a country only has aggregate expenditure data. As
in any quantitative approach, when more detailed empirical information is available, our ABM can
generate more specific policy prescriptions. For instance, with expenditure data disaggregated at
the SDG level, it is possible to establish whether budgetary allocations different from those
observed historically could affect the closing of development gaps.
6 Conclusion
We propose a bottom-up computational framework to analyze the short- and mid-term impact of
budgetary allocations on a large set of SDG indicators. Our simulations use data from individual
countries. Hence, the framework allows specifying context-dependent settings: initial conditions,
calibrated parameters, and spillover effects among indicators. The underlying theory assumes fixed structural
factors in the indicators’ evolution equations and exogenous network topologies (which could be
constructed in tandem with other qualitative and quantitative approaches). Our approach is useful
to understand how to allocate resources across the existing government programs, so it facilitates
identifying key priority areas. Moreover, through counterfactual simulations, the model can
discover bottlenecks associated with the inefficacy of public expenditure, which is key to achieving
any development agenda. However, the model is not designed to identify the causes behind the
structural constraints that prevent a country from closing its SDG gaps. Hence, the outputs are
not informative about how to reformulate the existing micro-policies or how to generate new ones.
Our main results provide novel and nuanced estimates of the development gaps that will remain
open in 2030, at the level of each country and indicator. We also find that more government
spending is not enough to close the SDG gaps, even if countries were operating at a budgetary
frontier that entails enough resources for the existing government programs. Hence, complementary
micro-policies are ultimately needed to overcome structural (long-term) bottlenecks and to improve
the relevant indicators. When looking at the model’s estimates, we can offer detailed interpretations
of the simulation results. For instance, some environmental concerns such as clean air can be sub-
stantially ameliorated with a larger budget, while others (e.g., SDG 14 and 15) require undertaking
well-designed government programs in order to shift the historical course of ineffective policies.
Despite the simplicity of the model, it is possible to use it to infer two crucial country-specific
features: (i) the possibility of closing the SDG gaps, and (ii) the existence of long-term bottlenecks.
Therefore, when analysts can identify one or several government programs with a particular
indicator, it is possible to establish some policy guidelines with the model’s estimates. Depending
on the values of these features and the level of the indicator’s historical performance, it is
possible to define different routes of policy action: either the program should be reviewed in terms
of incentives and organizational practices before more public funds are spent, or the SDG gaps can
be closed by just channeling more funds into the existing programs.
References
Akenroye, T., Nygård, H., and Eyo, A. (2018). Towards Implementation of Sustainable Develop-
ment Goals (SDG) in Developing Nations: A Useful Funding Framework. International Area
Studies Review, 21(1):3–8.
Allen, C., Metternicht, G., and Wiedmann, T. (2016). National pathways to the Sustainable
Development Goals (SDGs): A Comparative Review of Scenario Modelling Tools. Environmental
Science & Policy, 66:199–207.
Amos, R. and Lydgate, E. (2020). Trade, Transboundary Impacts and the Implementation of SDG
12. Sustainability Science, 15(6):1699–1710.
Aragam, B., Gu, J., and Zhou, Q. (2019). Learning Large-Scale Bayesian Networks with the
Sparsebn Package. Journal of Statistical Software, 91(11).
Asadikia, A., Rajabifard, A., and Kalantari, M. (2021). Systematic Prioritisation of SDGs: Machine
Learning Approach. World Development, 140:105269.
Benedek, D., Gemayel, E., Senhadji, A., and Tieman, A. (2021). A Post-Pandemic Assessment of
the Sustainable Development Goals. Staff Discussion Notes, 2021(003).
Boeren, E. (2019). Understanding Sustainable Development Goal (SDG) 4 on “Quality Education”
from Micro, Meso and Macro Perspectives. International Review of Education, 65(2):277–294.
Castañeda, G., Chávez-Juárez, F., and Guerrero, O. (2018). How Do Governments Determine
Policy Priorities? Studying Development Strategies through Networked Spillovers. Journal of
Economic Behavior & Organization, 154:335–361.
Castañeda, G. and Guerrero, O. (2018). The Resilience of Public Policies in Economic Development.
Complexity, 2018.
Castañeda, G. and Guerrero, O. (2019a). The Importance of Social and Government Learning in
Ex Ante Policy Evaluation. Journal of Policy Modeling.
Castañeda, G. and Guerrero, O. (2019b). Inferencia de Prioridades de Política para el Desarrollo
Sostenible. Reporte Metodológico, Programa de las Naciones Unidas para el Desarrollo.
Castañeda, G. and Guerrero, O. (2019c). Inferencia de Prioridades de Política para el Desarrollo
Sostenible: El Caso Sub-Nacional de México. Reporte Técnico, Programa de las Naciones Unidas
para el Desarrollo.
Castañeda, G. and Guerrero, O. (2019d). Inferencia de Prioridades de Política para el Desarrollo
Sostenible: Una Aplicación para el Caso de México. Reporte Técnico, Programa de las Naciones
Unidas para el Desarrollo.
Collste, D., Pedercini, M., and Cornell, S. E. (2017). Policy Coherence to Achieve the SDGs: Using
Integrated Simulation Models to Assess Effective Policies. Sustainability Science, 12(6):921–931.
Dhaoui, I. (2018). Achieving Sustainable Development Goals in MENA countries: An Analytical
and Econometric Approach.
Fader, M., Cranmer, C., Lawford, R., and Engel-Cox, J. (2018). Toward an Understanding of Syner-
gies and Trade-Offs Between Water, Energy, and Food SDG Targets. Frontiers in Environmental
Science, 0.
Fuso Nerini, F., Sovacool, B., Hughes, N., Cozzi, L., Cosgrave, E., Howells, M., Tavoni, M., Tomei,
J., Zerriffi, H., and Milligan, B. (2019). Connecting Climate Action with Other Sustainable
Development Goals. Nature Sustainability, 2(8):674–680.
Gobierno del Estado de México (2020). Informe de Ejecución del Plan de Desarrollo del Estado de
México 2017-2023; a 3 Años de la Administración.
González-Pier, E., Barraza-Lloréns, M., Beyeler, N., Jamison, D., Knaul, F., Lozano, R., Yamey, G.,
and Sepúlveda, J. (2016). Mexico’s path towards the Sustainable Development Goal for health:
An assessment of the feasibility of reducing premature mortality by 40% by 2030. The Lancet Global
Health, 4(10):e714–e725.
Guerrero, O. and Castañeda, G. (2020a). Policy Priority Inference: A Computational Framework
to Analyze the Allocation of Resources for the Sustainable Development Goals. Data & Policy,
2.
Guerrero, O. and Castañeda, G. (2020b). Quantifying the Coherence of Development Policy Pri-
orities. Development Policy Review, 00:1–26.
Guerrero, O. and Castañeda, G. (2021). Does expenditure in public governance guarantee less
corruption? Large non-linearities and complementarities of the rule of law. Economics of Gov-
ernance, forthcoming.
Guerrero, O., Castañeda, G., Trujillo, G., Hackett, L., and Chávez-Juárez, F. (2021). Subna-
tional Sustainable Development: The Role of Vertical Intergovernmental Transfers in Reaching
Multidimensional Goals. SSRN Working Paper.
Ionescu, G., Firoiu, D., Tănasie, A., Sorin, T., Pîrvu, R., and Manta, A. (2020). Assessing the
Achievement of the SDG Targets for Health and Well-Being at EU Level by 2030. Sustainability,
12(14):5829.
Jones, B., Baumgartner, F., Breunig, C., Wlezien, C., Soroka, S., Foucault, M., François, A.,
Green-Pedersen, C., Koski, C., John, P., Mortensen, P., Varone, F., and Walgrave, S. (2009).
A General Empirical Law of Public Budgets: A Comparative Analysis. American Journal of
Political Science, 53(4):855–873.
Kroll, C., Warchold, A., and Pradhan, P. (2019). Sustainable Development Goals (SDGs): Are We
Successful in Turning Trade-Offs into Synergies? Palgrave Communications, 5(1):1–11.
Luken, R., Mörec, U., and Meinert, T. (2020). Data Quality and Feasibility Issues with Industry-
Related Sustainable Development Goal Targets for Sub-Saharan African Countries. Sustainable
Development, 28(1):91–100.
Lusseau, D. and Mancini, F. (2019). Income-Based Variation in Sustainable Development Goal
Interaction Networks. Nature Sustainability, 2(3):242–247.
Machingura, F. and Lally, S. (2017). The Sustainable Development Goals and Their Trade-Offs.
Technical report, Overseas Development Institute, London, United Kingdom.
McGowan, P., Stewart, G., Long, G., and Grainger, M. (2019). An Imperfect Vision of Indivisibility
in the Sustainable Development Goals. Nature Sustainability, 2(1):43–45.
Mensi, A. and Udenigwe, C. (2021). Emerging and Practical Food Innovations for Achieving
the Sustainable Development Goals (SDG) Target 2.2. Trends in Food Science & Technology,
111:783–789.
Moyer, J. and Hedden, S. (2020). Are We on the Right Path to Achieve the Sustainable Development
Goals? World Development, 127:104749.
OECD (2020). Measuring the Distance to the SDGs in Regions and Cities.
Ospina-Forero, L., Castañeda Ramos, G., and Guerrero, O. (2020). Estimating Networks of Sus-
tainable Development Goals. Information and Management.
Osuji, E. and Nwani, S. (2020). Achieving Sustainable Development Goals: Does Government
Expenditure Framework Matter? International Journal of Management, Economics and Social
Sciences (IJMESS), 9(3):131–160.
Pedercini, M., Arquitt, S., and Chan, D. (2020). Integrated Simulation for the 2030 Agenda. System
Dynamics Review, 36(3):333–357.
Pedercini, M., Arquitt, S., Collste, D., and Herren, H. (2019). Harvesting synergy from sustainable
development goal interactions. Proceedings of the National Academy of Sciences, 116(46):23021–
23028.
Philippidis, G., Shutes, L., M’Barek, R., Ronzon, T., Tabeau, A., and van Meijl, H. (2020). Snakes
and Ladders: World Development Pathways’ Synergies and Trade-Offs through the Lens of the
Sustainable Development Goals. Journal of Cleaner Production, 267:122147.
Porciello, J., Ivanina, M., Islam, M., Einarson, S., and Hirsh, H. (2020). Accelerating Evidence-
Informed Decision-Making for the Sustainable Development Goals Using Machine Learning. Na-
ture Machine Intelligence, 2(10):559–565.
Pradhan, P., Costa, L., Rybski, D., Lucht, W., and Kropp, J. (2017). A Systematic Study of
Sustainable Development Goal (SDG) Interactions. Earth’s Future, 5(11):1169–1179.
Pradhan, P., Subedi, D., Khatiwada, D., Joshi, K., Kafle, S., Chhetri, R., Dhakal, S., Gautam, A.,
Khatiwada, P., Mainaly, J., Onta, S., Pandey, V., Parajuly, K., Pokharel, S., Satyal, P., Singh,
D., Talchabhadel, R., Tha, R., Thapa, B., Adhikari, K., Adhikari, S., Bastakoti, R., Bhandari,
P., Bharati, S., Bhusal, Y., Bk, B., Bogati, R., Kafle, S., Khadka, M., Khatiwada, N., Lal,
A., Neupane, D., Neupane, K., Ojha, R., Regmi, N., Rupakheti, M., Sapkota, A., Sapkota, R.,
Sharma, M., Shrestha, G., Shrestha, I., Shrestha, K., Tandukar, S., Upadhyaya, S., Kropp, J.,
and Bhuju, D. (2021). The COVID-19 Pandemic Not Only Poses Challenges, but Also Opens
Opportunities for Sustainable Transformation. Earth’s Future, 9(7):e2021EF001996.
Putra, M., Pradhan, P., and Kropp, J. (2020). A Systematic Analysis of Water-Energy-Food
Security Nexus: A South Asian Case Study. Science of The Total Environment, 728:138451.
Sachs, J., Schmidt-Traub, G., Kroll, C., Lafortune, G., Fuller, G., and Woelm, F. (2020). Sustain-
able Development Report 2020. Bertelsmann Stiftung and Sustainable Development Solutions
Network (SDSN), New York.
Sobczak, E., Bartniczak, B., and Raszkowski, A. (2021). Implementation of the No Poverty Sus-
tainable Development Goal (SDG) in Visegrad Group (V4). Sustainability, 13(3):1030.
Sulmont, A., García de Alba Rivas, M., and Visser, S. (2021). Policy Priority Inference for Sustain-
able Development: A Tool for Identifying Global Interlinkages and Supporting Evidence-Based
Decision Making. In Understanding the Spillovers and Transboundary Impacts of Public Policies.
OECD Publishing, Paris.
United Nations (2020). SDG Indicators, United Nations Global SDG Database.
Warchold, A., Pradhan, P., and Kropp, J. (2021). Variations in Sustainable Development Goal
Interactions: Population, Regional, and Income Disaggregation. Sustainable Development,
29(2):285–299.
World Bank (2020). SDG Atlas 2020.
Zelinka, D. and Amadei, B. (2019). A Systems Approach for Modeling Interactions Among the
Sustainable Development Goals Part 2: System Dynamics. International Journal of System
Dynamics Applications (IJSDA), 8(1):41–59.
Online Appendix
A Full model details
This appendix provides all the equations of the agent-based model and their motivations. Guerrero
and Castañeda (2020a) provide further discussions on the theoretical foundations of the model as
well as internal and external validation tests.
A.1 Policy-making agents
There are n agents (or public officials), each in charge of a public policy that is specific to a single
policy issue. To implement the mandated policy in a given period t, agent i receives Pi,t resources
from the central authority (or government). With these resources, the public official tries to leverage
two potential benefits: (1) the reputation from being a proficient public servant and (2) the utility
derived from being inefficient. This trade-off is modeled through the benefit function
\[
F_{i,t+1} = \Delta I^{*}_{i,t} \frac{C_{i,t}}{P_{i,t}} + (1 - \theta_{i,t}\tau) \frac{P_{i,t} - C_{i,t}}{P_{i,t}}, \tag{7}
\]
where $F_{i,t+1}$ represents the benefit or utility obtained in the next period. The first summand in
equation 7 captures the benefit of being proficient. $\Delta I^{*}_{i,t}$ is the change in indicator i with respect
to the previous period (its performance), relative to the changes of all other indicators. More
specifically, the relative change in indicator i is computed as
\[
\Delta I^{*}_{i,t} = \frac{I_{i,t} - I_{i,t-1}}{\sum_j \left(I_{j,t} - I_{j,t-1}\right)}, \tag{8}
\]
and it captures the idea that the central authority compares and evaluates the relative performance
of each public official, and their implemented policies, through the corresponding development
indicators.
Going back to the first summand in equation 7, we find that the relative change in the indicator
is weighted by $C_{i,t}/P_{i,t}$. Here, $C_{i,t}$ is the portion of the allocated resources $P_{i,t}$ that is effectively used
towards the policy. We call it the contribution of agent i.
Next, let us focus on the second addend of equation 7, which corresponds to the utility derived
from being inefficient. Here, Pi,t − Ci,t is the benefit extracted from not devoting resources to the
policy. Thus, when dividing by Pi,t , it represents the level of inefficiency. Public procurement
mechanisms such as monitoring and penalties may hinder inefficiencies. This is captured by factor
(1 − θi,t τ ). Variable θi,t is the binary outcome of monitoring inefficiencies. If θi,t = 1, it means that
the government has spotted agent i behaving inefficiently. In that case, i is penalized by a factor
τ, so that the benefit from these private gains is reduced.
In order to model the binary outcomes of monitoring efforts, we assume that, every period, an
independent realization of θi,t takes place for each indicator. This is nothing else than a Bernoulli
process with a probability of success λi,t determined by
\[
\lambda_{i,t} = \varphi \frac{P_{i,t} - C_{i,t}}{P^{*}_{t}}, \tag{9}
\]
where $P^{*}_{t}$ is the largest allocation in period t. Parameter $\varphi$ in equation 9 corresponds to the quality
of the monitoring efforts. By normalizing the inefficiencies by $P^{*}_{t}$, we are considering the emergence
of social norms of corruption. That is, factor $(P_{i,t} - C_{i,t})/P^{*}_{t}$ captures how much a deviation of resources
stands out from the largest allocation. Thus, diversions that deviate from this norm are more likely
to be under the spotlight and to become media scandals.
If an agent becomes more inefficient and their benefits increase, then reinforcement learning
leads them to become even more inefficient in the next period. If, in contrast, the government is
able to penalize them, the same learning process makes them more proficient in the next period. There
are several ways in which an agent may become more or less inefficient. Therefore, we represent
any action through an abstract variable $X_{i,t}$, which may take any real value. If $X_{i,t} > 0$,
it means that the agent has a proclivity to be more efficient than inefficient. Otherwise, the agent
is more prone to be inefficient. We model the reinforcement of action $X_{i,t}$ as
\[
X_{i,t+1} = X_{i,t} + \operatorname{sgn}\big((X_{i,t} - X_{i,t-1})(F_{i,t} - F_{i,t-1})\big)\,|F_{i,t} - F_{i,t-1}|, \tag{10}
\]
where sgn(·) is the sign function. Equation 10 corresponds to the directed learning model (Dhami,
2016), which is a type of reinforcement learning.
In order to translate action Xi,t into a contribution of resources that is bounded by [0, Pi,t ], we
define
\[
C_{i,t} = \frac{P_{i,t}}{1 + e^{-X_{i,t}}}. \tag{11}
\]
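The agent-level equations 7 to 11 can be sketched in a few lines of Python. This is only an illustrative reading of the equations, not the authors' implementation: the function names, the use of NumPy arrays (one entry per agent), and the zero-division guard in the relative change are all assumptions made for the sketch.

```python
import numpy as np

def relative_change(I_now, I_prev):
    """Equation 8: change of each indicator relative to the sum of all changes."""
    diff = I_now - I_prev
    total = diff.sum()
    # Guard added for the sketch (an assumption): no change anywhere gives zeros.
    return diff / total if total != 0 else np.zeros_like(diff)

def contribution(P, X):
    """Equation 11: map the unbounded action X to a contribution between 0 and P."""
    return P / (1.0 + np.exp(-X))

def monitoring(P, C, phi, rng):
    """Equation 9: Bernoulli monitoring outcomes with success probability lambda."""
    lam = phi * (P - C) / P.max()
    return (rng.random(P.shape) < lam).astype(float)

def benefit(dI_rel, P, C, theta, tau):
    """Equation 7: proficiency benefit plus (possibly penalized) private gains."""
    return dI_rel * C / P + (1.0 - theta * tau) * (P - C) / P

def update_action(X, X_prev, F, F_prev):
    """Equation 10: directed learning; keep moving in the direction that paid off."""
    dF = F - F_prev
    return X + np.sign((X - X_prev) * dF) * np.abs(dF)

# Illustrative values only (not taken from the paper's data).
rng = np.random.default_rng(0)
P = np.array([1.0, 2.0, 4.0])            # allocations
X = np.zeros(3)                          # neutral actions
C = contribution(P, X)                   # X = 0 yields half of each allocation
theta = monitoring(P, C, phi=0.5, rng=rng)
F = benefit(np.array([0.2, 0.3, 0.5]), P, C, theta, tau=0.5)
```

Note that with $X_{i,t} = 0$ the logistic map in equation 11 returns exactly half of the allocation, so positive actions correspond to contributing more than half of $P_{i,t}$.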
A.2 The government agent
Policy priorities are represented by the allocation profile $P = (P_1, \ldots, P_n)$. It is important to
distinguish between indicators that can be directly affected through public policies (instrumental)
and those that cannot (collateral). An instrumental indicator exists if the government has a program
to directly impact it (i.e., it receives public funds). In contrast, a collateral indicator cannot be
directly impacted, for example, because it is a composite aggregation of various topics, e.g. GDP
per capita or financial development. Policy priorities can only be defined on the n instrumental
indicators, and we assume that there are n public officials (one in charge of each instrumental
indicator). When talking about all the indicators together, we say that there are N ≥ n policy
issues in total, and a government has goals for all of them (even for the collateral ones).
The objective of the government is to close the gap between the goals and the indicators by
solving the problem
\[
\min \left[ \sum_{i}^{N} (G_i - I_{i,t})^2 \right], \tag{12}
\]
where Gi is the goal established for indicator i. The central authority achieves this by adapting its
allocation profile.
In the real world, identifying the precise mechanisms through which governments establish their
budgets is extremely challenging. A starting point is the principle of ‘gaping’, which suggests that
governments prioritize the most laggard topics as these may be development bottlenecks. Nevertheless,
this political process also introduces adaptations motivated by signals such as people’s
demands and the performance of the different expenditure programs. In the political science
literature, these budgetary changes exhibit punctuated dynamics and are modeled through simple
stochastic processes (Jones et al., 2009). Thus, we combine all these insights into a government
heuristic where the policy priorities are established according to
\[
P_{i,t} = B \frac{q_{i,t}}{\sum_j q_{j,t}}, \tag{13}
\]
where qi,t is the propensity to spend in policy issue i in time t, and B is the budget available in
time t.
The evolution of the policy priorities takes place through the propensities. In the first period,
these are determined by the normalized gaps
\[
q_{i,0} = \frac{G_i - I_{i,0}}{\max(G_{\cdot} - I_{\cdot,0})}. \tag{14}
\]
Then, as time progresses, the propensities are updated according to
\[
q_{i,t} = q_{i,t-1} + U(0,1) \left( \sum_{k}^{t-1} \theta_{i,k} \right)^{-1} \sum_{k \mid \theta_{i,k}=1}^{t-1} \frac{P_{i,k} - C_{i,k}}{P_{i,k}}. \tag{15}
\]
The previous equation is rather intuitive. The term U (0, 1) is a random draw from a uniform
distribution in the (0,1) interval. This captures the randomness of societal signals received by the
government (it is consistent with the stochastic processes used to model budgetary changes in the
literature). The remaining terms to the right correspond to the inter-temporal average inefficiency,
which lies in the interval [0,1]. Therefore, the government encourages increments among the most
efficient policymaking agents. Note that, in general, the contribution Ci,t is not observable by the
government, unless there is a successful audit by the monitoring authority. This is why equation 15
conditions the efficiency bias in the allocation of the budget to successful outcomes of the monitoring
random variable θi,t . Thus, the government tends to be more inquisitive with policymakers whose
inefficiencies have been spotted in the past.
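The government heuristic in equations 13 to 15 can be sketched as follows. This is a hedged illustration that transcribes the equations term by term as printed. The function names, the array layout (history arrays of shape periods by agents), the choice of one uniform draw per agent, and the convention that agents never spotted receive a zero adjustment are assumptions of the sketch.

```python
import numpy as np

def initial_propensities(G, I0):
    """Equation 14: initial gaps normalized by the largest gap."""
    gaps = G - I0
    return gaps / gaps.max()

def allocate(q, B):
    """Equation 13: split the budget B in proportion to the propensities."""
    return B * q / q.sum()

def update_propensities(q_prev, theta_hist, P_hist, C_hist, rng):
    """Equation 15, transcribed as printed.

    theta_hist, P_hist, and C_hist hold periods 0..t-1 in rows and agents
    in columns. Only periods with theta = 1 enter the average inefficiency.
    """
    ineff = (P_hist - C_hist) / P_hist
    n_spotted = theta_hist.sum(axis=0)
    spotted_sum = (theta_hist * ineff).sum(axis=0)
    # Assumption: agents that were never spotted get an adjustment of zero.
    avg_ineff = np.divide(spotted_sum, n_spotted,
                          out=np.zeros_like(spotted_sum), where=n_spotted > 0)
    # Assumption: an independent uniform draw for each agent.
    return q_prev + rng.uniform(0.0, 1.0, size=q_prev.shape) * avg_ineff

# Illustrative values only.
rng = np.random.default_rng(1)
q = initial_propensities(np.array([1.0, 1.0]), np.array([0.5, 0.75]))
P = allocate(q, B=3.0)
```

Because the monitored average inefficiency lies in [0, 1] and the uniform draw lies in (0, 1), each update changes a propensity by at most one unit under this reading.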
A.3 Indicator dynamics
As discussed in the main text, we model indicator dynamics through a random growth process. Let
γi denote a probability associated with an improvement in indicator i. This probability depends on
a combination of network effects (i.e., incoming spillovers) and budgetary allocations. Therefore,
the growth process is modeled as independent Bernoulli trials with a probability of success
\[
\gamma_{i,t} = \beta \frac{C_{i,t} + \frac{1}{n}\sum_j C_{j,t}}{1 + e^{-S_{i,t}}}, \tag{16}
\]
where $\beta$ is a normalizing parameter and $S_{i,t}$ is the net amount of spillovers received by indicator
i in time t (this could be positive or negative). The spillovers are computed every period according
to $S_{i,t} = \sum_j \mathbb{1}_{j,t} A_{j,i}$, where $\mathbb{1}$ is the indicator function: 1 if indicator j grew in the previous period
and 0 otherwise.
Next, we define the difference equation of indicator i as
\[
I_{i,t+1} = I_{i,t} + \alpha_i \xi(\gamma_{i,t}), \tag{17}
\]
where ξ(·) is the binary outcome (0 or 1) of a growth trial. Note that, if the indicator exceeds its
theoretical maximum (when provided by the user), the model assigns zero growth.
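A minimal sketch of the indicator dynamics in equations 16 and 17 is given below. It is not the authors' code. For simplicity it assumes that every indicator is instrumental (so the vectors of contributions and indicators have the same length), that A is a NumPy matrix with entry A[j, i] holding the spillover from j to i, and that "zero growth" at the theoretical maximum means discarding any step that would exceed it. All function names are hypothetical.

```python
import numpy as np

def spillovers(grew_prev, A):
    """S_i = sum_j 1_j * A[j, i], where grew_prev[j] is 1 if j grew last period."""
    return grew_prev @ A

def growth_probability(C, S, beta):
    """Equation 16: own contribution plus mean contribution, scaled by spillovers."""
    return beta * (C + C.mean()) / (1.0 + np.exp(-S))

def step_indicators(I, alpha, gamma, rng, I_max=None):
    """Equation 17: each indicator grows by alpha_i if its Bernoulli trial succeeds."""
    success = rng.random(I.shape) < gamma
    I_next = I + alpha * success
    if I_max is not None:
        # Assumed reading of the zero-growth rule: reject steps beyond the maximum.
        I_next = np.where(I_next > I_max, I, I_next)
    return I_next, success.astype(float)

# Illustrative values only.
rng = np.random.default_rng(2)
A = np.array([[0.0, 0.5], [0.2, 0.0]])
S = spillovers(np.array([1.0, 0.0]), A)
gamma = growth_probability(np.array([1.0, 3.0]), S, beta=0.1)
I_next, grew = step_indicators(np.array([0.2, 0.2]), np.array([0.1, 0.1]), gamma, rng)
```

The returned `grew` vector is what feeds the indicator function in the spillover computation of the following period.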
B Data
B.1 Development indicators
The original dataset of the indicators used for the 2020 Sustainable Development Report (SDR)
can be downloaded here: github.com/sdsna/SDR2020. In total, we use 77 indicators and 140
countries. Table B.1 provides the complete list of indicators, their codes, and the SDG to which
they belong. Table B.2, on the other hand, presents the complete list of countries that we extracted
for our sample, with counts of indicators and SDG coverage.
Table B.2: Indicators and SDGs per country
Country Indicators SDGs Country Indicators SDGs Country Indicators SDGs
AFG 49 13 AGO 56 15 ALB 54 14

ARE 51 13 ARG 56 14 ARM 52 13
AUS 71 16 AUT 66 15 AZE 51 13
BDI 50 14 BEL 69 16 BEN 53 15
BFA 52 14 BGD 55 14 BGR 55 14
BHR 46 14 BIH 50 13 BLR 52 13
BOL 51 13 BRA 56 14 BWA 53 14
CAF 49 14 CAN 69 16 CHE 66 15
CHL 71 16 CHN 52 14 CIV 55 15
CMR 54 15 COG 52 15 COL 56 14
CRI 56 14 CYP 51 14 CZE 68 15
DEU 72 16 DNK 72 16 DOM 54 14
DZA 55 15 ECU 55 14 EGY 57 15
ERI 46 14 ESP 71 16 EST 72 16
FIN 72 16 FRA 72 16 GAB 54 14
GBR 69 16 GEO 55 14 GHA 56 15
GIN 53 15 GMB 53 15 GNB 44 15
GRC 70 16 GTM 56 14 HND 55 14
HRV 55 14 HTI 50 13 HUN 68 15

IDN 56 14 IND 55 14 IRL 71 16
IRN 56 15 IRQ 54 15 ISR 67 16
ITA 72 16 JAM 53 14 JOR 53 15
JPN 68 16 KAZ 52 13 KEN 57 15
KGZ 52 13 KHM 56 14 KOR 67 16
KWT 51 14 LAO 49 13 LBN 52 15
LBR 54 15 LKA 56 14 LSO 51 14
LTU 70 16 LVA 72 16 MAR 56 15
MDA 51 13 MDG 56 15 MEX 71 16
MKD 51 13 MLI 52 14 MMR 54 14
MNG 52 13 MOZ 56 15 MRT 53 15
MUS 55 15 MWI 51 14 MYS 56 14
NAM 57 15 NER 51 14 NGA 54 15
NIC 55 14 NLD 71 16 NOR 72 16
NPL 51 13 NZL 69 16 OMN 49 14
PAK 56 14 PAN 54 14 PER 56 14
PHL 56 14 POL 70 16 PRT 72 16
PRY 52 13 QAT 47 14 RUS 54 14
RWA 51 14 SAU 52 14 SDN 55 15
SEN 55 15 SGP 49 14 SLE 54 15
SLV 56 14 SVK 68 15 SVN 70 16
SWE 70 16 SWZ 48 14 TCD 50 14
TGO 54 15 THA 56 14 TJK 51 13
TKM 45 11 TUN 57 15 TUR 67 16
TZA 57 15 UGA 52 14 UKR 56 14
URY 55 14 USA 71 16 UZB 51 13
VEN 53 14 VNM 56 14 ZAF 57 15
ZMB 52 14 ZWE 51 13
Sources: Sample from the 2020 Sustainable Development Report.
Table B.1: List of policy issues by SDG
SDG Code Description
1 320pov Poverty headcount ratio at $3.20/day (%)
1 oecdpov Poverty rate after taxes and transfers (%)
2 crlyld Cereal yield (tonnes per hectare of harvested land)
2 obesity Prevalence of obesity, BMI ≥ 30 (% of adult population)
2 snmi Sustainable Nitrogen Management Index (worst 0-1.41 best)
2 trophic Human Trophic Level (best 2-3 worst)
2 undernsh Prevalence of undernourishment (%)
2 wasting Prevalence of wasting in children under 5 years of age (%)
3 births Births attended by skilled health personnel (%)
3 fertility Adolescent fertility rate (births per 1,000 adolescent females aged 15 to 19)
3 hiv New HIV infections (per 1,000 uninfected population)
3 incomeg Gap in self-reported health status by income (percentage points)
3 lifee Life expectancy at birth (years)
3 matmort Maternal mortality rate (per 100,000 live births)
3 ncds Age-standardized death rate due to cardiovascular disease, cancer, diabetes, or chronic respiratory disease in adults aged 30–70 years (%)
3 smoke Daily smokers (% of population aged 15 and over)
3 swb Subjective well-being (average ladder score, worst 0-10 best)
3 tb Incidence of tuberculosis (per 100,000 population)
3 traffic Traffic deaths (per 100,000 population)
3 u5mort Mortality rate, under-5 (per 1,000 live births)
3 uhc Universal health coverage (UHC) index of service coverage (worst 0-100 best)
3 vac Percentage of surviving infants who received 2 WHO-recommended vaccines (%)
4 earlyedu Participation rate in pre-primary organized learning (% of children aged 4 to 6)
4 pisa PISA score (worst 0-600 best)
4 primary Net primary enrollment rate (%)
4 second Lower secondary completion rate (%)
4 socioec Variation in science performance explained by socio-economic status (%)
4 tertiary Tertiary educational attainment (% of population aged 25 to 34)
5 edat Ratio of female-to-male mean years of education received (%)
5 familypl Demand for family planning satisfied by modern methods (% of females aged 15 to 49 who are married or in unions)
5 lfpr Ratio of female-to-male labor force participation rate (%)
5 parl Seats held by women in national parliament (%)
5 paygap Gender wage gap (% of male median wage)
6 safesan Population using safely managed sanitation services (%)
6 safewat Population using safely managed water services (%)
6 sanita Population using at least basic sanitation services (%)
6 scarcew Scarce water consumption embodied in imports (m3 /capita)
6 water Population using at least basic drinking water services (%)
7 cleanfuel Population with access to clean fuels and technology for cooking (%)
7 co2twh CO2 emissions from fuel combustion for electricity and heating per total electricity output (MtCO2 /TWh)
7 elecac Population with access to electricity (%)
7 ren Share of renewable energy in total primary energy supply (%)
8 accounts Adults with an account at a bank or other financial institution or with a mobile-money-service provider (% of population aged 15 or over)
8 empop Employment-to-population ratio (%)
8 impacc Fatal work-related accidents embodied in imports (per 100,000 population)
8 unemp Unemployment rate (% of total labor force)
8 yneet Youth not in employment, education or training (NEET) (% of population aged 15 to 29)
9 articles Scientific and technical journal articles (per 1,000 population)
9 intuse Population using the internet (%)
9 lpi Logistics Performance Index: Quality of trade and transport-related infrastructure (worst 1-5 best)
9 mobuse Mobile broadband subscriptions (per 100 population)
9 netacc Gap in internet access by income (percentage points)
9 patents Triadic patent families filed (per million population)
9 rdex Expenditure on research and development (% of GDP)
9 rdres Researchers (per 1,000 employed population)
10 elder Elderly poverty rate (% of population aged 66 or over)
10 palma Palma ratio
11 pipedwat Access to improved water source, piped (% of urban population)
11 pm25 Annual mean concentration of particulate matter of less than 2.5 microns in diameter (PM2.5) (µg/m3 )
11 rentover Population with rent overburden (%)
11 transport Satisfaction with public transport (%)
13 co2import CO2 emissions embodied in imports (tCO2 /capita)
13 co2pc Energy-related CO2 emissions (tCO2 /capita)
14 cleanwat Ocean Health Index: Clean Waters score (worst 0-100 best)
14 cpma Mean area that is protected in marine sites important to biodiversity (%)
14 fishstocks Fish caught from overexploited or collapsed stocks (% of total catch)
14 trawl Fish caught by trawling (%)
15 cpfa Mean area that is protected in freshwater sites important to biodiversity (%)
15 cpta Mean area that is protected in terrestrial sites important to biodiversity (%)
15 redlist Red List Index of species survival (worst 0-1 best)
16 detain Unsentenced detainees (% of prison population)
16 homicides Homicides (per 100,000 population)
16 prison Persons held in prison (per 100,000 population)
16 rsf Press Freedom Index (best 0-100 worst)
16 safe Percentage of population who feel safe walking alone at night in the city or area where they live (%)
17 govex Government spending on health and education (% of GDP)
17 govrev Other countries: Government revenue excluding grants (% of GDP)
B.2 Public governance indicators
As part of the behavioral component of the policymaking agents described in Appendix A.1, we
consider two mechanisms that affect the incentives of the agents in determining their contributions.
These are the quality of monitoring (parameter ϕ from equation 9) and the quality of the rule
of law (parameter τ from equation 7). Since both are institutional variables, we consider them
exogenous and directly impute their values through empirical data.1 The intuition behind these
parameters is to provide a comparative metric of the different qualities of the public procurement
mechanisms across countries. Therefore, rather than being actual estimates, they are indicators
reflecting relative qualities. We use the Worldwide Governance Indicators database, which can
be obtained here: info.worldbank.org/governance/wgi. In particular, we obtain the indicators
of control of corruption, reflecting the quality of the monitoring efforts by the central authority,
and the one of rule of law, capturing the quality of institutions designed to reassure a law-abiding
society. These data are normalized using their theoretical minimums and maximums (provided in
the source dataset), so all their values lie in the interval (0, 1). Then, for the countries in the
SDR sample, we compute the inter-temporal values of these two indicators for the period 2000-
2020. As we later show in Appendix G, this institutional information, together with the behavioral
component, facilitates the external validation of the model.
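The normalization step mentioned above amounts to a min-max rescaling. The sketch below is illustrative only: the function name is hypothetical, and the bounds of -2.5 and 2.5 used in the example are an assumption about the scale of the governance scores, not values reported in this appendix.

```python
def min_max_normalize(value, theoretical_min, theoretical_max):
    """Rescale a score to the unit interval using its theoretical bounds."""
    return (value - theoretical_min) / (theoretical_max - theoretical_min)

# Illustrative call with assumed bounds; a score at the midpoint maps to 0.5.
phi_example = min_max_normalize(0.0, -2.5, 2.5)
```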
B.3 Instrumental indicators
In the model, policy priorities (budgetary allocations) are defined over those indicators that are
considered to be directly impacted through specific government programs; that is why we call them
instrumental indicators. In this study, we identify a subset of indicators that, from our experience,
are likely to be instrumental. The rest of the indicators are defined as collateral because we
find them too aggregate for any government to claim any capability of direct manipulation. Of
course, some indicators could be instrumental in some countries but not in others. This, however,
requires extensive contextual knowledge, something difficult to obtain when studying 140 nations.
Therefore, the 55 indicators identified in Table B.3 are assumed to be instrumental in all countries
from our sample.
1 Castañeda et al. (2018) and Guerrero and Castañeda (2021) show that these governance variables can also be
considered endogenous if they are relevant indicators.
Table B.3: Instrumental indicators
SDG Indicator code SDG Indicator code SDG Indicator code SDG Indicator code
1 320pov 1 oecdpov 2 snmi 2 undernsh

2 wasting 3 births 3 incomeg 3 matmort
3 tb 3 u5mort 3 uhc 3 vac
4 earlyedu 4 pisa 4 primary 4 second
4 socioec 4 tertiary 5 edat 5 familypl
6 safesan 6 safewat 6 sanita 6 water
7 cleanfuel 7 co2twh 7 elecac 7 ren
8 accounts 8 yneet 9 intuse 9 lpi
9 mobuse 9 netacc 9 rdex 10 elder
11 pipedwat 11 pm25 11 rentover 11 transport
13 co2import 13 co2pc 14 cleanwat 14 cpma
14 fishstocks 14 trawl 15 cpfa 15 cpta
15 redlist 16 detain 16 homicides 16 prison
16 safe 17 govex 17 govrev
Sources: Authors’ manual classification.
C Confidence
C.1 Confidence intervals
We are interested in building confidence intervals for the SDG gaps. Recall these are the average
gaps computed over multiple Monte Carlo simulations. To construct their confidence
intervals, the brute-force approach is to perform X sets of M Monte Carlo simulations. Given
that we use M = 10000 for our estimates, confidence intervals based on X = 10000, for
example, would require X × M = 10^8 (a hundred million) simulations for each country. A more efficient strategy
is to construct bootstrap confidence intervals from the M Monte Carlo simulations of the original
estimate. Figure C.1 shows an example of the distributions of SDG gaps for illustrative indicators
from Mexico obtained through both methods. As predicted by the bootstrap theory (Efron, 1981),
the bootstrap intervals closely resemble the original intervals for a large enough M and sufficient
resampling.
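The bootstrap strategy can be illustrated with a short percentile-bootstrap sketch. It assumes that the M simulated gaps for one indicator are available as a one-dimensional array and that the estimate of interest is their mean. The function name and the synthetic data in the example are hypothetical, and the percentile method is one common choice rather than necessarily the variant used by the authors.

```python
import numpy as np

def bootstrap_ci(gaps, n_resamples=10_000, level=0.95, rng=None):
    """Percentile bootstrap interval for the mean of the simulated gaps."""
    rng = np.random.default_rng() if rng is None else rng
    gaps = np.asarray(gaps, dtype=float)
    means = np.empty(n_resamples)
    for b in range(n_resamples):
        # Resample the M simulated gaps with replacement and store the mean.
        sample = rng.choice(gaps, size=gaps.size, replace=True)
        means[b] = sample.mean()
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(means, [tail, 100.0 - tail])
    return lower, upper

# Synthetic stand-in for M simulated gaps (illustrative values only).
rng = np.random.default_rng(3)
simulated_gaps = rng.normal(loc=53.4, scale=0.5, size=1000)
lower, upper = bootstrap_ci(simulated_gaps, n_resamples=500, rng=rng)
```

The cost is one set of M simulations plus cheap resampling, instead of the X × M simulations required by the brute-force approach.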
Figure C.1: Brute-force and bootstrap confidence intervals for SDG gaps
[Figure: three density plots, each comparing the bootstrap and brute-force distributions. (a) Nitrogen management: SDG gap for sdg2_snmi (%). (b) Researchers: SDG gap for sdg9_rdres (%). (c) Fish stocks: SDG gap for sdg14_fishstocks (%). Vertical axes show density.]
Notes: Each gap was estimated using 1000 Monte Carlo simulations. The brute-force distributions use 10000 gap
estimates. The bootstrap distributions use 10000 re-samples.
From Figure C.1, we can see that the confidence intervals of the SDG gaps are narrow. This is
a systematic feature across all the estimated SDG gaps. The widest intervals have an amplitude
of nearly 0.5% between the percentiles 2.5 and 97.5. While the data files with exact estimates
and percentiles can be found in the repository http://github.com.oguerrer/sdg_feasibility,
here we provide a qualitative view of the amplitude of the intervals for the SDG gaps of each
country and indicator. In Figure C.2, we depict marker sizes proportionally to the amplitude of the
corresponding 95% confidence interval. This visualization provides information about the relative
uncertainties across countries and indicators with respect to their SDG gaps.
Figure C.2: Amplitude of 95% bootstrap confidence intervals of SDG gaps
[Figure: grid of markers with countries (three-letter codes) on one axis and indicators (codes such as sdg1_320pov and sdg17_govrev) on the other.]
Notes: The size of the markers is proportional to the amplitude of the 95% bootstrap confidence interval. The
largest marker corresponds to an amplitude of approximately 0.5%. The gray lines indicate the absence of an indicator
in a particular country.
C.2 Uncertainty from data quality
The distributions of SDG gaps reported above are the result of the stochastic elements of the model
(such as the growth process of the indicators), of path dependency in the learning component, and
of the random initial conditions of the endogenous variables. A natural question for any empirical
work that relies on indicators is how to assess our confidence in the inferences when the data are
subject to errors. Here, we show how to incorporate this additional source of uncertainty when
estimating the SDG gaps.
In the agent-based modeling literature, one can find different strategies to tackle this problem
because each model may have a very particular way of using the data. Furthermore, models that
can be approximated through stationary stochastic processes may enjoy the benefit of existing
asymptotic results from the statistical literature. In the case of the model presented in this paper,
propagating the uncertainty of the data into the parameters involves heavy computational work.
In this appendix, we present a viable strategy, and provide some results for an illustrative country.
First, let us assume that we know the standard error ei,t of each empirical indicator at each point
in time. With this information, we can generate an alternative dataset in which each data point
has been perturbed according to its standard error, so its value is a randomly chosen point in the
interval [Ii,t − ei,t , Ii,t + ei,t ]. Once this alternative dataset has been built, we compute the fraction
of positive first differences Γ across all indicators, which will be used to calibrate parameter β.
This alternative dataset also provides updated values for the indicators’ initial and final conditions,
necessary to calibrate α1 , . . . , αN . Thus, with this information, we calibrate the model following
the procedure described in section 2.3 of the main text. Finally, we store the resulting parameters
and repeat the entire process in order to obtain a sample of parameter configurations. In order
to compute the confidence intervals that account for the indicators’ errors, we need to perform
independent estimations of the SDG gaps for each parameter configuration.2
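The data-perturbation step of this procedure can be sketched as below. The sketch covers only the construction of an alternative dataset and the computation of Γ; the calibration itself (section 2.3 of the main text) is not reproduced. The function names are hypothetical, the data layout (indicators in rows, periods in columns) is an assumption, and drawing the perturbed point uniformly within the interval is one possible reading of "randomly chosen".

```python
import numpy as np

def perturb_dataset(I, e, rng):
    """Draw each data point from [I - e, I + e] (uniform draw is an assumption)."""
    return rng.uniform(I - e, I + e)

def positive_change_fraction(I):
    """Gamma: fraction of positive first differences across all indicators."""
    return float((np.diff(I, axis=1) > 0).mean())

# Illustrative values only: two indicators observed over three periods.
rng = np.random.default_rng(4)
I = np.array([[0.10, 0.20, 0.15],
              [0.50, 0.50, 0.60]])
e = np.full(I.shape, 0.01)            # assumed standard errors
I_alt = perturb_dataset(I, e, rng)
Gamma_alt = positive_change_fraction(I_alt)
```

Repeating these two steps, followed by a calibration run on each alternative dataset, would yield the sample of parameter configurations described above.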
The procedure described above assumes knowledge about the errors of the indicators. Unfortu-
nately, the SDR does not provide this information; and most of the original sources do not report
them either. For this reason, we do not report these intervals in the main text. However, here
we present an example of the procedure for the case of Mexico, using the inter-temporal standard
deviation of each indicator in order to obtain a proxy error ρi /(number of years). First, in Figure
C.3, we present histograms approximating the distributions of the model’s parameters. Then, in
Table C.1, we show the confidence intervals obtained for the different SDG gaps, and compare them
with the ones estimated when no measurement error is assumed.
In Figure C.4, we compare the distributions from Figure C.1 with the ones obtained when
accounting for the indicators’ errors. As expected, the errors from the indicators introduce more
variability in the distributions of the SDG gaps. However, the range of the new distributions is still
modest, usually not exceeding 1%.
2 The estimations have to be done strictly with the stored parameters and not by randomizing them using their
distributions. The reason is that the calibration procedure does not treat the parameters independently of
each other; each configuration is the result of a joint estimation, so any ex post randomization should consider their
interdependencies.
Figure C.3: Parameter distributions obtained from randomized indicators
[Figure: two histograms of the calibrated parameters. Left: frequency of the structural factor. Right: frequency of the normalizing constant.]
Figure C.4: Confidence intervals of SDG gaps with and without data errors
[Figure: three density plots, each comparing the distributions without data errors and with data errors. (a) Nitrogen management: SDG gap for sdg2_snmi (%). (b) Researchers: SDG gap for sdg9_rdres (%). (c) Fish stocks: SDG gap for sdg14_fishstocks (%). Vertical axes show density.]
Table C.1: 95% confidence intervals for Mexico
Indicator Gap CI CI+se Indicator Gap CI CI+se Indicator Gap CI CI+se
oecdpov 5.32 ±0.14 ±0.29 crlyld 35.86 ±0.32 ±0.50 obesity 28.81 ±0.00 ±0.00
snmi 53.38 ±0.05 ±0.20 trophic 15.02 ±0.00 ±0.01 undernsh 3.75 ±0.01 ±0.02
wasting 1.02 ±0.02 ±0.03 births 0.00 ±0.00 ±0.00 fertility 4.44 ±0.04 ±0.07
hiv 0.06 ±0.00 ±0.00 lifee 6.71 ±0.04 ±0.07 matmort 0.22 ±0.00 ±0.00
ncds 6.29 ±0.03 ±0.05 smoke 0.00 ±0.00 ±0.00 swb 8.85 ±0.05 ±0.21
tb 0.22 ±0.00 ±0.00 traffic 11.02 ±0.00 ±0.00 u5mort 0.45 ±0.01 ±0.02
uhc 13.48 ±0.16 ±0.49 vac 13.98 ±0.03 ±0.24 earlyedu 0.00 ±0.00 ±0.00
pisa 21.01 ±0.00 ±0.00 primary 0.00 ±0.00 ±0.00 second 0.00 ±0.00 ±0.00
socioec 4.63 ±0.00 ±0.00 tertiary 49.76 ±0.11 ±0.43 edat 2.00 ±0.06 ±0.12
familypl 18.02 ±0.04 ±0.09 lfpr 39.63 ±0.12 ±0.25 parl 0.00 ±0.00 ±0.00
paygap 12.12 ±0.08 ±0.24 safesan 13.78 ±0.63 ±1.89 safewat 52.73 ±0.06 ±0.22
scarcew 0.71 ±0.01 ±0.01 cleanfuel 11.17 ±0.05 ±0.13 co2twh 1.45 ±0.01 ±0.01
elecac 0.00 ±0.00 ±0.00 ren 84.23 ±0.00 ±0.00 accounts 43.09 ±0.09 ±0.66
impacc 0.14 ±0.00 ±0.00 yneet 12.40 ±0.06 ±0.12 articles 85.33 ±0.13 ±0.20
intuse 0.37 ±0.24 ±0.20 lpi 31.03 ±0.00 ±0.00 mobuse 1.53 ±0.33 ±0.44
netacc 1.37 ±0.32 ±0.75 patents 99.74 ±0.00 ±0.00 rdex 87.04 ±0.04 ±0.08
rdres 94.13 ±0.01 ±0.06 elder 15.86 ±0.13 ±0.26 palma 1.56 ±0.00 ±0.00
pipedwat 0.00 ±0.01 ±0.01 pm25 16.08 ±0.04 ±0.13 rentover 2.24 ±0.00 ±0.01
transport 32.30 ±0.12 ±0.41 co2import 0.48 ±0.00 ±0.00 co2pc 4.14 ±0.00 ±0.01
cleanwat 35.37 ±0.00 ±1.30 cpma 1.93 ±0.40 ±0.44 fishstocks 28.57 ±0.03 ±0.29
trawl 12.68 ±0.02 ±0.13 cpfa 75.90 ±0.05 ±0.32 cpta 55.16 ±0.11 ±0.54
redlist 32.77 ±0.00 ±0.00 detain 13.89 ±0.02 ±0.09 homicides 2.50 ±0.00 ±0.02
prison 10.71 ±0.10 ±0.23 rsf 45.57 ±0.02 ±0.11 safe 56.55 ±0.00 ±0.00
govex 37.48 ±0.18 ±0.34 govrev 35.51 ±0.25 ±0.63
Notes: All quantities are expressed in percentages.

Column ‘Gap’ denotes the estimated SDG gap to be expected in 2030.
‘CI’ corresponds to the bootstrap confidence intervals.
‘CI+se’ indicates confidence intervals that account for data errors.
Recall that the SDR does not report data errors, so ‘CI+se’ is illustrative.
D Network
D.1 Estimation
The network of interlinkages consists of a directed acyclic graph estimated through Bayesian
methods from the package sparsebn (Aragam et al., 2019), which can be accessed here:
https://github.com/itsrainingdata/sparsebn. The links do not represent causal relationships, but
conditional dependencies. This is discussed in detail by Guerrero and Castañeda (2020a) and
Ospina-Forero et al. (2020). In order to remove the influence of temporal trends, we transform the series
into their first differences.
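The first-difference transformation can be illustrated with NumPy. This is a minimal sketch with a hypothetical function name and synthetic data; it assumes the indicators are stored with one row per indicator and one column per period, and it does not call the sparsebn package itself.

```python
import numpy as np

def first_differences(series):
    """Replace each time series by its period-to-period changes."""
    return np.diff(np.asarray(series, dtype=float), axis=1)

# Synthetic example: a purely linear trend becomes a constant series of changes.
trend = np.array([[0.0, 0.1, 0.2, 0.3]])
changes = first_differences(trend)
```

Differencing shortens each series by one observation, which is worth keeping in mind when the time series are short.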
A virtue of Bayesian methods over alternative network estimation approaches is their ability
to specify a ‘white list’ of edges that can be considered true positives. In other words, with prior
knowledge, one can determine a set of links that would be expected from the estimation. We
identify 109 synergies (links with positive weights) that should be expected in any network of any
country. Of course, this could be refined for each individual country, should more specific contextual
information become available. These synergies are reported in Table D.1.
Table D.1: White list of synergies
Origin Destination Origin Destination Origin Destination
wpc (SDG 1) undernsh (SDG 2) wpc (SDG 1) u5mort (SDG 3) wpc (SDG 1) fertility (SDG 3)
wpc (SDG 1) vac (SDG 3) wpc (SDG 1) primary (SDG 4) wpc (SDG 1) earlyedu (SDG 4)
wpc (SDG 1) accounts (SDG 8) wpc (SDG 1) netacc (SDG 9) wpc (SDG 1) elder (SDG 10)
320pov (SDG 1) undernsh (SDG 2) 320pov (SDG 1) u5mort (SDG 3) 320pov (SDG 1) fertility (SDG 3)
320pov (SDG 1) vac (SDG 3) 320pov (SDG 1) primary (SDG 4) 320pov (SDG 1) earlyedu (SDG 4)
320pov (SDG 1) accounts (SDG 8) 320pov (SDG 1) netacc (SDG 9) 320pov (SDG 1) elder (SDG 10)
oecdpov (SDG 1) undernsh (SDG 2) oecdpov (SDG 1) u5mort (SDG 3) oecdpov (SDG 1) fertility (SDG 3)
oecdpov (SDG 1) vac (SDG 3) oecdpov (SDG 1) primary (SDG 4) oecdpov (SDG 1) earlyedu (SDG 4)
oecdpov (SDG 1) accounts (SDG 8) oecdpov (SDG 1) netacc (SDG 9) oecdpov (SDG 1) elder (SDG 10)
undernsh (SDG 2) u5mort (SDG 3) undernsh (SDG 2) lifee (SDG 3) undernsh (SDG 2) swb (SDG 3)
undernsh (SDG 2) pisa (SDG 4) wasting (SDG 2) ncds (SDG 3) wasting (SDG 2) lifee (SDG 3)
obesity (SDG 2) ncds (SDG 3) obesity (SDG 2) lifee (SDG 3) trophic (SDG 2) obesity (SDG 2)
crlyld (SDG 2) undernsh (SDG 2) snmi (SDG 2) crlyld (SDG 2) matmort (SDG 3) oecdpov (SDG 1)
matmort (SDG 3) lifee (SDG 3) matmort (SDG 3) swb (SDG 3) neonat (SDG 3) lifee (SDG 3)
u5mort (SDG 3) lifee (SDG 3) tb (SDG 3) u5mort (SDG 3) ncds (SDG 3) swb (SDG 3)
fertility (SDG 3) second (SDG 4) births (SDG 3) matmort (SDG 3) births (SDG 3) u5mort (SDG 3)
vac (SDG 3) u5mort (SDG 3) uhc (SDG 3) oecdpov (SDG 1) uhc (SDG 3) u5mort (SDG 3)
uhc (SDG 3) tb (SDG 3) uhc (SDG 3) ncds (SDG 3) uhc (SDG 3) vac (SDG 3)
uhc (SDG 3) swb (SDG 3) incomeg (SDG 3) oecdpov (SDG 1) smoke (SDG 3) ncds (SDG 3)
smoke (SDG 3) lifee (SDG 3) primary (SDG 4) swb (SDG 3) second (SDG 4) edat (SDG 5)
second (SDG 4) yneet (SDG 8) pisa (SDG 4) empop (SDG 8) socioec (SDG 4) pisa (SDG 4)
science (SDG 4) pisa (SDG 4) resil (SDG 4) pisa (SDG 4) familypl (SDG 5) fertility (SDG 3)
edat (SDG 5) fertility (SDG 3) edat (SDG 5) lfpr (SDG 5) edat (SDG 5) paygap (SDG 5)
lfpr (SDG 5) parl (SDG 5) lfpr (SDG 5) paygap (SDG 5) water (SDG 6) u5mort (SDG 3)
water (SDG 6) swb (SDG 3) sanita (SDG 6) u5mort (SDG 3) sanita (SDG 6) swb (SDG 3)
elecac (SDG 7) empop (SDG 8) cleanfuel (SDG 7) co2pc (SDG 13) co2twh (SDG 7) co2pc (SDG 13)
ren (SDG 7) cleanfuel (SDG 7) unemp (SDG 8) intuse (SDG 9) empop (SDG 8) intuse (SDG 9)
empop (SDG 8) mobuse (SDG 9) empop (SDG 8) govrev (SDG 17) yneet (SDG 8) empop (SDG 8)
intuse (SDG 9) accounts (SDG 8) lpi (SDG 9) empop (SDG 8) rdex (SDG 9) rdres (SDG 9)
rdres (SDG 9) articles (SDG 9) rdres (SDG 9) patents (SDG 9) netacc (SDG 9) empop (SDG 8)
adjgini (SDG 10) rentover (SDG 11) palma (SDG 10) rentover (SDG 11) pipedwat (SDG 11) water (SDG 6)
transport (SDG 11) swb (SDG 3) rentover (SDG 11) swb (SDG 3) co2pc (SDG 13) ncds (SDG 3)
co2import (SDG 13) ncds (SDG 3) cpma (SDG 14) fishstocks (SDG 14) cpma (SDG 14) trawl (SDG 14)
cpta (SDG 15) redlist (SDG 15) cpfa (SDG 15) fishstocks (SDG 14) safe (SDG 16) swb (SDG 3)
cpi (SDG 16) homicides (SDG 16) cpi (SDG 16) safe (SDG 16) prison (SDG 16) detain (SDG 16)
govex (SDG 17) uhc (SDG 3) govex (SDG 17) tertiary (SDG 4) govex (SDG 17) rdex (SDG 9)
govrev (SDG 17) govex (SDG 17)
Sources: Authors’ manual identification.
We also identify negative links or trade-offs that should be expected. Our trade-offs white list
is substantially smaller because establishing negative structural relations requires highly contextual
information, unless it is self-evident, as in the case of industrial growth versus the environment.
We report them in Table D.2.
Table D.2: White list of trade-offs
Origin Destination
elecac (SDG 7) co2twh (SDG 7)
elecac (SDG 7) co2pc (SDG 13)
empop (SDG 8) pm25 (SDG 11)
empop (SDG 8) co2pc (SDG 13)
Sources: Authors’ manual identification.
The specification of the white list does not force the estimation to yield a specific sign. Instead,
sparsebn takes the white lists and forces the algorithm to maintain those links in the estimated
network. It may be the case that some of these links come out with the opposite sign from the
expected one. We consider these to be false positives so we remove these links from the network in
an edge-correction procedure.
D.2 Edge correction
Besides eliminating links with an incorrect sign, we also remove negative edges between indicators
that belong to the same SDG. The intuition here is that trade-offs are likely to occur across topics
in different SDGs, not in the same one. While there is still the possibility of intra-SDG trade-offs,
we would rather sacrifice them and accept this type of error (losing some true positives) than permit
a large number of false positives.
Finally, it is still possible that certain links have weights of excessively large magnitude,
i.e., outliers. We consider these to be false positives, as such magnitudes are likely to be an artifact
of exogenously produced high variance in the data. To eliminate these links, we establish weight
thresholds at the 5th and 95th percentiles of the weights of all the networks pooled together. If the
weight of a particular link lies below the lower threshold or above the upper one, it is eliminated
from its corresponding network.
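The three correction steps can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `correct_edges`, the dense weight-matrix representation, and the choice to compute the pooled percentiles after the sign and intra-SDG filters are all assumptions made for illustration.

```python
import numpy as np


def correct_edges(networks, expected_sign, sdg_of):
    """Illustrative edge correction (hypothetical helper, not the authors' code).

    networks:      dict country -> (n x n) array; entry [i, j] is the weight of
                   the link i -> j, and 0 means "no link".
    expected_sign: dict (i, j) -> +1 or -1 for the white-listed links.
    sdg_of:        sequence with the SDG number of each indicator.
    """
    sdg = np.asarray(sdg_of)
    same_sdg = sdg[:, None] == sdg[None, :]

    cleaned = {}
    for country, weights in networks.items():
        a = np.array(weights, dtype=float)  # work on a copy
        # Step 1: drop white-listed links estimated with the opposite sign.
        for (i, j), sign in expected_sign.items():
            if a[i, j] * sign < 0:
                a[i, j] = 0.0
        # Step 2: drop negative links between indicators of the same SDG.
        a[(a < 0) & same_sdg] = 0.0
        cleaned[country] = a

    # Step 3: drop outliers using percentiles of the pooled weights
    # (assumption: pooling is done after steps 1 and 2).
    pooled = np.concatenate([a[a != 0] for a in cleaned.values()])
    if pooled.size == 0:
        return cleaned
    low, high = np.percentile(pooled, [5, 95])
    for a in cleaned.values():
        a[(a < low) | (a > high)] = 0.0
    return cleaned


# Toy network with four indicators; the first two belong to the same SDG.
sdg_of = [1, 1, 2, 3]
x = np.zeros((4, 4))
x[0, 1] = -0.40  # negative link inside one SDG
x[0, 2] = 0.50
x[1, 2] = 0.45
x[1, 3] = 0.40
x[2, 0] = 0.55
x[2, 3] = -0.30  # white-listed as a synergy, estimated with the wrong sign
x[3, 0] = 9.00   # outlier
x[3, 1] = 0.50
x[3, 2] = 0.42
out = correct_edges({"X": x}, {(2, 3): +1}, sdg_of)["X"]
```

In this toy case the wrong-signed link, the intra-SDG trade-off, and the outlier are removed, while a mid-range link such as the one from indicator 0 to indicator 2 is kept.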
D.3 Imputation of missing observations
Like most statistical methods, sparsebn requires a balanced panel since it cannot produce estimates
with missing observations. Thus, we resort to a novel data imputation method created by de Wolff
et al. (2021). Traditional imputation methods consider linear interpolations and extrapolations, or some type
of clustering criterion across data from other indicators. The issue with these approaches is that
indicators often display non-linear dynamics, so traditional approaches fail to accurately account
for non-linear shifts and empirical variance (and parametric approaches like splines may be too
rigid). Today, Gaussian processes are considered the most reliable approach for data imputation
because the imputation does not try to fit a particular function to the data, but to find a function of
moments for point-specific distributions. This non-parametric approach can accommodate a wide
variety of non-linear empirical behaviors, while providing uncertainty estimates for each imputation.
The method developed by de Wolff et al. (2021) goes one step further and embeds Gaussian
processes within a multi-input-multi-output framework that uses neural networks. This means
that the imputation of the missing observations of indicator i in country k can be improved by
providing additional data on the same indicator i from countries similar to k. We exploit this
virtue and construct groups with 3 reference countries whose data can be used to impute the missing
observations of a given country k. While this strategy may seem similar to the econometric practice
of pooling cross-national data, the size of our groups is considerably smaller (only 4 countries: the
country of interest k plus the 3 reference ones). In addition, this imputation procedure is mainly used
for the estimation of the network, and to assign final values (from 2020) to those countries that
lack them.
The reference groups are unique to each country because we define them through a multidimensional
criterion. Under this criterion, we construct an index to rank the most similar countries to
k, and pick the top 3. For a given country k, the similarity index to another country h takes into
account:
• whether both countries share a common border ($\text{border}_{k,h}$);
• whether they belong to the same country group ($\text{group}_{k,h}$);
• their distance, weighted by population centers ($\text{distance}_{k,h}$);
• the total imports of k from h ($\text{imports}_{k,h}$); and
• the total exports from k to h ($\text{exports}_{k,h}$).
To compute the similarity index, we employ trade and geographical data from the Centre
d’Etudes Prospectives et d’Informations Internationales (CEPII). Trade data on imports and ex-
ports between every country is provided by the CEPII BACI Database covering 2002 to 2018
(Gaulier and Zignago, 2010). The information on geographical proximity weighted by urban pop-
ulation centers is obtained from the CEPII GeoDist Database (de Sousa et al., 2012).
The variable $\text{border}_{k,h}$ is binary and takes the value 1 if there is a shared border, and 0 otherwise.
Component $\text{group}_{k,h}$ is also binary and becomes 1 if both countries belong to the same group (i.e.,
geographical cluster) and 0 otherwise. The term $\text{distance}_{k,h}$ is the geographical distance between
k and h, divided by the largest distance between k and any other country, and subtracted from
1. The value of $\text{imports}_{k,h}$ consists of the total number of imports received by country k from h,
divided by the maximum number of imports received by k from any country. Similarly, $\text{exports}_{k,h}$
consists of the total number of exports sent by country k to h, divided by the maximum number
of exports sent by k to any country. Finally, the similarity index is expressed as
\[
\text{similarity}_{k,h} = \text{border}_{k,h} + \text{group}_{k,h} + \text{distance}_{k,h} + \text{imports}_{k,h} + \text{exports}_{k,h}. \tag{18}
\]
We compute the similarity index for every pair of countries. Then, for a given country k, we
rank all other countries according to the index. We select the top 3 most similar countries to k,
and create a mini-pooled dataset that includes these nations and k (i.e., a group composed of 4
countries in total). Once the pooled dataset of a single indicator has been built, we perform
the imputation procedure for country k only. Finally, before proceeding to estimate the networks,
we make sure that the imputed extrapolations are bound to the indicators’ theoretical limits. For
this, we develop a variance correction procedure that we explain next.
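As an illustration of equation 18 and the ranking step, the following sketch builds the index from five pairwise matrices and picks the most similar countries. The function names, the matrix layout (row k, column h), and the toy numbers are hypothetical; the actual inputs would come from the CEPII databases cited above.

```python
import numpy as np


def similarity_matrix(border, group, distance, imports, exports):
    """Similarity index of equation 18 for every ordered pair (k, h).

    All inputs are (n x n) arrays indexed as [k, h]; diagonals are ignored.
    """
    n = border.shape[0]
    off_diagonal = ~np.eye(n, dtype=bool)

    def scale_by_row_max(x):
        # Divide each row k by its largest off-diagonal entry.
        x = np.where(off_diagonal, x, 0.0)
        row_max = x.max(axis=1, keepdims=True)
        return np.divide(x, row_max, out=np.zeros_like(x, dtype=float),
                         where=row_max > 0)

    distance_term = 1.0 - scale_by_row_max(distance)
    sim = (border + group + distance_term
           + scale_by_row_max(imports) + scale_by_row_max(exports))
    np.fill_diagonal(sim, -np.inf)  # a country is never its own reference
    return sim


def reference_group(sim, k, size=3):
    """Indices of the `size` countries most similar to country k."""
    order = np.argsort(-sim[k], kind="stable")
    return [int(h) for h in order if h != k][:size]


# Toy data for four hypothetical countries (values are made up).
border = np.zeros((4, 4))
border[0, 1] = border[1, 0] = 1.0
group = np.zeros((4, 4))
for a in (0, 1, 2):
    for b in (0, 1, 2):
        if a != b:
            group[a, b] = 1.0
distance = np.array([[0, 100, 500, 1000],
                     [100, 0, 400, 900],
                     [500, 400, 0, 600],
                     [1000, 900, 600, 0]], dtype=float)
imports = np.array([[0, 10, 5, 0],
                    [8, 0, 2, 1],
                    [4, 3, 0, 2],
                    [1, 1, 1, 0]], dtype=float)
exports = imports.T.copy()  # exports from k to h are imports of h from k

sim = similarity_matrix(border, group, distance, imports, exports)
refs = reference_group(sim, k=0)
```

Because the distance, import, and export terms are scaled by the maxima of country k, the index is not symmetric in general: the similarity of h to k need not equal that of k to h.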
D.4 Variance correction
To correct imputed extrapolations that lie beyond the theoretical limits of an indicator, we perform
a variance compression procedure that preserves the periodicity of the extrapolations, but re-
normalizes the imputed data points in order to bound them to the limits established in the SDR
dataset. We apply this procedure also to those extrapolations that, even if they remain within the
theoretical bounds, have a variance that exceeds the empirical one. By correcting the variance of
the extrapolations, we produce data imputations with a volatility that is closer to the empirical
one.
To explain the variance compression procedure, let us consider forward extrapolations. Given
a time series $I_{2000,t}$ covering the years $\{2000, \dots, t\}$ and an extrapolation $E_{t+1,2020}$ covering
$\{t+1, \dots, 2020\}$, we want to compress the extrapolation such that $\mathrm{var}(E_{t+1,2020}) \le \mathrm{var}(I_{2000,t})$. We
perform this compression in a procedural fashion by iteratively re-normalizing $E_{t+1,2020}$ by a factor
$z \lesssim 1$. The compression procedure for forward extrapolations is described in Algorithm 3.
Algorithm 3: Variance compression pseudocode
Input: $I_{2000,t}$ and $E_{t+1,2020}$
1 while $\mathrm{var}(E_{t+1,2020}) > \mathrm{var}(I_{2000,t})$ or any value in $E_{t+1,2020}$ lies beyond a theoretical limit do
2     $E_{t+1,2020} = I_{2000,t}(t) + z\,[E_{t+1,2020} - I_{2000,t}(t)]$;
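A runnable sketch of Algorithm 3 may help to see how the loop behaves. The function name `compress_forward`, the default factor `z = 0.95`, the iteration cap, and the toy series are assumptions for illustration; the source does not report the value of z that was used.

```python
import numpy as np


def compress_forward(observed, extrapolated, lower, upper, z=0.95,
                     max_iter=10_000):
    """Shrink a forward extrapolation towards the last observed value.

    The loop stops once the extrapolation is no more volatile than the
    observed series and lies inside [lower, upper]. Assumes the last observed
    value is itself inside the theoretical limits.
    """
    observed = np.asarray(observed, dtype=float)
    e = np.array(extrapolated, dtype=float)  # copy
    anchor = observed[-1]                    # I(t), the last empirical value
    target_var = observed.var()
    for _ in range(max_iter):                # cap guards against flat series
        too_volatile = e.var() > target_var
        out_of_bounds = (e < lower).any() or (e > upper).any()
        if not (too_volatile or out_of_bounds):
            break
        # Re-normalize deviations from the anchor by the factor z.
        e = anchor + z * (e - anchor)
    return e


# Toy indicator bounded in [0, 1] (values are made up).
observed = [0.20, 0.30, 0.25, 0.35, 0.30]
raw = [0.60, 0.10, 1.20, -0.10]
compressed = compress_forward(observed, raw, lower=0.0, upper=1.0)
```

Since every deviation from the anchor is multiplied by the same positive factor, the ordering of the extrapolated points and the direction of their movements are preserved; only their amplitude changes.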
Figure D.1 shows an example of the outcome of this procedure. In this case, the extrapolation
remains within the theoretical boundaries, but its variance was substantially larger than that of
the empirical time series. This variance may have been the result of large changes in
the time series of the reference group. Thus, with the compression algorithm we are able to preserve
the information on relative fluctuations and trend direction provided by the reference group, while
normalizing the imputed data to be consistent with the empirical one in terms of its volatility. The
same logic and algorithm apply to backward extrapolations.
Figure D.1: Example of variance compression
Now that we have corrected the imputations we proceed to estimate the networks. We take
advantage of the reference groups previously constructed in order to pool their data and create
longer first-difference series (with 80 observations in total). This helps sparsebn in producing
sparser graphs, which reduces the rate of potential false positive links. Again, even if a small
amount of data are pooled, the estimated networks are unique because each country has a unique
reference group and indicators.
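The figure of 80 observations is consistent with four countries, each with annual data from 2000 to 2020 (21 observations, hence 20 first differences). The sketch below shows this arithmetic under the assumption that every country in the group has the same indicators in the same column order; the helper name and the random placeholder data are hypothetical.

```python
import numpy as np


def pooled_first_differences(levels_by_country):
    """Stack the first differences of each country's (years x indicators) panel.

    Differences are taken within each country before stacking, so no
    difference is ever computed across two different countries.
    """
    diffs = [np.diff(np.asarray(x, dtype=float), axis=0)
             for x in levels_by_country]
    return np.vstack(diffs)


# Country k plus its 3 reference countries, 21 annual observations each.
rng = np.random.default_rng(0)
group_panels = [rng.random((21, 5)) for _ in range(4)]  # placeholder data
pooled = pooled_first_differences(group_panels)
print(pooled.shape)  # (80, 5)
```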
Finally, let us remind the reader that none of these imputation, pooling, and correction procedures
is necessary for our method to work. We apply them as part of our empirical strategy,
which seeks to minimize the number of false positives in the spillover network, and we
would also apply them if we were using alternative methods to study SDG gaps. Hence, our method
does not depend on this data pre-processing, and it can be adapted to the particular needs of each
empiricist.
E Some calibration nuances
E.1 Development goals
The model assumes a given set of development goals for the government agent. This assumption
works well for prospective estimations such as the ones done for the SDG gaps, where public
documents provide precise values for each goal. However, revealed goals may not exist in the
case of historical data. We have two alternative ways to deal with this data limitation. The first
one is to assume that the final values of the data were the goals that the government wanted to
achieve, which is the strategy followed by Guerrero and Castañeda (2020b), and which is justified
in the context of Mexico’s government public discourse about emulating the OECD countries. The
second one (the one adopted here) is to produce a random allocation profile as initial condition,
disregarding any specific goal vector. In a Monte Carlo setting, this captures the uncertainty
(without particular priors) of the goals that the government could possibly have had. Formally, it
means that, in equation 14, the initial propensities are randomly determined. Overall, the model
is flexible enough to accommodate any type of goals or, more generally, any prioritization heuristic
that the user may want to employ for the government agent. Importantly, our particular modeling
choice has been guided by the principles of parsimony and evidence from prior work by political
scientists.
E.2 Negative trends
By design, the model simulates indicator dynamics with non-negative growth. There are two reasons
for this. First, from the point of view of government expenditure, generating improvements through
public spending is an intuitive way to think about the expenditure-indicator linkage. Second, it
makes the model more parsimonious and easier to calibrate on a multidimensional space (since
fitting dynamics with both positive and negative changes is extremely challenging from a bottom-up
modeling perspective). In our application, we assume that, if an empirical indicator has a
negative trend, it reflects a poor performance of the existing government program. This means
that the contribution of public spending to improving the indicator is poor or close to null.
In order to capture poor performance, we apply a simple data transformation procedure to
those indicators showing negative trends. If, in the empirical data, indicator i presents a final value
lower than its initial value, then we replace the final value by the largest one in the time series. This
transformation implies that a fall in the final observation may not reflect an ineffective government
program, but an exogenous event that moved the indicator away from its usual performance. If,
on the other hand, no value in the time series is higher than the initial observation, then
we establish a final value of $I_{0,i} + 10^{-3}$.
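The transformation can be written as a short function. This is a sketch under one stated assumption: a series whose final value equals its initial value is left unchanged, since the text only describes the case of a final value strictly lower than the initial one. The function name is hypothetical.

```python
import numpy as np


def adjusted_final_value(series, eps=1e-3):
    """Final value used for calibration after the negative-trend correction."""
    s = np.asarray(series, dtype=float)
    initial, final = s[0], s[-1]
    if final < initial:
        peak = s.max()
        if peak > initial:
            # The indicator improved at some point: use its best value.
            return float(peak)
        # No observation exceeds the initial one: minimal improvement.
        return float(initial + eps)
    return float(final)


improved_then_fell = adjusted_final_value([0.5, 0.7, 0.6, 0.4])  # uses the peak
always_falling = adjusted_final_value([0.5, 0.4, 0.3])           # initial + eps
growing = adjusted_final_value([0.5, 0.6, 0.8])                  # unchanged
```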
E.3 Calibration algorithm
The algorithm presented in section 2.3 produces excellent goodness of fit results. Here, we would like
to discuss some details regarding its precision and scalability, especially in relation to an algorithm
previously developed by Guerrero and Castañeda (2020a). The proposed calibration procedure
outperforms the one of Guerrero and Castañeda (2020a) not only in precision, but also in speed.
Furthermore, the calibration is simultaneous for all parameters, as opposed to the ceteris paribus
approach from Guerrero and Castañeda (2020a).
Like in Guerrero and Castañeda (2020a), the precision with which the algorithm can reduce the
error depends on the number M of Monte Carlo simulations run for each evaluation. This is due to
the stability of the average final values of the simulated indicators. Nevertheless, remarkably low
error levels can be achieved with, for example, M = 1000, without the need to resort to parallel
computing. Panel (a) in Figure E.1 shows how the error decreases exponentially during the first
iterations of the algorithm, and then decreases further, but at a slower rate. Clearly, running more
Monte Carlo simulations allows a lower error to be achieved, but at a computational cost. For this
reason, we implement an adaptive algorithm that increases M as the average error falls, so it performs
a larger number of simulations only when needed. Panel (b) shows the same decaying dynamics,
but at the level of each indicator, for M = 1000. Finally, panel (c) depicts the distribution of
indicator-level errors (not in absolute values) across the entire dataset.
Figure E.1: Calibration algorithm performance
(a) Precision & sample size (b) Indicators’ errors (c) Distribution of errors
[Figure: panel (a) plots the average error against the iteration for 10, 100, and 1000 samples; panel (b) plots the indicator error against the iteration; panel (c) plots frequency against error.]
Notes: Panel (a) shows the evolution of the average error for different numbers of Monte Carlo simulations using data
from Mexico. Panel (b) shows the evolution of the indicator-specific errors for M = 1000 using data from Mexico.
Panel (c) presents the distribution of all the errors (not in absolute values) calculated across the entire sample after
calibration.
F Goodness of fit
Table F.1 presents the calculations of goodness of fit for each country. In addition, we provide the
standard deviation, minimum, and maximum values for the indicator-specific GoFα metrics. The
reader may consult the complete results in the data files provided in
github.com/oguerrer/SDG_feasibility.
Table F.1: Model goodness of fit
Country GoFβ GoFαi stdαi minαi maxαi Country GoFβ GoFαi stdαi minαi maxαi
AFG 0.9989 0.9960 0.003 0.986 0.9999 AGO 0.9991 0.9943 0.004 0.984 1.0000
ALB 0.9998 0.9937 0.006 0.970 0.9998 ARE 0.9985 0.9957 0.003 0.986 0.9997
ARG 0.9990 0.9933 0.005 0.981 0.9999 ARM 0.9995 0.9943 0.004 0.982 0.9997
AUS 0.9997 0.9926 0.005 0.975 1.0000 AUT 0.9992 0.9940 0.005 0.977 0.9999
AZE 0.9997 0.9957 0.003 0.982 0.9998 BDI 0.9992 0.9944 0.004 0.983 0.9999
BEL 0.9993 0.9933 0.005 0.982 0.9998 BEN 0.9997 0.9951 0.004 0.984 0.9999
BFA 1.0000 0.9965 0.003 0.984 1.0000 BGD 0.9992 0.9940 0.006 0.976 1.0000
BGR 0.9992 0.9953 0.003 0.981 0.9993 BHR 0.9995 0.9956 0.004 0.985 0.9997
BIH 0.9998 0.9933 0.005 0.981 0.9995 BLR 0.9992 0.9949 0.005 0.980 1.0000
BOL 1.0000 0.9949 0.004 0.985 0.9999 BRA 0.9993 0.9914 0.005 0.978 0.9996
BWA 0.9976 0.9933 0.005 0.976 0.9995 CAF 0.9974 0.9920 0.007 0.973 0.9999
CAN 0.9997 0.9904 0.009 0.951 0.9999 CHE 0.9991 0.9918 0.006 0.972 0.9999
CHL 0.9998 0.9951 0.004 0.979 0.9999 CHN 1.0000 0.9942 0.004 0.983 0.9998
CIV 0.9988 0.9948 0.004 0.983 0.9998 CMR 0.9976 0.9945 0.004 0.984 1.0000
COG 0.9993 0.9957 0.003 0.985 0.9999 COL 0.9986 0.9949 0.004 0.984 0.9999
CRI 0.9997 0.9937 0.005 0.979 1.0000 CYP 0.9988 0.9544 0.091 0.636 0.9989
CZE 0.9969 0.9933 0.005 0.981 1.0000 DEU 0.9981 0.9896 0.007 0.973 0.9999
DNK 0.9995 0.9929 0.005 0.973 0.9999 DOM 0.9996 0.9952 0.004 0.979 1.0000
DZA 0.9978 0.9939 0.005 0.980 1.0000 ECU 0.9998 0.9946 0.005 0.978 1.0000
EGY 0.9987 0.9939 0.004 0.984 1.0000 ERI 0.9986 0.9934 0.005 0.982 0.9997
ESP 0.9996 0.9942 0.005 0.982 0.9998 EST 0.9996 0.9964 0.003 0.987 1.0000
FIN 0.9990 0.9858 0.026 0.798 0.9999 FRA 0.9989 0.9927 0.006 0.971 1.0000
GAB 0.9993 0.9950 0.004 0.983 0.9999 GBR 0.9988 0.9905 0.009 0.946 0.9998
GEO 0.9990 0.9951 0.004 0.980 1.0000 GHA 0.9985 0.9958 0.004 0.982 0.9999
GIN 0.9987 0.9941 0.004 0.986 0.9996 GMB 0.9987 0.9953 0.004 0.984 1.0000
GNB 0.9989 0.9954 0.003 0.984 0.9998 GRC 0.9996 0.9942 0.004 0.981 0.9998
GTM 0.9978 0.9933 0.005 0.978 0.9998 HND 0.9989 0.9951 0.005 0.979 0.9999
HRV 0.9996 0.9954 0.003 0.986 1.0000 HTI 0.9938 0.9785 0.037 0.807 0.9989
HUN 0.9988 0.9941 0.004 0.980 0.9999 IDN 1.0000 0.9948 0.004 0.982 0.9999
IND 0.9995 0.9960 0.003 0.988 0.9999 IRL 0.9999 0.9948 0.004 0.981 1.0000
IRN 0.9997 0.9943 0.004 0.981 0.9999 IRQ 0.9969 0.9945 0.005 0.978 1.0000
ISR 0.9993 0.9520 0.095 0.524 0.9999 ITA 0.9997 0.9941 0.005 0.975 0.9996
JAM 0.9973 0.9786 0.036 0.869 0.9996 JOR 0.9994 0.9936 0.005 0.980 1.0000
JPN 0.9964 0.9454 0.099 0.499 0.9997 KAZ 0.9997 0.9946 0.004 0.982 0.9999
KEN 0.9975 0.9940 0.004 0.981 1.0000 KGZ 0.9988 0.9939 0.005 0.978 0.9995
KHM 0.9982 0.9958 0.003 0.987 1.0000 KOR 0.9985 0.9330 0.134 0.441 0.9998
KWT 0.9978 0.9920 0.007 0.971 1.0000 LAO 0.9993 0.9956 0.003 0.987 0.9996
LBN 0.9997 0.9935 0.005 0.979 0.9997 LBR 0.9992 0.9931 0.005 0.978 0.9998
LKA 0.9986 0.9935 0.005 0.978 0.9999 LSO 0.9981 0.9935 0.005 0.979 1.0000
LTU 0.9999 0.9955 0.003 0.989 0.9998 LVA 0.9998 0.9955 0.004 0.987 1.0000
MAR 0.9996 0.9963 0.003 0.989 0.9999 MDA 0.9990 0.9941 0.005 0.979 1.0000
MDG 0.9990 0.9946 0.005 0.976 0.9997 MEX 0.9983 0.9932 0.006 0.976 1.0000
MKD 0.9995 0.9931 0.005 0.977 0.9997 MLI 0.9999 0.9964 0.003 0.984 1.0000
MMR 0.9964 0.9930 0.005 0.984 0.9999 MNG 0.9994 0.9954 0.004 0.986 0.9999
MOZ 0.9988 0.9942 0.005 0.982 0.9999 MRT 0.9993 0.9951 0.004 0.977 0.9995
MUS 0.9995 0.9926 0.006 0.970 1.0000 MWI 0.9996 0.9933 0.006 0.972 0.9996
MYS 0.9998 0.9817 0.036 0.785 0.9996 NAM 0.9986 0.9964 0.003 0.987 0.9999
NER 0.9994 0.9957 0.003 0.983 1.0000 NGA 0.9987 0.9946 0.004 0.987 0.9998
NIC 0.9958 0.9912 0.006 0.976 0.9995 NLD 0.9999 0.9797 0.030 0.854 0.9998
NOR 0.9980 0.9916 0.007 0.966 0.9999 NPL 0.9982 0.9951 0.004 0.984 0.9999
NZL 0.9985 0.9901 0.008 0.952 0.9999 OMN 0.9990 0.9948 0.005 0.974 0.9996
PAK 0.9996 0.9930 0.005 0.984 0.9999 PAN 0.9998 0.9946 0.005 0.975 0.9998
PER 0.9987 0.9944 0.005 0.974 0.9997 PHL 0.9997 0.9945 0.004 0.984 0.9996
POL 0.9985 0.9947 0.004 0.978 0.9998 PRT 0.9998 0.9940 0.004 0.982 1.0000
PRY 0.9998 0.9953 0.004 0.986 1.0000 QAT 0.9990 0.9937 0.004 0.983 0.9990
RUS 0.9997 0.9935 0.005 0.981 1.0000 RWA 0.9984 0.9960 0.004 0.983 1.0000
SAU 0.9998 0.9957 0.003 0.986 0.9998 SDN 0.9975 0.9938 0.005 0.978 0.9996
SEN 0.9994 0.9961 0.003 0.985 1.0000 SGP 0.9959 0.9665 0.073 0.624 0.9993
SLE 0.9974 0.9930 0.005 0.979 1.0000 SLV 0.9990 0.9952 0.004 0.985 0.9998
SVK 0.9998 0.9940 0.004 0.981 0.9999 SVN 0.9996 0.9945 0.004 0.984 0.9995
SWE 0.9986 0.9786 0.036 0.794 0.9998 SWZ 0.9996 0.9953 0.003 0.986 0.9998
TCD 0.9988 0.9938 0.004 0.981 0.9995 TGO 0.9998 0.9956 0.003 0.988 0.9999
THA 0.9988 0.9944 0.004 0.984 0.9999 TJK 0.9969 0.9937 0.006 0.975 0.9999
TKM 0.9961 0.9803 0.031 0.801 0.9997 TUN 0.9999 0.9944 0.004 0.980 0.9999
TUR 0.9985 0.9927 0.006 0.971 0.9999 TZA 0.9992 0.9951 0.004 0.980 0.9990
UGA 0.9999 0.9950 0.005 0.980 1.0000 UKR 0.9990 0.9935 0.006 0.975 0.9999
URY 0.9972 0.9933 0.005 0.981 0.9991 USA 0.9975 0.9823 0.026 0.888 1.0000
UZB 0.9990 0.9931 0.005 0.980 0.9999 VEN 0.9975 0.9926 0.006 0.975 0.9999
VNM 0.9995 0.9955 0.004 0.981 0.9999 ZAF 0.9988 0.9943 0.005 0.980 0.9998
ZMB 0.9983 0.9953 0.004 0.981 0.9999 ZWE 0.9985 0.9928 0.006 0.976 0.9997
G Validation
In computational simulation models, validation can be tackled at multiple levels. The work of
Carley (1996) is a classic reference on this topic. In the context of the model developed in this
paper, Castañeda et al. (2018); Guerrero and Castañeda (2020a) present several levels of validation
that are consistent with Carley’s view. In this appendix, we present a new validation exercise
that highlights the importance of the behavioral component of the model in order to produce
governance-related outcomes that are consistent with an independent data source.
As explained in the main text, policymaking agents determine a contribution level Ci ≤ Pi
every period. This means that the resources Di = Pi − Ci are diverted for a private gain. Our
main interpretation for such diversions is corruption, a central challenge of the international public
governance agenda (World Bank, 2017; Izquierdo et al., 2018; OECD, 2019). Our validation pro-
cedure consists of demonstrating that the model is capable of accurately reproducing international
empirical patterns of corruption. Importantly, the calibration of the model does not intend to
optimize the parameters in order to generate such patterns. In fact, it cannot do it because the
model is calibrated for each country individually. Hence, showing that the model’s endogenous
variable Di reproduces the empirical distribution of corruption across countries (from an independent
dataset) provides a strong case for external validity. Furthermore, by showing the sensitivity
of these results to modifications in the learning model, we also provide evidence of internal validity.
First, let us define the endogenous variable of corruption for a single simulation as
\[
D = \frac{1}{B} \sum_{t}^{T} \sum_{i}^{n} \left( P_{i,t} - C_{i,t} \right). \tag{19}
\]
For a set of M independent Monte Carlo simulations, the expected level of corruption is
\[
\bar{D} = \frac{1}{M} \sum_{m}^{M} (1 - D_m), \tag{20}
\]
where we have applied the complement operator $1 - D_m$ so that $\bar{D}$ denotes better outcomes through
higher values.
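Equations 19 and 20 translate into a few lines of code. In this sketch the normaliser B is passed in explicitly and, in the toy example, is set to the total of the allocations P; that choice is an assumption made for illustration, since B is defined in the main text rather than in this appendix. The function names and numbers are hypothetical.

```python
import numpy as np


def diverted_share(P, C, B):
    """Equation 19: total diverted resources in one simulation, divided by B.

    P and C are (T x n) arrays of allocations and contributions.
    """
    P = np.asarray(P, dtype=float)
    C = np.asarray(C, dtype=float)
    if np.any(C > P):
        raise ValueError("contributions cannot exceed allocations (C <= P)")
    return float((P - C).sum() / B)


def expected_complement(D_runs):
    """Equation 20: average of 1 - D_m over the Monte Carlo simulations."""
    D_runs = np.asarray(D_runs, dtype=float)
    return float(np.mean(1.0 - D_runs))


# Toy simulation with T = 2 periods and n = 2 policy issues.
P = np.array([[1.0, 2.0], [1.0, 2.0]])
C = np.array([[0.5, 1.5], [1.0, 1.0]])
D_single = diverted_share(P, C, B=P.sum())  # B = total allocation (assumed)
D_bar = expected_complement([0.2, 0.4])     # two hypothetical simulations
```

With B equal to the total allocation, D lies between 0 and 1, so its complement can be compared directly with an index in which higher values denote less corruption.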
We are interested in testing if D̄ correlates with an empirical indicator of corruption across
countries. Notice that the SDR dataset contains Transparency International’s Perception of Cor-
ruption Index for the 140 countries analyzed in the paper. We intentionally left this indicator out of
our study, since it is redundant with our endogenous variable of corruption. Thus, since it contains
data that was not used to calibrate the model, we can exploit it to test the model’s validity.
Panel (a) in Figure G.1 shows a high Pearson correlation between the empirical indicator of
corruption and the one generated by the model (D̄) from 10000 Monte Carlo simulations. Remember
that the model has been calibrated for each country individually, so this cross-sectional match is not
the result of any fitting procedure. Next, recall that the agent’s learning process may be influenced
by two parameters related to public governance: the quality of monitoring and the quality of the
rule of law. The former affects the probability of being caught diverting funds. The latter sets
the size of the penalty incurred when caught. Both parameters are taken from the Worldwide
Governance Indicators, and are known to be strongly correlated to corruption indices. Therefore,
in the remaining panels of Figure G.1, we show that the strong correlation between the empirical
and the simulated corruption variables is not trivially driven by the data on public governance.
First, we remove the learning model from the policymaking agents, and replace it with random
Figure G.1: Validation via corruption/inefficiencies
(a) Full model (b) No learning (c) Random governance
[Figure: each panel plots the model's corruption output against the perception of corruption index. Reported correlations: (a) 0.97 (pval = 0.00); (b) -0.06 (pval = 0.45); (c) 0.65 (pval = 0.00).]
Notes: Panel (a) is obtained from the model presented in the main text. Panel (b) is the result of removing the
learning component of the model and replacing it with random choices of Ci . Panel (c) results from having the
learning component, but replacing the public governance parameters (obtained from empirical data) by random
values.
Sources: Perception of Corruption Index of Transparency International and authors’ own calculations.
choices of Ci from a uniform distribution in [0, Pi ]. Panel (b) shows that the correlation is entirely
lost. Next, let us put back the learning model, but not the empirical parameters of monitoring and
rule of law. Instead, the probability of being caught and the corresponding penalty are determined
every period through a uniform random draw in [0,1]. Panel (c) shows that a substantial portion of
the correlation is recovered through this procedure. This is an intriguing result because it suggests
that, even without empirical data on public governance, the model is able to produce a cross-
sectional distribution of corruption that resembles the empirical one. We believe that the reason
why this happens is that the SDG indicators contain implicit information about the efficiency with
which the resources are being used. This information is distilled into the model when calibrating
its parameters. Once calibrated, the learning model is sensitive to this information through the
proficiency component of the policymakers’ benefit function, something that we find remarkable.
H Robustness to the disbursement schedule
In the calibration procedure we assume that all the indicators reach their final values in T = 50
simulation periods (the disbursement schedule). This implies that the disbursement schedule is
being mapped to the number of years covered in the dataset. However, there exists the possibility
that the simulation results could be biased by this assumption. Hence, as a robustness test, we
modify the number of disbursement periods to 25 and to 100. In Figure H.1, we analyze if there
are significant changes in the average gaps estimated for 2030 when using these two new schedules,
in relation to the outcomes derived from the benchmark simulation from the main text. The
differences in estimated SDG gaps are minimal, hence our results are not sensitive to the chosen
disbursement schedule. An identical appraisal is obtained when looking at average SDG gaps in
specific countries. Here, we compare the simulation results presented in Figure 5 for T = 50, in
the main text, with those in Figures H.2 and H.3 for T = 25 and T = 100, respectively. These
differences, in absolute terms, are extremely narrow, as indicated by the calculation presented in
Figure H.4. Even for the most sensitive indicators, the discrepancies are rather small, as shown
by the numbers in the colored squares.
Figure H.1: Distribution of differences in the 2030 SDG gaps under different disbursement schedules
(a) T = 25 (b) T = 100
[Figure: each panel plots frequency against the difference in SDG gap (%).]
Notes: The difference is with respect to the benchmark estimations of T = 50.
Figure H.2: Average SDG gaps by country under 25 disbursements
[Figure: four panels of country-level bars, one bar per country, on a horizontal axis from 0 to 100.]
Notes: The bars denote the average SDG gap (across indicators) for each individual country. The dots correspond
to the 10 indicators with the largest estimated gaps. Each dot is colored according to the corresponding SDG of its
indicator. For precise estimates and confidence intervals of each individual indicator gap, see the data provided in
http://github.com/oguerrer/SDG_feasibility.
30
Figure H.3: Average SDG gaps by country under 100 disbursements
[Four bar-chart panels with one row per country, labeled by three-letter country code. Horizontal
axis of each panel: 0 to 100.]
Figure H.4: Robustness to different disbursement schedules
AGO 1.07 1.10 AFG 0.61 0.79 ARG 0.52 0.53 AUS 0.55 0.34
BDI 0.33 0.69 ALB 0.59 1.20 BOL 0.75 0.75
BEN 0.39 0.34 ARM 1.45 1.26 AUT 0.38 0.53
BFA 0.40 0.81 BRA 0.70 0.60
BEL 0.47 0.55
AZE 0.54 1.19
CHL 0.65 0.64
BWA 0.81 0.45 BGR 0.68 1.06 CAN 0.94 0.81
CAF 0.44 0.49 BIH 0.67 0.39 COL 0.38 1.15
CIV 0.53 0.71 BLR 0.54 0.52 CRI 0.56 0.46 CHE 0.27 0.58
CMR 0.81 1.03 CYP 0.78 0.61 DOM 0.56 1.24 CZE 0.78 0.58
COG 0.94 2.28 GEO 0.53 1.21 ECU 0.83 1.12 DEU 0.51 0.82
ERI 0.18 0.50 HRV 0.36 0.56 GTM 0.57 0.79
GAB 0.57 1.87 DNK 0.26 0.50
KAZ 0.25 0.61 HND 0.69 1.10
GHA 0.62 0.82 KGZ 1.26 1.13 HTI 0.33 0.65 ESP 0.45 0.41
GIN 0.68 0.92 MDA 0.62 1.47 EST 0.43 0.53
GMB 0.29 0.84 JAM 0.57 0.70
MKD 0.34 0.59
MEX 0.83 1.36 FIN 0.41 0.53
GNB 0.34 0.62
RUS 0.25 0.48
KEN 0.60 0.77
TJK 0.77 0.69 NIC 0.74 0.87 FRA 0.49 0.66
LBR 0.32 0.67
TKM 1.11 1.90 PAN 0.62 1.14 GBR 0.37 0.53
LSO 1.02 1.49
TUR 0.50 1.03 PER 0.56 1.02
MDG 0.23 0.59 GRC 0.75 0.63
MLI 0.26 0.61 UKR 0.92 1.28 PRY 0.84 1.30
HUN 1.20 1.03
MOZ 0.31 0.67 UZB 1.23 1.80 SLV 0.55 1.17
BGD 0.62 1.18 URY 0.51 0.38 IRL 0.57 0.58
MRT 0.39 0.48
CHN 0.38 0.67
MUS 0.47 0.96 VEN 0.33 1.03 ISR 0.54 0.39
MWI 0.25 0.62 IDN 0.85 0.78
ARE 0.42 0.90 ITA 0.73 0.49
NAM 0.51 1.01 IND 0.52 0.62
BHR 0.39 0.55
JPN 0.43 0.33 LTU 0.40 1.03
NER 0.34 0.47 DZA 0.62 1.07
NGA 0.65 1.10 KHM 0.64 0.77 LVA 0.70 0.91
RWA 0.74 0.86 KOR 0.36 0.28 EGY 1.07 1.08
NLD 0.30 0.82
SDN 0.50 0.81 LAO 0.37 0.59 IRN 0.57 0.56
LKA 0.76 1.41 IRQ 0.48 1.10 NOR 0.22 0.63
SEN 0.23 0.62
SLE 0.28 0.50 MMR 0.36 0.79 JOR 0.71 1.02 NZL 0.31 0.39
SWZ 1.98 2.92 MNG 0.24 0.83 KWT 0.28 0.30 POL 0.67 0.87
TCD 0.73 0.73 MYS 0.90 2.15 LBN 0.31 0.64 PRT 0.55 0.51
TGO 0.33 0.64 NPL 0.57 1.00
MAR 0.58 1.01
TZA 0.34 0.94 PAK 0.41 0.67 SVK 0.56 0.99
UGA 0.59 0.82 PHL 0.49 0.73 OMN 0.26 0.29
SVN 0.73 0.62
ZAF 0.50 0.58 SGP 0.50 0.20 QAT 0.63 0.38
SAU 0.46 0.85 SWE 0.34 0.34
ZMB 0.86 0.64 THA 0.15 0.42
ZWE 0.63 0.79 VNM 0.56 0.72 TUN 0.70 1.11 USA 0.27 0.46
[Four bar-chart panels with one row per country. Horizontal axis of each panel: average absolute
difference in terms of SDG gap (%), from 0 to 0.3.]
Notes: The bars indicate the average absolute difference in estimated gaps (in percentage) between the benchmark
case and one where the model was calibrated with a different disbursement schedule (a different number of simulation
periods T). The dark bars are calculated using T = 25. The light bars are computed using T = 100.
The comparison benchmark corresponds to T = 50. The solid squares on the right of each panel denote the color
of the SDG to which the most sensitive indicator belongs in the case of differences using T = 25. The hollow ones
correspond to T = 100. Their magnitudes are indicated inside each square.
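The quantities summarized in Figure H.4 can be illustrated with a minimal sketch. It assumes that the
estimated gaps are available as a mapping from (country, indicator) pairs to gaps in percent. The
function name, the data layout, and the numbers are hypothetical and are not taken from the paper's
code or data.

```python
from collections import defaultdict
from statistics import mean


def robustness_by_country(benchmark, alternative):
    """Summarise how estimated gaps change under an alternative schedule.

    Both arguments map (country, indicator) -> estimated 2030 gap in percent.
    Returns {country: (mean_abs_diff, most_sensitive_indicator, max_abs_diff)}.
    """
    diffs = defaultdict(dict)
    for (country, indicator), gap in benchmark.items():
        key = (country, indicator)
        if key in alternative:
            # Absolute difference for one indicator of one country.
            diffs[country][indicator] = abs(alternative[key] - gap)

    summary = {}
    for country, by_indicator in diffs.items():
        # The "most sensitive" indicator is the one with the largest change.
        worst = max(by_indicator, key=by_indicator.get)
        summary[country] = (mean(by_indicator.values()), worst, by_indicator[worst])
    return summary


# Made-up numbers for illustration only (not results from the paper).
benchmark_t50 = {("AAA", "ind_1"): 40.0, ("AAA", "ind_2"): 10.0, ("BBB", "ind_1"): 55.0}
schedule_t25 = {("AAA", "ind_1"): 40.5, ("AAA", "ind_2"): 9.0, ("BBB", "ind_1"): 55.25}

summary = robustness_by_country(benchmark_t50, schedule_t25)
print(summary["AAA"])  # (0.75, 'ind_2', 1.0)
```

The first element of each tuple corresponds to the length of a bar, and the last two to the color
and the magnitude of a square, under the assumptions stated above.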
I Robustness to shorter time series
We perform a second test to show that our results are robust when re-calibrating the model with
sub-samples of the data. This is important because one may argue, for example, that investments
made during the 2000-10 decade must have produced structural changes, and that these should be
reflected in better performance of the indicators during the 2010-20 decade. If so, re-calibrating
the model using 2010-20 data should yield structural parameters αi that induce faster indicator
dynamics. These new parameters, in turn, should produce SDG gap predictions that are substantially
different from the ones reported in the main text.
We explore this line of reasoning by using the most recent 5 or 10 years of data, instead of the 21
included in the database. With these shorter time series, we re-estimate the network of
interlinkages and re-calibrate the model. Then, for each SDG gap (of each indicator and country),
we compute the difference between the original estimation (the one in the main text) and the one
obtained from a more recent sub-sample (recall that the units of the SDG gaps are in percentage
with respect to the goal).
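The logic of this test can be sketched as follows. The function estimate_gaps is a hypothetical
stand-in for the full pipeline (network estimation, calibration, and simulation to 2030). The
toy_estimator used to run the example is a simple linear extrapolation, chosen only so that the sketch
executes; it is not the model of this paper, and the data are made up.

```python
def last_years(series_by_indicator, k):
    """Keep only the k most recent observations of every indicator series."""
    return {name: values[-k:] for name, values in series_by_indicator.items()}


def gap_differences(series_by_indicator, k, estimate_gaps):
    """Original (full-sample) gap minus the gap estimated from the last k years.

    `estimate_gaps` is a hypothetical callable mapping indicator series to
    a dict of gaps in percent; it stands in for the whole estimation pipeline.
    """
    full = estimate_gaps(series_by_indicator)
    recent = estimate_gaps(last_years(series_by_indicator, k))
    return {name: full[name] - recent[name] for name in full}


def toy_estimator(series_by_indicator, goal=100.0, years_ahead=10):
    """Placeholder estimator (NOT the paper's model): extrapolate the average
    yearly change and report the remaining shortfall as a percentage of the goal."""
    gaps = {}
    for name, values in series_by_indicator.items():
        slope = (values[-1] - values[0]) / (len(values) - 1)
        projected = values[-1] + slope * years_ahead
        gaps[name] = max(0.0, 100.0 * (goal - projected) / goal)
    return gaps


# Made-up series with 21 yearly observations each.
data = {
    "steady": [50.0 + t for t in range(21)],                          # same trend throughout
    "accelerating": [50.0] * 11 + [52.0 + 2 * t for t in range(10)],  # faster in the last decade
}

diffs = gap_differences(data, 10, toy_estimator)
print(diffs)  # {'steady': 0.0, 'accelerating': 10.0}
```

In this toy setting, an indicator whose dynamics changed in the most recent decade produces a large
difference, while one with stable dynamics produces none. This is the pattern that the test looks for.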
Figure I.1: Distribution of differences in the 2030 SDG gaps under shorter time series
[Two histograms: (a) Ten years and (b) Five years. Horizontal axis: difference in SDG gap (%).
Vertical axis: frequency.]
Notes: The difference is with respect to the benchmark estimations of 21 years of data.
Panels (a) and (b) in Figure I.1 show the differences in the SDG gaps between the benchmark
from the main text and the estimates obtained from shorter time series. These differences are
computed at the level of each indicator of each country. As suggested by the histograms, which are
highly concentrated around zero, the gaps show no significant differences, suggesting only modest
changes in the long-term structural components of the model during the last two decades. Likewise,
when comparing the average gaps at the country level using the full sample (see Figure 5 in the
main text) with those obtained with reduced samples (Figures I.2 and I.3), no notable differences
emerge. If there had been substantial structural improvements in recent decades, they would be
discernible through significantly smaller SDG gaps for the reduced datasets. Since this is not the
case, our assumption of capturing long-term structural factors in α1 , . . . , αN and our choice of
21 years of data are justified.
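One way to make the notion of a distribution concentrated around zero concrete is to report the center
of the differences and the share that falls within a tolerance band. The sketch below is illustrative
only: the function name, the tolerance of 5 percentage points, and the input values are assumptions,
not quantities from the paper.

```python
from statistics import mean, median


def concentration_summary(differences, tolerance=5.0):
    """Describe how tightly gap differences (in percent) cluster around zero."""
    diffs = list(differences)
    # Count the differences whose absolute value is within the tolerance band.
    within = sum(1 for d in diffs if abs(d) <= tolerance)
    return {
        "mean": mean(diffs),
        "median": median(diffs),
        "share_within": within / len(diffs),
    }


# Made-up differences for illustration only.
example = concentration_summary([-1.0, 0.0, 0.5, 0.5, 12.0], tolerance=5.0)
print(example["median"], example["share_within"])  # 0.5 0.8
```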
Figure I.2: Average SDG gaps by country inferred from 10 years of data
[Four bar-chart panels with one row per country, labeled by three-letter country code. Horizontal
axis of each panel: 0 to 100.]
Figure I.3: Average SDG gaps by country inferred from 5 years of data
[Four bar-chart panels with one row per country, labeled by three-letter country code. Horizontal
axis of each panel: 0 to 100.]
References
Aragam, B., Gu, J., and Zhou, Q. (2019). Learning Large-Scale Bayesian Networks with the
Sparsebn Package. Journal of Statistical Software, 91(11).
Carley, K. (1996). Validating Computational Models. Working Paper. CASOS Program, Pittsburgh,
PA.
Castañeda, G., Chávez-Juárez, F., and Guerrero, O. (2018). How Do Governments Determine
Policy Priorities? Studying Development Strategies through Networked Spillovers. Journal of
Economic Behavior & Organization, 154:335–361.
de Sousa, J., Mayer, T., and Zignago, S. (2012). Market Access in Global and Regional Trade.
Regional Science and Urban Economics, 42(6):1037–1052.
de Wolff, T., Cuevas, A., and Tobar, F. (2021). MOGPTK: The Multi-Output Gaussian Process
Toolkit. Neurocomputing, 424:49–53.
Dhami, S. (2016). The Foundations of Behavioral Economic Analysis. Oxford University Press,
Oxford.
Efron, B. (1981). Censored Data and the Bootstrap. Journal of the American Statistical
Association, 76(374):312–319.
Gaulier, G. and Zignago, S. (2010). BACI: International Trade Database at the Product-Level.
Technical Report 2010-23, CEPII.
Guerrero, O. and Castañeda, G. (2020a). Policy Priority Inference: A Computational Framework
to Analyze the Allocation of Resources for the Sustainable Development Goals. Data & Policy,
2.
Guerrero, O. and Castañeda, G. (2020b). Quantifying the Coherence of Development Policy
Priorities. Development Policy Review, 00:1–26.
Guerrero, O. and Castañeda, G. (2021). Does expenditure in public governance guarantee less
corruption? Large non-linearities and complementarities of the rule of law. Economics of
Governance, forthcoming.
Izquierdo, A., Pessino, C., and Vuletin, G., editors (2018). Better Spending for Better Lives: How
Latin America and the Caribbean Can Do More with Less. Inter-American Development Bank.
OECD (2019). Governance as an SDG Accelerator: Country Experiences and Tools. OECD
Publishing.
Ospina-Forero, L., Castañeda Ramos, G., and Guerrero, O. (2020). Estimating Networks of
Sustainable Development Goals. Information and Management.
World Bank (2017). World Development Report 2017: Governance and the Law. International
Bank for Reconstruction and Development / The World Bank, Washington, D.C.