Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
\automark

[section]section \setkomafontpageheadfoot \setkomafontpagenumber \clearpairofpagestyles \cohead\xrfill[0.525ex]0.6pt \theshorttitle \xrfill[0.525ex]0.6pt \cehead\xrfill[0.525ex]0.6pt \theshortauthor \xrfill[0.525ex]0.6pt \cfoot*\xrfill[0.525ex]0.6pt \pagemark \xrfill[0.525ex]0.6pt

Bayesian material flow analysis for systems with multiple levels of disaggregation and high dimensional data

Abstract

Abstract. Material Flow Analysis (MFA) is used to quantify and understand the life cycles of materials from production to end of use, which enables environmental, social and economic impacts and interventions. MFA is challenging as available data is often limited and uncertain, leading to an underdetermined system with an infinite number of possible stocks and flows values. Bayesian statistics is an effective way to address these challenges by principally incorporating domain knowledge, and quantifying uncertainty in the data and providing probabilities associated with model solutions.

This paper presents a novel MFA methodology under the Bayesian framework. By relaxing the mass balance constraints, we improve the computational scalability and reliability of the posterior samples compared to existing Bayesian MFA methods. We propose a mass based, child and parent process framework to model systems with disaggregated processes and flows. We show posterior predictive checks can be used to identify inconsistencies in the data and aid noise and hyperparameter selection. The proposed approach is demonstrated on case studies, including a global aluminium cycle with significant disaggregation, under weakly informative priors and significant data gaps to investigate the feasibility of Bayesian MFA. We illustrate just a weakly informative prior can greatly improve the performance of Bayesian methods, for both estimation accuracy and uncertainty quantification.

Keywords. Material flow analysis \bullet Bayesian statistics \bullet Probabilistic modelling \bullet Uncertainty quantification \bullet Missing data \bullet Circular Economy

11footnotetext: Department of Civil and Environmental Engineering, Imperial College London, United Kingdom. junyang.wang21@imperial.ac.uk, r.myers@imperial.ac.uk22footnotetext: Department of Earth Science and Engineering, Imperial College London, United Kingdom. p.brito-parada@imperial.ac.uk,y.plancherel@imperial.ac.uk, john.morley18@imperial.ac.uk33footnotetext: Department of Mathematics, Imperial College London, London, United Kingdom. kolyan.ray@imperial.ac.uk44footnotetext: Department of Civil, Environmental and Geomatic Engineering, University College London, London, United Kingdom. j.stegemann@ucl.ac.uk55footnotetext: British Geological Survey, United Kingdom. tode@bgs.ac.uk, jmank@bgs.ac.uk

1 Introduction

Increasingly, the world is facing major shifts in resource utilisation. To prevent the worst consequences of climate change, global carbon emissions must be significantly reduced, which requires a fundamental change in how fossil fuels and other high carbon footprint materials are used in the coming decades. Additionally, the instability of supply chains, changing behaviour caused by Covid-19 (Jowitt (2020)), as well as the growing world population, projected to reach almost 10 billion by 2050 (WPP (2022)), will also likely significantly impact resource utilisation globally (Jowitt et al. (2020)). However, many material flows of interest are poorly understood, which severely impedes the ability of policy makers to identify ways to use important materials or resources more efficiently and sustainably, and plan the transition to more sustainable systems of production and use (Mudd (2021); UNE (2020)).

Material flow analysis (MFA) is a broad terminology encompassing any quantitative method used to map and quantify flows and stocks of a material of interest in a well defined system. MFA is applicable to all materials and a range of economic, environmental and social scales, from a company’s supply chain to entire countries and economies. MFA studies have been conducted on metals, including steel and aluminium (Bertram et al. (2009) and Cullen et al. (2012)), non-metals such as glass and concrete (Westbroek et al. (2021) and da Costa Reis et al. (2019)), for entire countries (Matsubae-Yokoyama et al. (2009)), on a global scale (Miatto et al. (2017)) and for multiple materials across whole economies (Fischer-Kowalski et al. (2011). Common to all MFA models is the precise definition of a system, which includes the system boundary, the processes within this system, as well as stock and flow variables representing the storage within and movement between the processes. A physical assumption common to all MFA models is the conservation of mass, where the outflows and inflows of a process must balance with its change of stock.

Typically in static MFA problems it is difficult or impossible to collect data for all the flow and change in stock variables inside the system. This gives rise to an underdetermined system, in the sense that the size of available data combined with physical constraints, containing at least mass balance of processes, is still less than the number of model parameters. For modelling this places MFA datasets in the domain of high dimensional statistics (Lederer (2022)), which is statistically challenging as the number of unknown parameters exceeds the sample size (Wainwright (2019)). Well known statistical methods in this area include ridge regression, the LASSO, Elastic net, as well as Bayesian approaches (Hoerl and Kennard (2000); Tibshirani (1996); Zou and Hastie (2005); Kruschke (2015)).

Bayesian statistics is particularly well suited for MFA for three main reasons. Firstly, Bayesian statistics works naturally in the underdetermined setting by providing a posterior probability over all possible solutions instead of a single solution. Secondly, while commonly there is a lack of data in MFA problems, there is often domain knowledge or expert opinion that can be used to greatly improve the prediction accuracy of the model, and the Bayesian prior distribution provides a natural framework to incorporate such domain knowledge, which is missed in the traditional MFA approach. Thirdly, Bayesian statistics rigorously quantifies uncertainty in the data and propagates it into the posterior distribution. Additionally, authors including Brunner and Rechberger (2004) and Lupton and Allwood (2018) argued for an incremental approach to MFA, where the system diagram and data is continuously refined and improved until the required level of data certainty has been achieved. The Bayesian paradigm naturally faciliates this through iterative learning of models as new data becomes available by rerunning the model with new data, since the posterior distribution from a previous analysis can be interpreted as the prior distribution for the subsequent analysis.

Non Bayesian approaches to uncertainty quantification in MFA include works such as Laner et al. (2015), Schwab et al. (2016a), Schwab and Rechberger (2017). These approaches focus on qualitative and quantitative methods of assigning confidence onto data or mass balanced flows, but do not provide a modelling procedure that combines prior belief with data and mass balance to produce flow estimates, nor a framework for propagating uncertainty. Bayesian inference provides a modelling procedure that is able to propagate uncertainty, and a rigorous, probabilistic interpretation of uncertainty. The popular software ‘STAN’ (Cencic (2016)) uses least squares minimisation to conduct data reconciliation of MFA systems. However, STAN requires nonlinear data to be approximated by first order Taylor expansion, and a normality assumption is made on flows, which can permit negative flow values that are not physically meaningful.

1.1 Previous work on Bayesian approaches to MFA

The use of Bayesian statistics in MFA is still relatively limited. Perhaps the earliest use of Bayesian statistics in MFA is by Gottschalk et al. (2010), where the authors proposed a model where the mass balance equations are parametrised using specific flow ratios (known as transfer coefficients) in the system, and tested the model on a case study of flows of concentrations of nanoparticles in Switzerland using simulated data. In Lupton and Allwood (2018), the authors used the same mass balance parametrisation as Gottschalk et al. (2010), but with Dirichlet priors on the transfer coefficients rather than uniform or triangular priors, using a HMC (Hamiltonian Monte Carlo) based sampler called the NUTS (No-U-Turn Sampler) algorithm to sample from the posterior, and conducted a case study mapping global steel flows. Recently, Dong et al. (2023) proposed a method which combines prior information from multiple experts in the MFA setting, demonstrating it using the mass balance parametrisation of Gottschalk et al. (2010) and Lupton and Allwood (2018). However, when working with the model of Lupton and Allwood (2018) in practice, we found that many divergent samples were produced with the NUTS algorithm, suggesting the posterior samples have not converged. Dong et al. (2023) also encountered divergent samples in their study when using HMC, and opted to use Sequential Monte Carlo (SMC) instead to attempt to circumvent this problem. However, it is unclear that the samples from SMC are better converged than the samples from HMC, since SMC does not have access to the same divergent sample diagnostics as HMC. Furthermore, SMC appears to significantly increase computational time.

Cencic and Frühwirth (2015) developed a linear Bayesian data conciliation method that can be applied to MFA systems, and a subsequent paper Cencic and Frühwirth (2018) extended this method to include nonlinear constraints. These methods instead parametrise the model directly in terms of the flow variables, by partitioning the set of flow variables in the system into ‘free variables’ 𝒗𝒇subscript𝒗𝒇\bm{v_{f}}bold_italic_v start_POSTSUBSCRIPT bold_italic_f end_POSTSUBSCRIPT and ‘dependent variables’ 𝒗𝒅subscript𝒗𝒅\bm{v_{d}}bold_italic_v start_POSTSUBSCRIPT bold_italic_d end_POSTSUBSCRIPT, where the mass balance equations can be expressed as a relationship between the free variables and the dependent variables. For example, in the linear case, 𝒗𝒅=D𝒗𝒇𝒅subscript𝒗𝒅𝐷subscript𝒗𝒇𝒅\bm{v_{d}}=-D\bm{v_{f}}-\bm{d}bold_italic_v start_POSTSUBSCRIPT bold_italic_d end_POSTSUBSCRIPT = - italic_D bold_italic_v start_POSTSUBSCRIPT bold_italic_f end_POSTSUBSCRIPT - bold_italic_d can be obtained via Gaussian elimination for some constant matrix D𝐷Ditalic_D and vector 𝒅𝒅\bm{d}bold_italic_d. In both papers the methodology was tested on low dimensional, simulated examples, with Metropolis Hastings (MH) used as the sampling algorithm on the free variables, with the prior distribution f(𝒗𝒇))f(\bm{v_{f})})italic_f ( bold_italic_v start_POSTSUBSCRIPT bold_italic_f end_POSTSUBSCRIPT bold_) ) of the free variables as the proposal distribution. However, even in low dimensional examples it was reported in Cencic and Frühwirth (2015) that the MH sampler can have a very low acceptance probability. Therefore it is unclear whether the method works well in high dimensional, underdetermined systems with hundreds of flow and change in stock variables, which is important since these systems are typical in MFA. In high dimensions, it becomes increasingly unlikely for proposals from the MH sampling algorithm to satisfy both the mass balance conditions and non negativity of flow variables. To see this, 𝒗𝒅=D𝒗𝒇𝒅subscript𝒗𝒅𝐷subscript𝒗𝒇𝒅\bm{v_{d}}=-D\bm{v_{f}}-\bm{d}bold_italic_v start_POSTSUBSCRIPT bold_italic_d end_POSTSUBSCRIPT = - italic_D bold_italic_v start_POSTSUBSCRIPT bold_italic_f end_POSTSUBSCRIPT - bold_italic_d does not guarantee every component of 𝒗𝒅subscript𝒗𝒅\bm{v_{d}}bold_italic_v start_POSTSUBSCRIPT bold_italic_d end_POSTSUBSCRIPT will be positive for arbitrary fixed D𝐷Ditalic_D and 𝒅𝒅\bm{d}bold_italic_d. As the dimension of 𝒗𝒅subscript𝒗𝒅\bm{v_{d}}bold_italic_v start_POSTSUBSCRIPT bold_italic_d end_POSTSUBSCRIPT increases, it becomes increasingly likely that randomly sampled proposal 𝒗𝒇subscript𝒗𝒇\bm{v_{f}}bold_italic_v start_POSTSUBSCRIPT bold_italic_f end_POSTSUBSCRIPT during MH will lead to at least one component of 𝒗𝒅subscript𝒗𝒅\bm{v_{d}}bold_italic_v start_POSTSUBSCRIPT bold_italic_d end_POSTSUBSCRIPT to be negative, causing the proposal to be in a region of zero posterior probability and the MCMC algorithm to be stuck at the current value. More generally, sampling from constrained posteriors is known to be challenging (Lan and Kang (2023)), and HMC also have difficulties when encountering regions of zero posterior probability formed by the constraints (Hoffman and Gelman (2014)).

1.2 Scope of paper

This paper continues the development of Bayesian methodology for material flow analysis. To address the aforementioned computational issues, we relax the mass balance conditions via a noise term. This reduces regions of zero posterior probability in the parameter space, making the posterior easier to sample using MCMC algorithms. We show computationally with an aluminium cycle case study that this leads the NUTS sampling algorithm to converge well in high dimensions. Simultaneously, the noise term can be interpreted as a way of modelling epistemic uncertainty in the system, which is likely to be present as MFA systems are simplified rather than perfect representations of reality (Schwab et al. (2016a, b)). It is similar to the concept of ‘phantom flows’, which is used to account for unexplainable mass imbalances in MFA studies (e.g. Reck et al. (2008) ). The variance of the noise term can be chosen to be small so good approximate mass balance is still achieved when there is high confidence in the system definition.

We introduce a child and parent process parametrisation framework to model systems with multiple layers of disaggregation in processes and flows, which is a common feature in material flow datasets (Myers et al. (2019a) and Myers et al. (2019b)) but has not been considered in previous Bayesian MFA studies. To this end we assign priors directly on flow mass and change in stock variables in the material system, while retaining the ability to incorporate ratio data between arbitrary flows. We demonstrate our method on a high dimensional aluminium material flow system where change in stock and disaggregation of processes and flows are simultaneously present.

We illustrate how Bayesian posterior predictive checks can reveal inconsistencies in the data and aid hyperparameter selection. We also show the posterior distribution can inform data collection strategies by identifying which flow or changes in stock variables in the system retain the most uncertainty.

Previous work have not investigated under what conditions estimates and uncertainty quantification produced by Bayesian MFA are reliable. This is important as strong theoretical guarantees which hold in low-dimensional parametric models, such as the Bernstein-von Mises theorem (Vaart (1998)), can fail to hold in high-dimensional settings (Johnstone (2010)) such as the present MFA setting. We address this gap by examining how Bayesian MFA performs on a high dimensional aluminium case study under relatively weak assumptions, namely a weakly informative prior and significant data gaps. We also conduct a simulation study on a zinc cycle to examine the estimation accuracy and uncertainty quantification of the posterior distributions of our model from a frequentist perspective, which can be found in the supporting information.

2 Methods

In this section, we present the details of our proposed Bayesian MFA methodology. Our model produces estimates and uncertainty quantification of all flow and change in stock variables of interest in any given material flow system, in the form of a posterior distribution, which mathematically combines prior domain knowledge and expert opinion with available data while simultaneously propagating uncertainty.

A MFA analysis begins with the definition of a system diagram, a graph like structure of nodes representing processes𝑝𝑟𝑜𝑐𝑒𝑠𝑠𝑒𝑠processesitalic_p italic_r italic_o italic_c italic_e italic_s italic_s italic_e italic_s, where the material of interest can be stored as stock, and edges representing flows𝑓𝑙𝑜𝑤𝑠flowsitalic_f italic_l italic_o italic_w italic_s of the material of interest between processes. Notably however, flows in both directions between any two processes are permitted. The system diagram will also contain a system boundary, which is used to describe the flows between the system and some external environment. This framework is quite broad, and allows processes within a system to not only represent a physical manufacturing process (such as components of a blast furnace), but also examples such as the Earth’s lithosphere, the environment, or various usage outlets like households where the material of interest can be stored or flow in and out of. Similarly, the system’s scope can range from a small supply chain, to a global flow of a metal such as zinc. The level of detail of the system diagram is chosen by the modeller to fit the scope of the problem being examined.

Often in MFA problems, certain processes in the system can be disaggregated into constituent subprocesses (see e.g. Myers et al. (2019b) ). We define a parent𝑝𝑎𝑟𝑒𝑛𝑡parentitalic_p italic_a italic_r italic_e italic_n italic_t process𝑝𝑟𝑜𝑐𝑒𝑠𝑠processitalic_p italic_r italic_o italic_c italic_e italic_s italic_s to be a process which contains subprocesses, and a child𝑐𝑖𝑙𝑑childitalic_c italic_h italic_i italic_l italic_d process𝑝𝑟𝑜𝑐𝑒𝑠𝑠processitalic_p italic_r italic_o italic_c italic_e italic_s italic_s be a process which contains no subprocesses instead. By definition, parent and child processes form a partition of the set of all processes in the system. The MFA practitioner should decide the level of disaggregation of each parent process according to their requirements, but parent processes where data is only available on some of its child processes may still be worth disaggregating to incorporate additional data into the model.

The parent and child process structure is useful for modelling the disaggregation of processes. To see this, we assume the stocks and flows of any parent process can be expressed as a linear combination of its constituent child processes. Under this parent and child process framework, multiple levels of disaggregation of processes can be reduced to just two levels (the set of parent processes and the set of child processes), which greatly simplifies modelling of the MFA system. A simple example illustrating the parent and child process framework can be found in the supplementary information.

2.1 Formulation of the physical model

Suppose there are m𝑚mitalic_m child processes in the system, indexed by P0,P1,Pm1subscript𝑃0subscript𝑃1subscript𝑃𝑚1P_{0},P_{1},\dots P_{m-1}italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … italic_P start_POSTSUBSCRIPT italic_m - 1 end_POSTSUBSCRIPT. Let si(t)subscript𝑠𝑖𝑡s_{i}(t)italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) to be the stock variable associated with the process Pisubscript𝑃𝑖P_{i}italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, denoting the amount of stock in process Pisubscript𝑃𝑖P_{i}italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT at time t𝑡titalic_t. In practice material flow data are typically recorded as the total amount of flow over a period time (such as on a monthly or yearly basis), so let Uj,isubscript𝑈𝑗𝑖U_{j,i}italic_U start_POSTSUBSCRIPT italic_j , italic_i end_POSTSUBSCRIPT represents the total amount of flow of the material of interest from process j𝑗jitalic_j to process i𝑖iitalic_i during the time period tΔt𝑡Δ𝑡t-\Delta titalic_t - roman_Δ italic_t to t𝑡titalic_t. Here ΔtΔ𝑡\Delta troman_Δ italic_t represents the period during which the total amount of flow was reported, which can for example be in months or years. Note typically in MFA systems not all processes necessarily possess a stock variable, for example if the physical process which it is modelling does not contain a physical stock of the material of interest. Similarly, most processes will not receive flows from or flow to every other process, so the notation introduced in this paragraph such as sisubscript𝑠𝑖s_{i}italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and Uj,ksubscript𝑈𝑗𝑘U_{j,k}italic_U start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT are understood to be over existing stocks and flow variables only. For each process Pisubscript𝑃𝑖P_{i}italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, we assume mass is conserved between its stock, inflows and outflows over the time period tΔt𝑡Δ𝑡t-\Delta titalic_t - roman_Δ italic_t to t𝑡titalic_t, which we formulate as:

Si=si(t)si(tΔt)=jUj,ikUi,ksubscript𝑆𝑖subscript𝑠𝑖𝑡subscript𝑠𝑖𝑡Δ𝑡subscript𝑗subscript𝑈𝑗𝑖subscript𝑘subscript𝑈𝑖𝑘S_{i}=s_{i}(t)-s_{i}(t-\Delta t)=\sum_{j}U_{j,i}-\sum_{k}U_{i,k}italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) - italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t - roman_Δ italic_t ) = ∑ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_U start_POSTSUBSCRIPT italic_j , italic_i end_POSTSUBSCRIPT - ∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_U start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT (1)

The left hand side Si=si(t)si(tΔt)subscript𝑆𝑖subscript𝑠𝑖𝑡subscript𝑠𝑖𝑡Δ𝑡S_{i}=s_{i}(t)-s_{i}(t-\Delta t)italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) - italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t - roman_Δ italic_t ) is simply the change in stock during the time period t𝑡titalic_t to tΔt𝑡Δ𝑡t-\Delta titalic_t - roman_Δ italic_t. In this paper we only consider stationary models at a snapshot of time t𝑡titalic_t, so the variables of interest in the model are the Sisubscript𝑆𝑖S_{i}italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and Uj,ksubscript𝑈𝑗𝑘U_{j,k}italic_U start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT. For more compact notation, let 𝑺𝑺\bm{S}bold_italic_S be the vector of q𝑞qitalic_q change in stock variables Sisubscript𝑆𝑖S_{i}italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT of the child processes in the system, and 𝑼𝑼\bm{U}bold_italic_U a vector of pq𝑝𝑞p-qitalic_p - italic_q flow variables Uj,ksubscript𝑈𝑗𝑘U_{j,k}italic_U start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT of the child processes in the system (for a total of p𝑝pitalic_p variables). Note because flows and stocks changes of parent processes can be expressed linearly in terms of its constituent child processes, conservation of mass for all child processes automatically implies conservation of mass for all parent processes. Similarly, Bayesian modelling only needs to be conducted on the child processes, as the posterior samples on flow and stocks change variables of parent processes can be obtained by simply summing the posterior samples of the constituent child processes.

2.2 Data structure

Data in MFA can in principle be any arbitrary function f(𝑺,𝑼)𝑓𝑺𝑼f(\bm{S},\bm{U})italic_f ( bold_italic_S , bold_italic_U ) of the variables of interest 𝑺𝑺\bm{S}bold_italic_S and 𝑼𝑼\bm{U}bold_italic_U. However, data and mass balance conditions typically come in the following four forms, which we can express in terms of 𝑺𝑺\bm{S}bold_italic_S and 𝑼𝑼\bm{U}bold_italic_U:

1. Observations of changes in stock of child processes or observations of flows between child processes. For example, in Figure 2, S0=37.2subscript𝑆037.2S_{0}=-37.2italic_S start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = - 37.2 Mt could be used to describe the change in stock of the ‘Reserves’ process, while the flow from ‘Reserves’ to ‘Mining’ can be described by U0,1=37.2subscript𝑈0137.2U_{0,1}=37.2italic_U start_POSTSUBSCRIPT 0 , 1 end_POSTSUBSCRIPT = 37.2 Mt. Here the process ‘Reserves’ is labelled by P0subscript𝑃0P_{0}italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and the process ‘Mining’ by P1subscript𝑃1P_{1}italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT.

2. Observations of changes in stock of parent processes, or observations of flows between processes where at least one process is a parent, which can be treated as the sum of multiple flows. For example, the combined flow of 9.89.89.89.8 Mt in Figure 2 from ‘WasteManagement’ to ‘Recycling’ consists of ‘WasteManagement’ as the origin process, and the destination processes are ‘Remelting’ and ‘Refining’, two of the subprocesses of ‘Recycling’. This flow can be represented by the sum of flows U31,7+U31,8=9.8subscript𝑈317subscript𝑈3189.8U_{31,7}+U_{31,8}=9.8italic_U start_POSTSUBSCRIPT 31 , 7 end_POSTSUBSCRIPT + italic_U start_POSTSUBSCRIPT 31 , 8 end_POSTSUBSCRIPT = 9.8 Mt. Here we used P31subscript𝑃31P_{31}italic_P start_POSTSUBSCRIPT 31 end_POSTSUBSCRIPT to denote ‘WasteManagement’, and P7,P8subscript𝑃7subscript𝑃8P_{7},P_{8}italic_P start_POSTSUBSCRIPT 7 end_POSTSUBSCRIPT , italic_P start_POSTSUBSCRIPT 8 end_POSTSUBSCRIPT ‘Remelting’ and ‘Refining’ subprocesses of ‘Recycling’ respectively. So for example, U31,7subscript𝑈317U_{31,7}italic_U start_POSTSUBSCRIPT 31 , 7 end_POSTSUBSCRIPT represents the flow from ‘WasteManagement’ to ‘Remelting’.

3. Conservation of mass. For child process i𝑖iitalic_i this is represented by equation (1).

4. Ratio data between flows or sums of flows, for instance transfer coefficients:

Ui,jkUi,k=αi,jsubscript𝑈𝑖𝑗subscript𝑘subscript𝑈𝑖𝑘subscript𝛼𝑖𝑗\frac{U_{i,j}}{{\sum_{k}U_{i,k}}}=\alpha_{i,j}divide start_ARG italic_U start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT end_ARG start_ARG ∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_U start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT end_ARG = italic_α start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT (2)

where αi,jsubscript𝛼𝑖𝑗\alpha_{i,j}italic_α start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT is a known transfer coefficient of the flow from process i𝑖iitalic_i to process j𝑗jitalic_j, and is defined as the ratio between the flow from process i𝑖iitalic_i to process j𝑗jitalic_j, divided by the total outflow of process i𝑖iitalic_i. In case studies of this paper we only consider transfer coefficients of processes without a change in stock variable. In general, a change in stock can be split into two flows (flow into and out of stock), which allows transfer coefficients to be calculated. We also consider ratios between aggregated flows as an extension beyond standard transfer coefficients.

Ratio data can also be alternatively parametrised linearly in the following way:

Ui,jαi,jkUi,k=0subscript𝑈𝑖𝑗subscript𝛼𝑖𝑗subscript𝑘subscript𝑈𝑖𝑘0U_{i,j}-\alpha_{i,j}{\sum_{k}U_{i,k}}=0italic_U start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT - italic_α start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_U start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT = 0 (3)

We note it is possible to conveniently formulate all the most common forms of data and mass balance conditions in MFA in terms of linear relationships between the change in stock variables Sisubscript𝑆𝑖S_{i}italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and the flow variables Uj,ksubscript𝑈𝑗𝑘U_{j,k}italic_U start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT, even when the system contains disaggregation of processes. However, for a more general framework and to demonstrate our model can incorporate nonlinear data as well, we choose to parametrise ratio data as Equation 2 when evaluating the model on the aluminium case study in the Results section.

Using more compact notation, the relationship between the flow and change in stock variables and the data and physical constraints can be represented by the following model:

𝒀=[X𝜽𝑹(𝜽)]+ϵ,𝜽=[𝑺𝑼]formulae-sequence𝒀matrix𝑋𝜽𝑹𝜽bold-italic-ϵ𝜽matrix𝑺𝑼\displaystyle\bm{Y}=\begin{bmatrix}X\bm{\theta}\\ \bm{R}(\bm{\theta})\end{bmatrix}+\bm{\epsilon},\quad\bm{\theta}=\begin{bmatrix% }\bm{S}\\ \bm{U}\end{bmatrix}bold_italic_Y = [ start_ARG start_ROW start_CELL italic_X bold_italic_θ end_CELL end_ROW start_ROW start_CELL bold_italic_R ( bold_italic_θ ) end_CELL end_ROW end_ARG ] + bold_italic_ϵ , bold_italic_θ = [ start_ARG start_ROW start_CELL bold_italic_S end_CELL end_ROW start_ROW start_CELL bold_italic_U end_CELL end_ROW end_ARG ] (4)

where 𝜽𝜽\bm{\theta}bold_italic_θ is a concatenated vector of 𝑺𝑺\bm{S}bold_italic_S and 𝑼𝑼\bm{U}bold_italic_U of length p𝑝pitalic_p, representing all the child flow and change in stock variables the system. 𝒀𝒀\bm{Y}bold_italic_Y is a vector of length n𝑛nitalic_n of observed values, or 00 for mass balance conditions. X𝑋Xitalic_X is a design matrix representing linear data and mass balance conditions, and 𝑹(𝜽)𝑹𝜽\bm{R}(\bm{\theta})bold_italic_R ( bold_italic_θ ) a vector representing nonlinear data. ϵbold-italic-ϵ\bm{\epsilon}bold_italic_ϵ is a random vector (of length n𝑛nitalic_n) representing uncertainty in the data, which could be caused by measurement or rounding errors. Note in the MFA setting, it is often the case that npmuch-less-than𝑛𝑝n\ll pitalic_n ≪ italic_p.

The goal of any Bayesian model is to obtain posterior distributions over the variables of interest, in this case 𝜽𝜽\bm{\theta}bold_italic_θ, in order to perform inference such as point estimation or uncertainty quantification. In the following section we describe how to construct the prior for our model, the form of the likelihood and how to obtain the posterior of 𝜽𝜽\bm{\theta}bold_italic_θ via Bayes’s theorem.

2.3 Bayesian model detail

Bayesian inference is a statistical framework in which Bayes’s theorem, a fundamental result in probability theory, is used to update one’s beliefs regarding parameters of interest 𝜽𝜽\bm{\theta}bold_italic_θ based on new data or evidence 𝒀𝒀\bm{Y}bold_italic_Y. Mathematically the prior distribution p(𝜽)𝑝𝜽p(\bm{\theta})italic_p ( bold_italic_θ ) is used to express one’s belief prior to seeing the data, and the likelihood function p(𝒀|𝜽)𝑝conditional𝒀𝜽p(\bm{Y}|\bm{\theta})italic_p ( bold_italic_Y | bold_italic_θ ) used to express the probability of observing the data. The goal in Bayesian inference is to obtain the posterior distribution p(𝜽|𝒀)𝑝conditional𝜽𝒀p(\bm{\theta}|\bm{Y})italic_p ( bold_italic_θ | bold_italic_Y ), which represents the state of one’s updated beliefs after observing the data. This is done via Bayes’s theorem:

p(𝜽|𝒀)=p(𝜽)p(𝒀|𝜽)p(𝜽)p(𝒀|𝜽)𝑑𝜽𝑝conditional𝜽𝒀𝑝𝜽𝑝conditional𝒀𝜽𝑝𝜽𝑝conditional𝒀𝜽differential-d𝜽p(\bm{\theta}|\bm{Y})=\frac{p(\bm{\theta})p(\bm{Y}|\bm{\theta})}{\int p(\bm{% \theta})p(\bm{Y}|\bm{\theta})d\bm{\theta}}italic_p ( bold_italic_θ | bold_italic_Y ) = divide start_ARG italic_p ( bold_italic_θ ) italic_p ( bold_italic_Y | bold_italic_θ ) end_ARG start_ARG ∫ italic_p ( bold_italic_θ ) italic_p ( bold_italic_Y | bold_italic_θ ) italic_d bold_italic_θ end_ARG (5)

Most posterior distributions do not have a closed form expression due not being possible to evaluate p(𝜽)p(𝒀|𝜽)𝑑𝜽𝑝𝜽𝑝conditional𝒀𝜽differential-d𝜽\int p(\bm{\theta})p(\bm{Y}|\bm{\theta})d\bm{\theta}∫ italic_p ( bold_italic_θ ) italic_p ( bold_italic_Y | bold_italic_θ ) italic_d bold_italic_θ in closed form. Markov Chain Monte Carlo (MCMC) methods are computational algorithms used to sample from distributions that do not have a closed-form, including posterior distributions. MCMC methods construct a Markov Chain that converges to the target distribution, by iteratively sampling from a proposal distribution and only accepting samples that satisfy a suitable criteria that suggests they could be feasibly generated from the target distribution.

Refer to caption
(a)
Refer to caption
(b)
Figure 1: Schematic of the Bayesian model (a), example of simple MFA system (b). The circles coloured in grey represent noise parameters or prior hyperparameters, the diamond coloured in green represent the data and the rounded rectangles coloured in blue represent distributions over the variables of interest as well as the model likelihood. Definitions for each variable and parameter in this figure can be found in the Methods section.

For our Bayesian model, we consider normal priors for the change in stock variables Sisubscript𝑆𝑖S_{i}italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, which reflects the fact that the change in stock can be both positive or negative. For the flow variables Uj,ksubscript𝑈𝑗𝑘U_{j,k}italic_U start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT, we consider truncated normal priors for the flow variables on the positive interval [0,L]0𝐿[0,L][ 0 , italic_L ], where L𝐿Litalic_L is chosen based on domain knowledge of the scale of flows inside the system being modelled, or simply chosen to be positive infinite if none is available. This is to ensure the posterior distribution of the flow variables have positive support, as flow quantities are always positive in reality. Furthermore, the normal and truncated normal distributions are flexible, in the sense that an informative prior distribution can be assigned by choosing the prior mode as a confident estimate (from expert knowledge), and choosing a small prior variance to create a narrow distribution around the prior mode. On the other hand, an uninformative prior distribution can be assigned by choosing a large prior variance around a rough guess for the prior mode instead.

Si𝒩(μi,σi2),Uj,k𝒯𝒩(μj,k,σj,k2,0,L)formulae-sequencesimilar-tosubscript𝑆𝑖𝒩subscript𝜇𝑖superscriptsubscript𝜎𝑖2similar-tosubscript𝑈𝑗𝑘𝒯𝒩subscript𝜇𝑗𝑘superscriptsubscript𝜎𝑗𝑘20𝐿\displaystyle S_{i}\sim\mathcal{N}(\mu_{i},\sigma_{i}^{2}),\quad U_{j,k}\sim% \mathcal{TN}(\mu_{j,k},\sigma_{j,k}^{2},0,L)italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∼ caligraphic_N ( italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) , italic_U start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT ∼ caligraphic_T caligraphic_N ( italic_μ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT , italic_σ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , 0 , italic_L ) (6)

where μisubscript𝜇𝑖\mu_{i}italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and σi2superscriptsubscript𝜎𝑖2\sigma_{i}^{2}italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT are the prior mean and variance of i𝑖iitalic_ith change in stock variable Sisubscript𝑆𝑖S_{i}italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, respectively, and μj,ksubscript𝜇𝑗𝑘\mu_{j,k}italic_μ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT and σj,k2superscriptsubscript𝜎𝑗𝑘2\sigma_{j,k}^{2}italic_σ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT the corresponding prior hyperparameters respectively for the truncated normal distribution for the flow variable Uj,ksubscript𝑈𝑗𝑘U_{j,k}italic_U start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT, representing the flow quantity from process j𝑗jitalic_j to process k𝑘kitalic_k. We also assume prior variables are independent, so the overall prior distribution p(𝜽)𝑝𝜽p(\bm{\theta})italic_p ( bold_italic_θ ) over the change in stock and flow variables is of the form:

p(𝜽)i1σiexp((Siμi)22σi2)(j,k)1σj,kexp((Uj,kμj,k)22σj,k2)IUj,k[0,L]proportional-to𝑝𝜽subscriptproduct𝑖1subscript𝜎𝑖superscriptsubscript𝑆𝑖subscript𝜇𝑖22superscriptsubscript𝜎𝑖2subscriptproduct𝑗𝑘1subscript𝜎𝑗𝑘superscriptsubscript𝑈𝑗𝑘subscript𝜇𝑗𝑘22superscriptsubscript𝜎𝑗𝑘2subscriptIsubscript𝑈𝑗𝑘0𝐿\displaystyle p(\bm{\theta})\propto\prod_{i}\frac{1}{\sigma_{i}}\exp(\frac{-(S% _{i}-\mu_{i})^{2}}{2\sigma_{i}^{2}})\prod_{(j,k)}\frac{1}{\sigma_{j,k}}\exp(% \frac{-(U_{j,k}-\mu_{j,k})^{2}}{2\sigma_{j,k}^{2}})\mathrm{I}_{U_{j,k}\in[0,L]}italic_p ( bold_italic_θ ) ∝ ∏ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG roman_exp ( divide start_ARG - ( italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 2 italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ) ∏ start_POSTSUBSCRIPT ( italic_j , italic_k ) end_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG italic_σ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT end_ARG roman_exp ( divide start_ARG - ( italic_U start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 2 italic_σ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ) roman_I start_POSTSUBSCRIPT italic_U start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT ∈ [ 0 , italic_L ] end_POSTSUBSCRIPT (7)

where IUj,k[0,L]subscriptIsubscript𝑈𝑗𝑘0𝐿\mathrm{I}_{U_{j,k}\in[0,L]}roman_I start_POSTSUBSCRIPT italic_U start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT ∈ [ 0 , italic_L ] end_POSTSUBSCRIPT is an indicator function, and the products are over the change in stock variables Sisubscript𝑆𝑖S_{i}italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and flow variables Uj,ksubscript𝑈𝑗𝑘U_{j,k}italic_U start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT.

For the likelihood, we again use a truncated normal likelihood for any data on observed flow values to ensure the data generated by the likelihood is plausible (since flow data should be strictly positive), and a normal likelihood for data on observed change in stock values and general nonlinear data. We include an additional normally distributed noise term in the mass balance conditions mainly for practical reasons: the noise term reduces the region of zero posterior probability, which makes sampling from the posterior computationally much more tractable. Additionally, the noise term has the interpretation of modelling epistemic or systematic uncertainty in the MFA system. In theory, every process should be exactly mass conserved if the system definition and diagram are a perfect representation of the underlying real system being modelled. However, in reality the system diagram and definition are simplified and approximate models of reality, and so it is reasonable to account for uncertainty in the underlying system itself, which can be modelled through a noise term in the mass balance conditions. In practice, flows reported in MFA studies are rarely exactly mass balanced, so likewise a Bayesian MFA approach does not need to produce perfectly mass balanced flows to provide useful estimates.

Yi|𝜽conditionalsubscript𝑌superscript𝑖𝜽\displaystyle Y_{i^{\prime}}|\bm{\theta}italic_Y start_POSTSUBSCRIPT italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | bold_italic_θ 𝒩(𝒙𝒊𝜽,τi2)for observationsYion change in stock datasimilar-toabsent𝒩superscriptsubscript𝒙superscript𝒊bold-′top𝜽superscriptsubscript𝜏superscript𝑖2for observationssubscript𝑌superscript𝑖on change in stock data\displaystyle\sim\mathcal{N}(\bm{x_{i^{\prime}}}^{\top}\bm{\theta},\tau_{i^{% \prime}}^{2})\quad\text{for observations}\ Y_{i^{\prime}}\ \text{on change in % stock data}∼ caligraphic_N ( bold_italic_x start_POSTSUBSCRIPT bold_italic_i start_POSTSUPERSCRIPT bold_′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_θ , italic_τ start_POSTSUBSCRIPT italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) for observations italic_Y start_POSTSUBSCRIPT italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT on change in stock data (8)
Yj|𝜽conditionalsubscript𝑌superscript𝑗𝜽\displaystyle Y_{j^{\prime}}|\bm{\theta}italic_Y start_POSTSUBSCRIPT italic_j start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | bold_italic_θ 𝒯𝒩(𝒙𝒋𝜽,τj2,0,)for observationsYjon flow datasimilar-toabsent𝒯𝒩superscriptsubscript𝒙superscript𝒋bold-′top𝜽superscriptsubscript𝜏superscript𝑗20for observationssubscript𝑌superscript𝑗on flow data\displaystyle\sim\mathcal{TN}(\bm{x_{j^{\prime}}}^{\top}\bm{\theta},\tau_{j^{% \prime}}^{2},0,\infty)\quad\text{for observations}\ Y_{j^{\prime}}\ \text{on % flow data}∼ caligraphic_T caligraphic_N ( bold_italic_x start_POSTSUBSCRIPT bold_italic_j start_POSTSUPERSCRIPT bold_′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_θ , italic_τ start_POSTSUBSCRIPT italic_j start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , 0 , ∞ ) for observations italic_Y start_POSTSUBSCRIPT italic_j start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT on flow data (9)
Yk|𝜽conditionalsubscript𝑌superscript𝑘𝜽\displaystyle Y_{k^{\prime}}|\bm{\theta}italic_Y start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | bold_italic_θ 𝒩(R(𝜽)k,τk2)for observationsYkon nonlinear datasimilar-toabsent𝒩𝑅subscript𝜽superscript𝑘superscriptsubscript𝜏superscript𝑘2for observationssubscript𝑌superscript𝑘on nonlinear data\displaystyle\sim\mathcal{N}(R(\bm{\theta})_{k^{\prime}},\tau_{k^{\prime}}^{2}% )\quad\text{for observations}\ Y_{k^{\prime}}\ \text{on nonlinear data}∼ caligraphic_N ( italic_R ( bold_italic_θ ) start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , italic_τ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) for observations italic_Y start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT on nonlinear data (10)
Yl|𝜽conditionalsubscript𝑌superscript𝑙𝜽\displaystyle Y_{l^{\prime}}|\bm{\theta}italic_Y start_POSTSUBSCRIPT italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | bold_italic_θ 𝒩(𝒙𝒍𝜽,τl2)forYl=0on mass balance conditionsformulae-sequencesimilar-toabsent𝒩superscriptsubscript𝒙superscript𝒍bold-′top𝜽superscriptsubscript𝜏superscript𝑙2forsubscript𝑌superscript𝑙0on mass balance conditions\displaystyle\sim\mathcal{N}(\bm{x_{l^{\prime}}}^{\top}\bm{\theta},\tau_{l^{% \prime}}^{2})\quad\text{for}\ Y_{l^{\prime}}=0\ \text{on mass balance conditions}∼ caligraphic_N ( bold_italic_x start_POSTSUBSCRIPT bold_italic_l start_POSTSUPERSCRIPT bold_′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_θ , italic_τ start_POSTSUBSCRIPT italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) for italic_Y start_POSTSUBSCRIPT italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = 0 on mass balance conditions (11)

Here Yisubscript𝑌superscript𝑖Y_{i^{\prime}}italic_Y start_POSTSUBSCRIPT italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT is the isuperscript𝑖i^{\prime}italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPTth row or entry of the observation vector 𝒀𝒀\bm{Y}bold_italic_Y, and 𝒙𝒊superscriptsubscript𝒙superscript𝒊bold-′top\bm{x_{i^{\prime}}}^{\top}bold_italic_x start_POSTSUBSCRIPT bold_italic_i start_POSTSUPERSCRIPT bold_′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT the isuperscript𝑖i^{\prime}italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPTth row of the design matrix X𝑋Xitalic_X, R(𝜽)k𝑅subscript𝜽superscript𝑘R(\bm{\theta})_{k^{\prime}}italic_R ( bold_italic_θ ) start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT the ksuperscript𝑘k^{\prime}italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPTth row of the vector 𝑹(𝜽)𝑹𝜽\bm{R}(\bm{\theta})bold_italic_R ( bold_italic_θ ) and τi2superscriptsubscript𝜏superscript𝑖2\tau_{i^{\prime}}^{2}italic_τ start_POSTSUBSCRIPT italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT representing variance of the noise of the isuperscript𝑖i^{\prime}italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPTth observation, which are assumed to be independent. The overall likelihood p(𝒀|𝜽)𝑝conditional𝒀𝜽p(\bm{Y}|\bm{\theta})italic_p ( bold_italic_Y | bold_italic_θ ) is therefore of the form:

p(𝒀|𝜽)=ip(Yi|𝜽)jp(Yj|𝜽)kp(Yk|𝜽)lp(Yl|𝜽)𝑝conditional𝒀𝜽subscriptproductsuperscript𝑖𝑝conditionalsubscript𝑌superscript𝑖𝜽subscriptproductsuperscript𝑗𝑝conditionalsubscript𝑌superscript𝑗𝜽subscriptproductsuperscript𝑘𝑝conditionalsubscript𝑌superscript𝑘𝜽subscriptproductsuperscript𝑙𝑝conditionalsubscript𝑌superscript𝑙𝜽\displaystyle p(\bm{Y}|\bm{\theta})=\prod_{i^{\prime}}p(Y_{i^{\prime}}|\bm{% \theta})\prod_{j^{\prime}}p(Y_{j^{\prime}}|\bm{\theta})\prod_{k^{\prime}}p(Y_{% k^{\prime}}|\bm{\theta})\prod_{l^{\prime}}p(Y_{l^{\prime}}|\bm{\theta})italic_p ( bold_italic_Y | bold_italic_θ ) = ∏ start_POSTSUBSCRIPT italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_p ( italic_Y start_POSTSUBSCRIPT italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | bold_italic_θ ) ∏ start_POSTSUBSCRIPT italic_j start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_p ( italic_Y start_POSTSUBSCRIPT italic_j start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | bold_italic_θ ) ∏ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_p ( italic_Y start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | bold_italic_θ ) ∏ start_POSTSUBSCRIPT italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_p ( italic_Y start_POSTSUBSCRIPT italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | bold_italic_θ ) (12)

where

p(Yi|𝜽)𝑝conditionalsubscript𝑌superscript𝑖𝜽\displaystyle p(Y_{i^{\prime}}|\bm{\theta})italic_p ( italic_Y start_POSTSUBSCRIPT italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | bold_italic_θ ) 1τiexp((Yi𝒙𝒊𝜽)22τi2)proportional-toabsent1subscript𝜏superscript𝑖superscriptsubscript𝑌superscript𝑖superscriptsubscript𝒙superscript𝒊bold-′top𝜽22superscriptsubscript𝜏superscript𝑖2\displaystyle\propto\frac{1}{\tau_{i^{\prime}}}\exp(\frac{-(Y_{i^{\prime}}-\bm% {x_{i^{\prime}}}^{\top}\bm{\theta})^{2}}{2\tau_{i^{\prime}}^{2}})∝ divide start_ARG 1 end_ARG start_ARG italic_τ start_POSTSUBSCRIPT italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_ARG roman_exp ( divide start_ARG - ( italic_Y start_POSTSUBSCRIPT italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT - bold_italic_x start_POSTSUBSCRIPT bold_italic_i start_POSTSUPERSCRIPT bold_′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_θ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 2 italic_τ start_POSTSUBSCRIPT italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ) (13)
p(Yj|𝜽)𝑝conditionalsubscript𝑌superscript𝑗𝜽\displaystyle p(Y_{j^{\prime}}|\bm{\theta})italic_p ( italic_Y start_POSTSUBSCRIPT italic_j start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | bold_italic_θ ) 1τjexp((Yj𝒙𝒋𝜽)22τj2)IYj[0,)proportional-toabsent1subscript𝜏superscript𝑗superscriptsubscript𝑌superscript𝑗superscriptsubscript𝒙superscript𝒋bold-′top𝜽22superscriptsubscript𝜏superscript𝑗2subscriptIsubscript𝑌superscript𝑗0\displaystyle\propto\frac{1}{\tau_{j^{\prime}}}\exp(\frac{-(Y_{j^{\prime}}-\bm% {x_{j^{\prime}}}^{\top}\bm{\theta})^{2}}{2\tau_{j^{\prime}}^{2}})\mathrm{I}_{Y% _{j^{\prime}}\in[0,\infty)}∝ divide start_ARG 1 end_ARG start_ARG italic_τ start_POSTSUBSCRIPT italic_j start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_ARG roman_exp ( divide start_ARG - ( italic_Y start_POSTSUBSCRIPT italic_j start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT - bold_italic_x start_POSTSUBSCRIPT bold_italic_j start_POSTSUPERSCRIPT bold_′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_θ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 2 italic_τ start_POSTSUBSCRIPT italic_j start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ) roman_I start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_j start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ∈ [ 0 , ∞ ) end_POSTSUBSCRIPT (14)
p(Yk|𝜽)𝑝conditionalsubscript𝑌superscript𝑘𝜽\displaystyle p(Y_{k^{\prime}}|\bm{\theta})italic_p ( italic_Y start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | bold_italic_θ ) 1τkexp((YkR(𝜽)k)22τk2)proportional-toabsent1subscript𝜏superscript𝑘superscriptsubscript𝑌superscript𝑘𝑅subscript𝜽superscript𝑘22superscriptsubscript𝜏superscript𝑘2\displaystyle\propto\frac{1}{\tau_{k^{\prime}}}\exp(\frac{-(Y_{k^{\prime}}-R(% \bm{\theta})_{k^{\prime}})^{2}}{2\tau_{k^{\prime}}^{2}})∝ divide start_ARG 1 end_ARG start_ARG italic_τ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_ARG roman_exp ( divide start_ARG - ( italic_Y start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT - italic_R ( bold_italic_θ ) start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 2 italic_τ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ) (15)
p(Yl|𝜽)𝑝conditionalsubscript𝑌superscript𝑙𝜽\displaystyle p(Y_{l^{\prime}}|\bm{\theta})italic_p ( italic_Y start_POSTSUBSCRIPT italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | bold_italic_θ ) 1τlexp((Yl𝒙𝒍𝜽)22τl2)proportional-toabsent1subscript𝜏superscript𝑙superscriptsubscript𝑌superscript𝑙superscriptsubscript𝒙superscript𝒍bold-′top𝜽22superscriptsubscript𝜏superscript𝑙2\displaystyle\propto\frac{1}{\tau_{l^{\prime}}}\exp(\frac{-(Y_{l^{\prime}}-\bm% {x_{l^{\prime}}}^{\top}\bm{\theta})^{2}}{2\tau_{l^{\prime}}^{2}})∝ divide start_ARG 1 end_ARG start_ARG italic_τ start_POSTSUBSCRIPT italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_ARG roman_exp ( divide start_ARG - ( italic_Y start_POSTSUBSCRIPT italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT - bold_italic_x start_POSTSUBSCRIPT bold_italic_l start_POSTSUPERSCRIPT bold_′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_θ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 2 italic_τ start_POSTSUBSCRIPT italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ) (16)

Given the prior and likelihood, we can use Bayes’s Theorem 5 to obtain the posterior distribution p(𝜽|𝒀)𝑝conditional𝜽𝒀p(\bm{\theta}|\bm{Y})italic_p ( bold_italic_θ | bold_italic_Y ). However, the posterior induced by this model does not admit an analytical form, and so we employ the No-U-Turn Sampler (NUTS) algorithm of Hoffman and Gelman (2014) to sample from the posterior, implemented via the PyMC3 library Salvatier et al. (2016) in Python. NUTS is a Hamiltonian Monte Carlo that achieves increased sampling efficiency over traditional MCMC methods, such as Metropolis Hastings, by exploiting the gradient information of the target distribution to generate more informed sample proposals and explore the target distribution more efficiently. This is especially important in high dimensions (which applies to many MFA systems), since the probability mass of the target distribution are more likely concentrated in smaller regions, which is inefficient to explore via Metropolis Hastings due to random walk behaviour.

2.4 Posterior predictive checks

For Bayesian modelling of MFA systems, we recommend performing posterior predictive checks to verify whether data generated by the model is similar to the observed data and adequately mass conserved, which gives some assurance that the prior, model and parameters chosen are sensible. Here we briefly describe the method of posterior predictive checking outlined in Chapter 6 of Gelman et al. (2013). Recall that in our model, the observation vector 𝒀𝒀\bm{Y}bold_italic_Y represents the observed data (such as on flows or change in stocks), as well as physical conditions such as mass balance, and 𝜽𝜽\bm{\theta}bold_italic_θ the vector of parameters. Let 𝒀𝒓𝒆𝒑superscript𝒀𝒓𝒆𝒑\bm{Y^{rep}}bold_italic_Y start_POSTSUPERSCRIPT bold_italic_r bold_italic_e bold_italic_p end_POSTSUPERSCRIPT be replicated data that could have been observed, the distribution of 𝒀𝒓𝒆𝒑superscript𝒀𝒓𝒆𝒑\bm{Y^{rep}}bold_italic_Y start_POSTSUPERSCRIPT bold_italic_r bold_italic_e bold_italic_p end_POSTSUPERSCRIPT given the observed data 𝒀𝒀\bm{Y}bold_italic_Y, also known as the posterior predictive distribution, is given by:

p(𝒀𝒓𝒆𝒑|𝒀)=p(𝒀𝒓𝒆𝒑|𝜽)p(𝜽|𝒀)𝑑𝜽𝑝conditionalsuperscript𝒀𝒓𝒆𝒑𝒀𝑝conditionalsuperscript𝒀𝒓𝒆𝒑𝜽𝑝conditional𝜽𝒀differential-d𝜽p(\bm{Y^{rep}}|\bm{Y})=\int p(\bm{Y^{rep}}|\bm{\theta})p(\bm{\theta}|\bm{Y})\,% d\bm{\theta}italic_p ( bold_italic_Y start_POSTSUPERSCRIPT bold_italic_r bold_italic_e bold_italic_p end_POSTSUPERSCRIPT | bold_italic_Y ) = ∫ italic_p ( bold_italic_Y start_POSTSUPERSCRIPT bold_italic_r bold_italic_e bold_italic_p end_POSTSUPERSCRIPT | bold_italic_θ ) italic_p ( bold_italic_θ | bold_italic_Y ) italic_d bold_italic_θ (17)

Typically, the check is done on suitable scalar test quantities T(𝒀,𝜽)𝑇𝒀𝜽T(\bm{Y},\bm{\theta})italic_T ( bold_italic_Y , bold_italic_θ ), chosen based on the real problem being modelled. The test quantity of the replicated data T(𝒀𝒓𝒆𝒑,𝜽)𝑇superscript𝒀𝒓𝒆𝒑𝜽T(\bm{Y^{rep}},\bm{\theta})italic_T ( bold_italic_Y start_POSTSUPERSCRIPT bold_italic_r bold_italic_e bold_italic_p end_POSTSUPERSCRIPT , bold_italic_θ ) is compared with the test quantity of the observed data T(𝒀,𝜽)𝑇𝒀𝜽T(\bm{Y},\bm{\theta})italic_T ( bold_italic_Y , bold_italic_θ ) through statistical tests or graphical checks to look for systematic discrepancies between the simulated and originally observed data. For our Bayesian material flow analysis model, we choose the test quantities to be each individual observed data and mass balance conditions of child processes in the system, in other words we choose test quantities Ti(𝒀,𝜽)=Yisubscript𝑇𝑖𝒀𝜽subscript𝑌𝑖T_{i}(\bm{Y},\bm{\theta})=Y_{i}italic_T start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( bold_italic_Y , bold_italic_θ ) = italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT for each i𝑖iitalic_i. This ensures we minimally check using the posterior predictive distribution that the model is consistent with the existing data as well as mass balance of child processes. One way to compare the observed data with the posterior predictive distribution is to calculate the Bayesian posterior predictive p-values for the test statistic, in our case the marginal observations Yisubscript𝑌𝑖Y_{i}italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, which are given by

pvali=P(YirepYi|𝒀)=IYirepYip(𝒀𝒓𝒆𝒑|𝜽)p(𝜽|𝒀)𝑑𝒀𝒓𝒆𝒑𝑑𝜽𝑝𝑣𝑎subscript𝑙𝑖𝑃superscriptsubscript𝑌𝑖𝑟𝑒𝑝conditionalsubscript𝑌𝑖𝒀subscriptIsuperscriptsubscript𝑌𝑖𝑟𝑒𝑝subscript𝑌𝑖𝑝conditionalsuperscript𝒀𝒓𝒆𝒑𝜽𝑝conditional𝜽𝒀differential-dsuperscript𝒀𝒓𝒆𝒑differential-d𝜽pval_{i}=P(Y_{i}^{rep}\geq Y_{i}|\bm{Y})=\int\int\mathrm{I}_{Y_{i}^{rep}\geq Y% _{i}}p(\bm{Y^{rep}}|\bm{\theta})p(\bm{\theta}|\bm{Y})\,d\bm{Y^{rep}}d\bm{\theta}italic_p italic_v italic_a italic_l start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_P ( italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r italic_e italic_p end_POSTSUPERSCRIPT ≥ italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | bold_italic_Y ) = ∫ ∫ roman_I start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r italic_e italic_p end_POSTSUPERSCRIPT ≥ italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_p ( bold_italic_Y start_POSTSUPERSCRIPT bold_italic_r bold_italic_e bold_italic_p end_POSTSUPERSCRIPT | bold_italic_θ ) italic_p ( bold_italic_θ | bold_italic_Y ) italic_d bold_italic_Y start_POSTSUPERSCRIPT bold_italic_r bold_italic_e bold_italic_p end_POSTSUPERSCRIPT italic_d bold_italic_θ (18)

Very large or small p-value (e.g. greater than 0.95 or smaller than 0.05 as suggested by Gelman et al. (2013)) suggests the observed test quantity is unlikely to be replicated in repeated experiments if the model were true, implying there is inconsistency between the model and the data. We also calculate the posterior predictive marginal 95%percent9595\%95 % highest density interval for each Yi|𝒀conditionalsubscript𝑌𝑖𝒀Y_{i}|\bm{Y}italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | bold_italic_Y to see if they contain the observed values Yisubscript𝑌𝑖Y_{i}italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT (which include the conservation of mass conditions where Yi=0subscript𝑌𝑖0Y_{i}=0italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = 0).

In practice, the integrals in Equation 17 and Equation 18 are analytically intractable, so we again approximate the posterior predictive distribution via sampling. Specifically we simulate one sample of 𝒀𝒓𝒆𝒑superscript𝒀𝒓𝒆𝒑\bm{Y^{rep}}bold_italic_Y start_POSTSUPERSCRIPT bold_italic_r bold_italic_e bold_italic_p end_POSTSUPERSCRIPT from the posterior predictive distribution for each posterior sample of 𝜽𝜽\bm{\theta}bold_italic_θ, and we approximate p-values of Equation 18 by checking the proportion of the posterior predictive samples of Yirepsuperscriptsubscript𝑌𝑖𝑟𝑒𝑝Y_{i}^{rep}italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r italic_e italic_p end_POSTSUPERSCRIPT that exceed the observed value Yisubscript𝑌𝑖Y_{i}italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, for each i𝑖iitalic_i.

2.5 Hyperparameter and noise parameter selection

In general, the prior hyperparameters μisubscript𝜇𝑖\mu_{i}italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, μj,ksubscript𝜇𝑗𝑘\mu_{j,k}italic_μ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT should be specified through domain knowledge to reflect the modeller’s best estimate of the stock change and flow variables apriori, and σi2superscriptsubscript𝜎𝑖2\sigma_{i}^{2}italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT, σj,k2superscriptsubscript𝜎𝑗𝑘2\sigma_{j,k}^{2}italic_σ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT chosen to reflect prior uncertainty, the less confident the estimates for μisubscript𝜇𝑖\mu_{i}italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, μj,ksubscript𝜇𝑗𝑘\mu_{j,k}italic_μ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT, the larger σi2superscriptsubscript𝜎𝑖2\sigma_{i}^{2}italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT, σj,k2superscriptsubscript𝜎𝑗𝑘2\sigma_{j,k}^{2}italic_σ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT.

The noise variance parameters τi2superscriptsubscript𝜏superscript𝑖2\tau_{i^{\prime}}^{2}italic_τ start_POSTSUBSCRIPT italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT should also ideally be chosen to reflect uncertainty in the data, and in the case the mass balance conditions, epistemic uncertainty in the system definition. We recommend using posterior predictive checking to help select suitable noise parameter values, especially if no knowledge of the degree of data uncertainty is available. One can start with a small choice of standard deviation parameters (such as 10%percent1010\%10 % of the observed data value, and a small constant for the mass balance conditions), run the model and conduct posterior predictive checks, and identify the data points and mass balance conditions which exhibit extreme Bayesian p-values. The standard deviation parameter for those data points or mass balance conditions should be increased and the model rerun until no extreme Bayesian p-values remain.

Full details of hyper parameters choice in the case studies examined in the Results section can be found in the supporting information. We give examples of how to specify prior hyperparameters in the case where there is weakly informative domain knowledge where some flows are known to the nearest order of magnitude, as well as flows where there is no prior knowledge available.

3 Results

In this section we demonstrate our Bayesian MFA method on an aluminium cycle, containing significant disaggregations of processes and flows. We evaluate our model on the aluminium cycle under a weakly informative prior for two different levels of data availability and compare the results. We also present posterior predictive checks to identify inconsistencies between the model and data, and processes which are not mass balanced by the available data.

3.1 Aluminium cycle

Refer to caption
Figure 2: Global anthropogenic metallurgical aluminium cycle in 2009, adapted from Figure 1 of Liu et al. (2013). Mass of aluminium is measured in megatonnes (Mt).

We evaluate our model on the global anthropogenic metallurgical aluminium cycle in 2009 from Liu et al. (2013). The associated system diagram Figure 2 is adapted from Figure 1 of Liu et al. (2013). The aluminium cycle contains significant disaggregations of processes. For example, the ‘Use’ Process contains subprocesses such as ‘Use_BC’ (building and construction) and ‘Use_ME’ (machinery and equipment) representing different use product categories of aluminium. Furthermore, for aggregated processes such as ‘Manufacturing’ and ‘Recycling’, Figure 2 only displays data on aggregated flows and change in stocks. For example, the flow 12.112.112.112.1 Mt from ‘Manufacturing’ is a combined flow to ‘Use_BC’ and the subprocesses ‘Remelting’ and ‘Refining’ of ‘Recycling’. So it is not clear from Figure 2 alone in what proportion this flow should split among the constituent subprocesses. In tables S5, S6 and S8 of the supplementary information, Liu et al. (2013) provides ratios specifying how certain aggregated flows should split into constituent subflows (known as transfer coefficients), specifically for the aggregate flows from ‘Semi-manufacturing’ to ‘Manufacturing’ and ‘Recycling’, and from ‘Manufacturing’ to ‘Use’ and ‘Recycling’. For the purposes of testing our model, we treat the values presented in Figure 1 of Liu et al. (2013) and transfer coefficients in tables S5,S6,S6 of the supplementary information as data. We also henceforth use ‘reported value’ to refer to any flow and change in stock values in Figure 1 of Liu et al. (2013), as well as values of disaggregated flow and change in stock values calculated through the transfer coefficients in the supplementary material of Liu et al. (2013).

We evaluate the model under two different scenarios. In scenario A, we deliberately withhold the transfer coefficients in the supplementary information of Liu et al. (2013) from the model, and only use the data displayed in Figure 2. Instead, where transfer coefficients are available to calculate the value of the disaggregated flow, we set the prior mode of the disaggregated flow variable to the nearest power of 10101010 of the reported value, otherwise an uninformative prior with mode 1.01.01.01.0 Mt is used. The purpose of scenario A is to see if our Bayesian MFA model can still produce useful estimates of flows and changes in stock under a weakly informative prior with significant amount of missing data. Scenario A also mimics a situation that could be applicable to many MFA problems, where data is available on a aggregated/parent level but not the disaggregated/child level, and a rough estimate like the order of magnitude of the flows and changes in stock on the disaggregated level is obtained from surveying domain experts or approximate calculations, which can be used to construct a weakly informative prior. In scenario B, the same weakly informative prior is used, but the flow ratios in the supplementary information of Liu et al. (2013) are provided to the model as well. Scenario B mimics a situation towards the end of an MFA analysis, when data are available for most of the flow and change in stock variables in the system. In both scenarios, we assume a low degree of epistemic uncertainty in the system diagram and set the standard deviation of mass balance conditions to a constant 0.050.050.050.05 Mt. This choice leads to well mass balanced posterior means and samples of the flow and change in stock variables, and we provide an analysis of posterior mass balance conditions in the supporting information (section S5) for full detail. Full detail of prior hyperparameters can also be found in the supporting information (section S4).

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 3: Marginal posterior distributions for a selection of flow and change in stock variables of the aluminium dataset. For each flow or change in stock variables we display both the marginal posterior for scenario A (on the left) and for the scenario B (on the right). Each marginal posterior plot displays the mean and the 95 % highest density interval (HDI). The red vertical line in each graph represents the reported value of the variable, calculated using transfer coefficients provided in the supplement of Liu et al. (2013). Underlying data for this figure can be found at https://github.com/jwang727/BayesianMFA

Figure 3 displays a selection of marginal posterior distributions for flow and change in stock variables of disaggregated/child processes in the aluminium cycle. Only a selection is displayed here for brevity as there are around 180180180180 flow or change in stock variables in the model. Under scenario A, the marginal posteriors are relatively more biased away from the reported value, which is not unexpected as data on the disaggregated level was withheld from the model in scenario A. Nevertheless, all of the reported values of flows and change in stock (when they are available in the supplement of Liu et al. (2013)) are contained within the 95%percent9595\%95 % posterior marginal highest density intervals (HDI). A small number of reported values are near the edge of HDI like the flow ‘ShapeCasting to Manuf_TAU’. This could be caused by the reported value (around 5.15.15.15.1 Mt) being close to the middle of the nearest orders of magnitudes (1.01.01.01.0 and 10.010.010.010.0 Mt), which is more difficult for a prior based on the nearest order of magnitude (1.01.01.01.0 Mt in this case) to capture without actual data on the flow. Overall under scenario A, the posterior estimates and uncertainty quantification for disaggregated flows or change in stock variables obtained under still captures the reported values reasonably well despite not given data on any disaggregated flows. With the addition of transfer coefficient data under scenario B, the posterior marginal distributions generally possess narrower HDI compared to scenario A and centred much more around the reported value, and again the reported values are contained within the 95%percent9595\%95 % posterior HDI. Quantitatively, the average length of the 95%percent9595\%95 % posterior HDI over all flow and change in stock variables under scenario A is 2.152.152.152.15 Mt, while under scenario B is 1.461.461.461.46 Mt.

In both scenarios it took the NUTS algorithm around 90 minutes to generate 24000240002400024000 posterior samples across two chains on an Intel i5-1145G7 CPU, 2.6GHz. The traceplots and convergence checks suggests the NUTS algorithm has converged and we did not observe any divergent samples. A selection of traceplots and further diagnostics can be found in section S2 of the supporting information. Posterior pairplots illustrating posterior correlation between a selection of flow variables can be found in section S3. In section S6 we include a simulation study on a zinc cycle to examine the estimation accuracy and uncertainty quantification of our model from a frequentist perspective. We demonstrate on a zinc cycle that incorporating even a weakly informative prior can significantly reduce estimation error, and the posterior credible intervals can consistently contain the true value of flows and stock changes and be interpreted as confidence intervals.

3.2 Posterior predictive checks on aluminium model

Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Refer to caption
(e)
Refer to caption
(f)
Refer to caption
(g)
Refer to caption
(h)
Refer to caption
(i)
Refer to caption
(j)
Figure 4: Graphs of posterior predictive checks and posterior predictive HDI lengths, for scenario B. On the first row, we have the sample 95%percent9595\%95 % posterior predictive highest density intervals (red bar) and the observed values (blue dot) for the change in stock data, observed flow data, conservation of mass conditions and flow ratio data respectively. On the second row, we have the posterior predictive p-values for the observed change in stock variables, observed flow variables, conservation of mass conditions and flow ratio data respectively. On the third row, we plot the top 10101010 largest marginal posterior HDI lengths for scenario A (left), and scenario B (right). Underlying data for this figure can be found at https://github.com/jwang727/BayesianMFA

In this section we present some additional results for the aluminium model. First, we apply the posterior predictive checks described in Section 2.4 on the aluminium model under scenario B. From Figures 4(a), 4(b), 4(c) and 4(d), it can be seen that the 95%percent9595\%95 % HDIs all contain the observed values for all change in stock data, flow data, conservation of mass conditions and ratio data. Moreover, from Figures 4(e), 4(f), 4(g) and 4(h), it can be seen that the posterior predictive p-values are mostly between 0.30.30.30.3 and 0.70.70.70.7 and no p-value is smaller than 0.050.050.050.05 or greater than 0.950.950.950.95, suggesting that no p-values are extreme and the model generally fits the data reasonably well. There are a few variables close to extreme values however. In particular, the 27th flow data in Figure 4(f) has a p value of around 0.070.070.070.07, which represents the total outflow from ‘Internal remelting’ to ‘Semi-manufacturing’ of 9.39.39.39.3 Mt. Upon reviewing the data, it was found the total inflow of ‘Internal remelting’ only summed to 7.17.17.17.1 Mt. The posterior predictive check therefore helpfully highlighted a discrepancy that suggests the data for the process Internal remelting has more sizable mass imbalance. Similarly, the 49th data has a relatively high p-values of around 0.80.80.80.8, which is the data representing the outflow from Manuf_OTD of 0.80.80.80.8 Mt. Upon examining the data it was found the total inflow of Manuf_OTD is 1.01.01.01.0.

In MFA, data is expensive to collect, so it is useful to prioritise which flow and change in stock variables to collect more data on. This will likely depend on what questions the modeller is most interested in answering regarding the real system being modelled. Without specific questions however, Bayesian inference provides default strategies for prioritising which data points to collect, by ranking the variables in terms of descending posterior uncertainty. We plot the top 10101010 flow and change in stock variables in descending length of their marginal 95%percent9595\%95 % HDI for both scenarios A (Figure 4(i)) and B (Figure 4(j)). In scenario B, the most uncertain variables are mostly flows from ‘Casting’ or ‘Recycling’ to ‘Semi-manufacturing’, which is expected as those are the disaggregated flows where data is not available. For scenario A, the most uncertain variables are more scattered throughout the system, as scenario A has no data on any disaggregated flows.

4 Discussion

This paper presented a novel MFA methodology under the Bayesian framework that addresses existing challenges and expands the applicability of Bayesian inference in MFA. By relaxing the mass balance constraints with a noise term, we improved upon the computational scalability and reliability of posterior samples compared to existing methods, while still retaining well mass balanced posterior estimates of stock changes and flows. We introduced a child and parent parametrisation that can conveniently deal with MFA systems with multiple layers of disaggregation of processes and flows, providing posterior distributions on flows and stock changes on all levels of disaggregation in the system, including lower levels where data is often unavailable. We showed that even an weakly informative prior, specifically a prior based on the nearest order of magnitude of stock changes and flows, is capable of greatly improving the model’s estimation accuracy and quality of its uncertainty quantification, especially during the early stages of the analysis when there is a lack of data, reaffirming the benefit of a Bayesian approach to MFA. We also demonstrated how posterior predictive checks can be used to check if the model is consistent with the data and mass balance conditions, and help identify data inconsistencies and aid in selecting noise parameter values for the data and mass balance conditions.

Like other Bayesian approaches, our method requires prior distributions of all flow and stock change variables of interest to be manually specified, which is an additional requirement compared to traditional MFA. However, we argue that this should be a standard part of any material flow analysis, where domain knowledge should be continuously collated, and the prior distribution offers a principled, mathematical way of incorporating this knowledge into the model. The priors used in this paper are unimodal and only weakly informative at most, in order to reduce requirements on the prior so that it can be applied in a wider range of MFA settings. In principle more or less informative priors can be used based on domain knowledge of the application. In the presence of multiple expert opinions, mixture priors that combine multiple experts similar to the approach of Dong et al. (2023) can be considered in our model framework.

The Bayesian approach inevitably comes with a higher computational cost, as the full posterior distribution of each variable of interest needs to be calculated rather than just a point estimate, and may run into convergence issues Betancourt (2017). The No-U-Turn Sampler HMC algorithm used for our model took around 90909090 minutes to generate 24000240002400024000 posterior samples across two chains for the aluminium dataset, which we consider acceptable. We also note that most MFA literature do not use very large datasets so we do not anticipate computational cost to be a major issue for most MFA studies. However, for much larger material flow datasets, approximate methods such as variational Bayes can be employed to reduce the computational cost, or minimally just estimating the posterior mode directly (and forgoing uncertainty quantification) can potentially yield better point estimates than a non Bayesian method if informative priors are available.

Acknowledgements

This work was supported by the UKRI Interdisciplinary Circular Economy Centre For Mineral-based Construction Materials under the EPSRC Grant EP/V011820/1. The authors are grateful to Mohit Arora, Nicola Gambaro, Ugo Legendre and Shen Zhenxia for useful discussions.

References

  • UNE [2020] United Nations Resource Management System: An overview of concepts, objectives and requirements (ECE energy series no. 68), 2020. URL https://unece.org/sustainable-energy/publications/united-nations-resource-management-system-overview-concepts.
  • WPP [2022] World population prospects - population division, 2022. URL https://population.un.org/wpp/.
  • Bertram et al. [2009] M. Bertram, K. J. Martchek, and G. Rombach. Material Flow Analysis in the Aluminum Industry. Journal of Industrial Ecology, 13(5):650–654, 2009. doi: https://doi.org/10.1111/j.1530-9290.2009.00158.x. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1530-9290.2009.00158.x.
  • Betancourt [2017] M. Betancourt. A Conceptual Introduction to Hamiltonian Monte Carlo. 2017. doi: 10.48550/ARXIV.1701.02434. URL https://arxiv.org/abs/1701.02434.
  • Brunner and Rechberger [2004] P. H. Brunner and H. Rechberger. Practical handbook of material flow analysis. The International Journal of Life Cycle Assessment, 9(5):337–338, Sep 2004. ISSN 1614-7502. doi: 10.1007/BF02979426. URL https://doi.org/10.1007/BF02979426.
  • Cencic [2016] O. Cencic. Nonlinear data reconciliation in material flow analysis with software stan. Sustainable Environment Research, 26(6):291–298, 2016. ISSN 2468-2039. doi: https://doi.org/10.1016/j.serj.2016.06.002. URL https://www.sciencedirect.com/science/article/pii/S246820391630053X.
  • Cencic and Frühwirth [2015] O. Cencic and R. Frühwirth. A general framework for data reconciliation—Part I: Linear constraints. Computers & Chemical Engineering, 75:196–208, 2015. ISSN 0098-1354. doi: https://doi.org/10.1016/j.compchemeng.2014.12.004. URL https://www.sciencedirect.com/science/article/pii/S0098135414003330.
  • Cencic and Frühwirth [2018] O. Cencic and R. Frühwirth. Data reconciliation of nonnormal observations with nonlinear constraints. Journal of Applied Statistics, 45(13):2411–2428, 2018. doi: 10.1080/02664763.2017.1421916. URL https://doi.org/10.1080/02664763.2017.1421916.
  • Cullen et al. [2012] J. M. Cullen, J. M. Allwood, and M. D. Bambach. Mapping the Global Flow of Steel: From Steelmaking to End-Use Goods. Environmental Science & Technology, 46(24):13048–13055, Dec 2012. ISSN 0013-936X. doi: 10.1021/es302433p. URL https://doi.org/10.1021/es302433p.
  • da Costa Reis et al. [2019] D. da Costa Reis, Y. Mack-Vergara, and V. M. John. Material flow analysis and material use efficiency of Brazil’s mortar and concrete supply chain. Journal of Industrial Ecology, 23(6):1396–1409, 2019. doi: https://doi.org/10.1111/jiec.12929. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/jiec.12929.
  • Dong et al. [2023] J. Dong, J. Liao, X. Huan, and D. Cooper. Expert elicitation and data noise learning for material flow analysis using bayesian inference. Journal of Industrial Ecology, 27(4):1105–1122, 2023. doi: 10.1111/jiec.13399.
  • Fischer-Kowalski et al. [2011] M. Fischer-Kowalski, F. Krausmann, S. Giljum, S. Lutter, A. Mayer, S. Bringezu, Y. Moriguchi, H. Schütz, H. Schandl, and H. Weisz. Methodology and Indicators of Economy-wide Material Flow Accounting. Journal of Industrial Ecology, 15(6):855–876, 2011. doi: https://doi.org/10.1111/j.1530-9290.2011.00366.x. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1530-9290.2011.00366.x.
  • Gelman et al. [2013] A. Gelman, J. Carlin, H. Stern, D. Dunson, A. Vehtari, and D. Rubin. Bayesian Data Analysis, Third Edition. Chapman and Hall/CRC, 2013. ISBN 9781439840955.
  • Gottschalk et al. [2010] F. Gottschalk, R. W. Scholz, and B. Nowack. Probabilistic material flow modeling for assessing the environmental exposure to compounds: Methodology and an application to engineered nano-TiO2 particles. Environmental Modelling & Software, 25(3):320–332, 2010. ISSN 1364-8152. doi: https://doi.org/10.1016/j.envsoft.2009.08.011. URL https://www.sciencedirect.com/science/article/pii/S1364815209002394.
  • Graedel et al. [2005] T. E. Graedel, D. van Beers, M. Bertram, K. Fuse, R. B. Gordon, A. Gritsinin, E. M. Harper, A. Kapur, R. J. Klee, R. Lifset, L. Memon, and S. Spatari. The Multilevel Cycle of Anthropogenic Zinc. Journal of Industrial Ecology, 9(3):67–90, 2005. doi: https://doi.org/10.1162/1088198054821573. URL https://onlinelibrary.wiley.com/doi/abs/10.1162/1088198054821573.
  • Hastie et al. [2001] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer New York Inc., New York, NY, USA, 2001.
  • Hoerl and Kennard [2000] A. E. Hoerl and R. W. Kennard. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics, 42(1):80–86, 2000. ISSN 00401706. URL http://www.jstor.org/stable/1271436.
  • Hoffman and Gelman [2014] M. Hoffman and A. Gelman. The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15:1953–1623, 2014.
  • Horn and Johnson [1985] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1985.
  • Johnstone [2010] I. M. Johnstone. High dimensional bernstein-von mises: Simple examples. Institute of Mathematical Statistics Collections, page 87–98, 2010. doi: 10.1214/10-imscoll607.
  • Jowitt [2020] S. M. Jowitt. COVID-19 and the Global Mining Industry. SEG Discovery, (122):33–41, 07 2020. ISSN 1550-297X. doi: 10.5382/SEGnews.2020-122.fea-02. URL https://doi.org/10.5382/SEGnews.2020-122.fea-02.
  • Jowitt et al. [2020] S. M. Jowitt, G. M. Mudd, and J. F. H. Thompson. Future availability of non-renewable metal resources and the influence of environmental, social, and governance conflicts on metal production. Nature News, Sep 2020. URL https://www.nature.com/articles/s43247-020-0011-0.
  • Kruschke [2015] J. K. Kruschke. Doing Bayesian Data Analysis: A tutorial with R, Jags, and Stan. Academic Press, 2015.
  • Lan and Kang [2023] S. Lan and L. Kang. Sampling constrained continuous probability distributions: A review. WIREs Computational Statistics, 15(6):e1608, 2023. doi: https://doi.org/10.1002/wics.1608. URL https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/wics.1608.
  • Laner et al. [2015] D. Laner, J. Feketitsch, H. Rechberger, and J. Fellner. A novel approach to characterize data uncertainty in material flow analysis and its application to plastics flows in austria. Journal of Industrial Ecology, 20(5):1050–1063, 2015. doi: 10.1111/jiec.12326.
  • Lederer [2022] J. C. Lederer. Fundamentals of high-dimensional statistics: With exercises and R labs. Springer Texts in Statistics. Springer, 1 edition, 2022.
  • Liu et al. [2013] G. Liu, C. E. Bangs, and D. B. Müller. Stock dynamics and emission pathways of the global aluminium cycle. Nature Climate Change, 3(4):338–342, Apr 2013. ISSN 1758-6798. doi: 10.1038/nclimate1698. URL https://doi.org/10.1038/nclimate1698.
  • Lupton and Allwood [2018] R. C. Lupton and J. M. Allwood. Incremental Material Flow Analysis with Bayesian Inference. Journal of Industrial Ecology, 22(6):1352–1364, 2018. doi: https://doi.org/10.1111/jiec.12698. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/jiec.12698.
  • Mannan and Al-Ghamdi [2020] M. Mannan and S. G. Al-Ghamdi. Environmental impact of water-use in buildings: Latest developments from a life-cycle assessment perspective. Journal of Environmental Management, 261:110198, 2020. ISSN 0301-4797. doi: https://doi.org/10.1016/j.jenvman.2020.110198. URL https://www.sciencedirect.com/science/article/pii/S030147972030133X.
  • Matsubae-Yokoyama et al. [2009] K. Matsubae-Yokoyama, H. Kubo, K. Nakajima, and T. Nagasaka. A Material Flow Analysis of Phosphorus in Japan. Journal of Industrial Ecology, 13(5):687–705, 2009. doi: https://doi.org/10.1111/j.1530-9290.2009.00162.x. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1530-9290.2009.00162.x.
  • Miatto et al. [2017] A. Miatto, H. Schandl, T. Fishman, and H. Tanikawa. Global Patterns and Trends for Non-Metallic Minerals used for Construction. Journal of Industrial Ecology, 21(4):924–937, 2017. doi: https://doi.org/10.1111/jiec.12471. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/jiec.12471.
  • Mirsky [1975] L. Mirsky. A trace inequality of John von Neumann. Monatshefte für Mathematik, 79(4):303–306, Dec 1975. ISSN 1436-5081. doi: 10.1007/BF01647331. URL https://doi.org/10.1007/BF01647331.
  • Mudd [2021] G. M. Mudd. Assessing the Availability of Global Metals and Minerals for the Sustainable Century: From Aluminium to Zirconium. Sustainability, 13(19), 2021. ISSN 2071-1050. doi: 10.3390/su131910855. URL https://www.mdpi.com/2071-1050/13/19/10855.
  • Myers et al. [2019a] R. Myers, B. Reck, and T. Graedel. YSTAFDB, a unified database of material stocks and flows for sustainability science. Scientific Data, 6, 2019a. doi: 10.1038/s41597-019-0085-7. URL http://dx.doi.org/10.1038/s41597-019-0085-7.
  • Myers et al. [2019b] R. J. Myers, T. Fishman, B. K. Reck, and T. E. Graedel. Unified Materials Information System (UMIS): An Integrated Material Stocks and Flows Data Structure. Journal of Industrial Ecology, 23(1):222–240, 2019b. doi: https://doi.org/10.1111/jiec.12730. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/jiec.12730.
  • Reck et al. [2008] B. K. Reck, D. B. Müller, K. Rostkowski, and T. E. Graedel. Anthropogenic nickel cycle: Insights into use, trade, and recycling. Environmental Science & Technology, 42(9):3394–3400, May 2008. ISSN 0013-936X. doi: 10.1021/es072108l. URL https://doi.org/10.1021/es072108l.
  • Salvatier et al. [2016] J. Salvatier, T. V. Wiecki, and C. Fonnesbeck. Probabilistic programming in Python using PyMC3. PeerJ Computer Science, 2:e55, apr 2016. doi: 10.7717/peerj-cs.55. URL https://doi.org/10.7717/peerj-cs.55.
  • Schwab and Rechberger [2017] O. Schwab and H. Rechberger. Information content, complexity, and uncertainty in material flow analysis. Journal of Industrial Ecology, 22(2):263–274, 2017. doi: 10.1111/jiec.12572.
  • Schwab et al. [2016a] O. Schwab, D. Laner, and H. Rechberger. Quantitative evaluation of data quality in regional material flow analysis. Journal of Industrial Ecology, 21(5):1068–1077, 2016a. doi: 10.1111/jiec.12490.
  • Schwab et al. [2016b] O. Schwab, O. Zoboli, and H. Rechberger. A data characterization framework for material flow analysis. Journal of Industrial Ecology, 21(1):16–25, 2016b. doi: 10.1111/jiec.12399.
  • Tibshirani [1996] R. Tibshirani. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1):267–288, 1996. ISSN 00359246. URL http://www.jstor.org/stable/2346178.
  • Vaart [1998] A. W. v. d. Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998. doi: 10.1017/CBO9780511802256.
  • Wainwright [2019] M. J. Wainwright. High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2019. doi: 10.1017/9781108627771.
  • Westbroek et al. [2021] C. D. Westbroek, J. Bitting, M. Craglia, J. M. C. Azevedo, and J. M. Cullen. Global material flow analysis of glass: From raw materials to end of life. Journal of Industrial Ecology, 25(2):333–343, 2021. doi: https://doi.org/10.1111/jiec.13112. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/jiec.13112.
  • Zou and Hastie [2005] H. Zou and T. Hastie. Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 67(2):301–320, 2005. ISSN 13697412, 14679868. URL http://www.jstor.org/stable/3647580.

Appendix A Supplementary Information

A.1 Parent and child processes

In this section we elaborate on the parent and child process framework used to model disaggregation of processes or flows in MFA systems, and provide a simple illustrative example. Recall that a child process is any process which does not contain subprocesses, and a parent process is any process which does contain subprocesses. This by definition partitions all processes in the system into either child or parent processes, and all parent processes can be broken down into a set of constituent child processes. This partition of parent and child processes is useful because it allows multiple levels of disaggregation to be modelled using just two types of processes with a bottom up approach. In particular, only the child processes need to be modelled, and the parent processes variables are simply the sum of its constituent child processes variables. To illustrate this, consider the (non mass conserved) simple example of Figure 5:

Refer to caption
Figure 5: A simple example of a MFA system with aggregation

From a typical top down approach, this example system contains 3 levels of disaggregation, since ‘Process B’ contains ‘Process A’, and ‘Process A’ in turn contains ‘Process 1’ and ‘Process 2’. In our framework however, the child processes are ‘Process 1’,‘Process 2’,‘Process 3’,‘Process 4’ and ‘Process 5’ while the parent processes are ‘Process A’, ‘Process B’ and ‘Process C’. The data in Figure 5 are the flows U1,3=1.7subscript𝑈131.7U_{1,3}=1.7italic_U start_POSTSUBSCRIPT 1 , 3 end_POSTSUBSCRIPT = 1.7, U4,B=2.3subscript𝑈4𝐵2.3U_{4,B}=2.3italic_U start_POSTSUBSCRIPT 4 , italic_B end_POSTSUBSCRIPT = 2.3, UB,C=10.4subscript𝑈𝐵𝐶10.4U_{B,C}=10.4italic_U start_POSTSUBSCRIPT italic_B , italic_C end_POSTSUBSCRIPT = 10.4 and UA,5=5.8subscript𝑈𝐴55.8U_{A,5}=5.8italic_U start_POSTSUBSCRIPT italic_A , 5 end_POSTSUBSCRIPT = 5.8, and the change of stock SC=11.6subscript𝑆𝐶11.6S_{C}=11.6italic_S start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT = 11.6. The change in stock or flow data involving parent or aggregated processes can all be expressed in terms of flows involving child processes:

SCsubscript𝑆𝐶\displaystyle S_{C}italic_S start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT =S4+S5absentsubscript𝑆4subscript𝑆5\displaystyle=S_{4}+S_{5}= italic_S start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT + italic_S start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT (19)
U4,Bsubscript𝑈4𝐵\displaystyle U_{4,B}italic_U start_POSTSUBSCRIPT 4 , italic_B end_POSTSUBSCRIPT =U4,1+U4,2+U4,3absentsubscript𝑈41subscript𝑈42subscript𝑈43\displaystyle=U_{4,1}+U_{4,2}+U_{4,3}= italic_U start_POSTSUBSCRIPT 4 , 1 end_POSTSUBSCRIPT + italic_U start_POSTSUBSCRIPT 4 , 2 end_POSTSUBSCRIPT + italic_U start_POSTSUBSCRIPT 4 , 3 end_POSTSUBSCRIPT (20)
UB,Csubscript𝑈𝐵𝐶\displaystyle U_{B,C}italic_U start_POSTSUBSCRIPT italic_B , italic_C end_POSTSUBSCRIPT =U1,4+U2,4+U3,4+U1,5+U2,5+U3,5absentsubscript𝑈14subscript𝑈24subscript𝑈34subscript𝑈15subscript𝑈25subscript𝑈35\displaystyle=U_{1,4}+U_{2,4}+U_{3,4}+U_{1,5}+U_{2,5}+U_{3,5}= italic_U start_POSTSUBSCRIPT 1 , 4 end_POSTSUBSCRIPT + italic_U start_POSTSUBSCRIPT 2 , 4 end_POSTSUBSCRIPT + italic_U start_POSTSUBSCRIPT 3 , 4 end_POSTSUBSCRIPT + italic_U start_POSTSUBSCRIPT 1 , 5 end_POSTSUBSCRIPT + italic_U start_POSTSUBSCRIPT 2 , 5 end_POSTSUBSCRIPT + italic_U start_POSTSUBSCRIPT 3 , 5 end_POSTSUBSCRIPT (21)
UA,5subscript𝑈𝐴5\displaystyle U_{A,5}italic_U start_POSTSUBSCRIPT italic_A , 5 end_POSTSUBSCRIPT =U1,5+U2,5absentsubscript𝑈15subscript𝑈25\displaystyle=U_{1,5}+U_{2,5}= italic_U start_POSTSUBSCRIPT 1 , 5 end_POSTSUBSCRIPT + italic_U start_POSTSUBSCRIPT 2 , 5 end_POSTSUBSCRIPT (22)

The data in this example can therefore be formulated with the following design matrix:

(001000000000110000000000000000000111000111111000000010100000)(S4S5U1,3U1,4U1,5U2,4U2,5U3,4U3,5U4,1U4,2U4,3)=(1.711.62.310.45.8)matrix001000000000110000000000000000000111000111111000000010100000matrixsubscript𝑆4subscript𝑆5subscript𝑈13subscript𝑈14subscript𝑈15subscript𝑈24subscript𝑈25subscript𝑈34subscript𝑈35subscript𝑈41subscript𝑈42subscript𝑈43matrix1.711.62.310.45.8\setcounter{MaxMatrixCols}{12}\begin{pmatrix}0&0&1&0&0&0&0&0&0&0&0&0\\ 1&1&0&0&0&0&0&0&0&0&0&0\\ 0&0&0&0&0&0&0&0&0&1&1&1\\ 0&0&0&1&1&1&1&1&1&0&0&0\\ 0&0&0&0&1&0&1&0&0&0&0&0\end{pmatrix}\begin{pmatrix}S_{4}\\ S_{5}\\ U_{1,3}\\ U_{1,4}\\ U_{1,5}\\ U_{2,4}\\ U_{2,5}\\ U_{3,4}\\ U_{3,5}\\ U_{4,1}\\ U_{4,2}\\ U_{4,3}\end{pmatrix}=\begin{pmatrix}1.7\\ 11.6\\ 2.3\\ 10.4\\ 5.8\end{pmatrix}( start_ARG start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 1 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 1 end_CELL start_CELL 1 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 1 end_CELL start_CELL 1 end_CELL start_CELL 1 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 1 end_CELL start_CELL 1 end_CELL start_CELL 1 end_CELL start_CELL 1 end_CELL start_CELL 1 end_CELL start_CELL 1 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 1 end_CELL start_CELL 0 end_CELL start_CELL 1 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW end_ARG ) ( start_ARG start_ROW start_CELL italic_S start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_S start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_U start_POSTSUBSCRIPT 1 , 3 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_U start_POSTSUBSCRIPT 1 , 4 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_U start_POSTSUBSCRIPT 1 , 5 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_U start_POSTSUBSCRIPT 2 , 4 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_U start_POSTSUBSCRIPT 2 , 5 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_U start_POSTSUBSCRIPT 3 , 4 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_U start_POSTSUBSCRIPT 3 , 5 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_U start_POSTSUBSCRIPT 4 , 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_U start_POSTSUBSCRIPT 4 , 2 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_U start_POSTSUBSCRIPT 4 , 3 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ) = ( start_ARG start_ROW start_CELL 1.7 end_CELL end_ROW start_ROW start_CELL 11.6 end_CELL end_ROW start_ROW start_CELL 2.3 end_CELL end_ROW start_ROW start_CELL 10.4 end_CELL end_ROW start_ROW start_CELL 5.8 end_CELL end_ROW end_ARG ) (23)

A.2 Model convergence diagnostics

In this section we describe model diagnostic checks used to examine the convergence of the Markov Chains Monte Carlo (MCMC) sampling algorithm (No-U Turn Sampler of Hoffman and Gelman [2014]). We present trace plots for a selection of flow and change in stock variables in the aluminium model for illustration. Using the No-U-Turn Sampler, we sampled 2 independent chains of 12000120001200012000 samples, with the first 2000200020002000 samples used for tuning and discarded from the final posterior samples. The purpose of the tuning samples are to give the Markov Chains iterations to converge to the target posterior distribution before extracting samples to approximate the posterior distribution. The traceplots in Figure 6 suggests both chains have converged to the same distribution. Furthermore, for both scenario A and scenario B of the aluminium model, the Gelman-Rubin diagnostic statistic (see for example Chapter 11 of Gelman et al. [2013]) for all flow and change in stock variables in the model have converged to 1.01.01.01.0 and no divergent samples were reported, indicating the posterior samples are reliable (Betancourt [2017]).

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 6: Traceplots for a selection of flow and change in stock variables in the aluminium model, scenario A.

A.3 Posterior pairplots

Here we present a small selection of pairplots to illustrate correlation between the posterior distribution of flow variables. In particular, we display all the outflows of the ‘Foil’ process, which is a subprocess of semi-manufacturing. The outflow processes of ‘Foil’ are ‘InternalRemelting’, ‘Manuf_TAU’, ‘Manuf_TOT’, ‘Manuf_POT’, ‘Manuf_CD’. Here one can see from the pairplots that it’s very uncommon for the total of any two outflows to exceed around 6666 Mt, which is consistent with the data which has the total outflow of ‘Foil’ at 6.46.46.46.4 Mt.

Refer to caption
Figure 7: Pairplots for the outflows of the Foil process, scenario A. Axis units are in Mt.

A.4 Detail of prior parameters for the aluminium model

In this section we give detail on the choice of prior parameters for the model evaluation of the aluminium dataset, A summary of the parameter values described in this section can be found in Table 1. Below we describe and motivate the choices:

The prior mode μisubscript𝜇𝑖\mu_{i}italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT or μj,ksubscript𝜇𝑗𝑘\mu_{j,k}italic_μ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT of flow and change in stock variables respectively are chosen to be equal to the nearest power of 10101010 of the reported value when it is available. For example, the flow from ‘Mining’ to ‘Refining’ with a reported value 35.435.435.435.4 Mt is assigned a prior mode of μ1,2=10.0subscript𝜇1210.0\mu_{1,2}=10.0italic_μ start_POSTSUBSCRIPT 1 , 2 end_POSTSUBSCRIPT = 10.0 Mt. If the flow or change in stock variable has no reported value, it is assigned an uninformative prior mode of 1.01.01.01.0 Mt instead.

The prior standard deviation σisubscript𝜎𝑖\sigma_{i}italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT for the change in stock variables are chosen to be σi=max(40|μi|,0.1)subscript𝜎𝑖40subscript𝜇𝑖0.1\sigma_{i}=\max(\sqrt{40}|\mu_{i}|,0.1)italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = roman_max ( square-root start_ARG 40 end_ARG | italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | , 0.1 ), or 10.010.010.010.0 if there is no reported value for the flow or change in stock variable. The corresponding parameter σj,ksubscript𝜎𝑗𝑘\sigma_{j,k}italic_σ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT for the flow variables are chosen similarly. For example, σ1,2=4010.0=63.25subscript𝜎124010.063.25\sigma_{1,2}=\sqrt{40}*10.0=63.25italic_σ start_POSTSUBSCRIPT 1 , 2 end_POSTSUBSCRIPT = square-root start_ARG 40 end_ARG ∗ 10.0 = 63.25 for the flow from ‘Mining’ to ‘Refining’. This choice is motivated by when μisubscript𝜇𝑖\mu_{i}italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is specified to the correct order of magnitude, the true value of the variable is at most 4.5|μi|4.5subscript𝜇𝑖4.5|\mu_{i}|4.5 | italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | away, and so σisubscript𝜎𝑖\sigma_{i}italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is chosen on a similar scale but somewhat larger to ensure the prior is not over-confident. The maximum here is to ensure a minimum prior standard deviation of 0.10.10.10.1.

Note we are not advocating for this choice of prior distribution to be uniquely correct. The overall rationale of this choice for the prior is to assign a weakly informative prior distribution with a prior mode that captures the order of magnitude of the flow or change in stock, coupled with a relatively large prior variance, in order to reflect a realistic baseline of domain knowledge to aim for when conducting MFA.

For lack of better information, the standard deviation parameters for change in stock data noise τisubscript𝜏superscript𝑖\tau_{i^{\prime}}italic_τ start_POSTSUBSCRIPT italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT are chosen to be τi=max(0.1|Yi|,0.1)subscript𝜏superscript𝑖0.1subscript𝑌superscript𝑖0.1\tau_{i^{\prime}}=\max(0.1|Y_{i^{\prime}}|,0.1)italic_τ start_POSTSUBSCRIPT italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = roman_max ( 0.1 | italic_Y start_POSTSUBSCRIPT italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | , 0.1 ), where Yisubscript𝑌superscript𝑖Y_{i^{\prime}}italic_Y start_POSTSUBSCRIPT italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT are the observed change in stock data. The corresponding parameters τjsubscript𝜏superscript𝑗\tau_{j^{\prime}}italic_τ start_POSTSUBSCRIPT italic_j start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT for flow data noise is chosen similarly. In other words, the data standard deviation is equal to 10%percent1010\%10 % of the observed data as in Lupton and Allwood [2018], but with a minimum of 0.10.10.10.1 to account for a minimum degree of uncertainty from rounding or other measurement errors. For the ratio data noise, the standard deviation parameters τksubscript𝜏superscript𝑘\tau_{k^{\prime}}italic_τ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT are chosen with a somewhat smaller minimum τk=max(0.1Yk,0.01)subscript𝜏superscript𝑘0.1subscript𝑌superscript𝑘0.01\tau_{k^{\prime}}=\max(0.1Y_{k^{\prime}},0.01)italic_τ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = roman_max ( 0.1 italic_Y start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , 0.01 ), where Yksubscript𝑌superscript𝑘Y_{k^{\prime}}italic_Y start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT is the observed ratio. While the choice of standard deviation parameters here is somewhat arbitrary, the posterior predictive checks performed on the aluminium model do not exhibit extreme Bayesian p-values (smaller than 0.05 or greater than 0.95), which suggests the standard deviation parameters chosen are reasonable. This also suggests one way to tune standard deviation parameters is to start with a small choice of standard deviation parameters (such as 10%percent1010\%10 % of the observed data value), run the model, identify the data points which exhibit extreme Bayesian p-values in the posterior predictive checks, and increase the standard deviation parameter for those points and rerun the model until no extreme Bayesian p-values remain.

For the mass conservation conditions, a small constant standard deviation of τl=0.05subscript𝜏superscript𝑙0.05\tau_{l^{\prime}}=0.05italic_τ start_POSTSUBSCRIPT italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = 0.05 is chosen for all mass balance conditions. Ideally these parameters should be chosen based on the modeller’s confidence regarding epistemic uncertainty in the system, giving higher standard deviations to processes where there is more uncertainty in the system definition. Like with the standard deviation parameters for the data, posterior predictive checks can help select suitable parameter values, where the standard deviation parameters for mass balance conditions with extreme Bayesian p-values should be increased until the Bayesian p-values are no longer extreme. Without prior confidence of epistemic uncertainty in the system, we recommend starting with a small constant standard deviation for the mass conservation conditions, but not too small so that the mass conservation conditions are sufficiently relaxed to ensure the NUTS algorithm converges well. In this case it turned out the choice of τl=0.05subscript𝜏superscript𝑙0.05\tau_{l^{\prime}}=0.05italic_τ start_POSTSUBSCRIPT italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = 0.05 did not produce any extreme Bayesian p-values, produces well converged posterior samples and well mass balanced flows (see subsequent section for more detail).

Parameter Parameter prior value
μisubscript𝜇𝑖\mu_{i}italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT nearest power of 10101010 of reported value, or 1.01.01.01.0 if unavailable
μj,ksubscript𝜇𝑗𝑘\mu_{j,k}italic_μ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT nearest power of 10101010 of reported value, or 1.01.01.01.0 if unavailable
σisubscript𝜎𝑖\sigma_{i}italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT max(40|μi|,0.1)40subscript𝜇𝑖0.1\max(\sqrt{40}|\mu_{i}|,0.1)roman_max ( square-root start_ARG 40 end_ARG | italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | , 0.1 ), or 10.010.010.010.0 if reported value is unavailable
σj,ksubscript𝜎𝑗𝑘\sigma_{j,k}italic_σ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT max(40μj,k,0.1)40subscript𝜇𝑗𝑘0.1\max(\sqrt{40}\mu_{j,k},0.1)roman_max ( square-root start_ARG 40 end_ARG italic_μ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT , 0.1 ), or 10.010.010.010.0 if reported value is unavailable
τisubscript𝜏superscript𝑖\tau_{i^{\prime}}italic_τ start_POSTSUBSCRIPT italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT max(0.1|Yi|,0.1)0.1subscript𝑌superscript𝑖0.1\max(0.1|Y_{i^{\prime}}|,0.1)roman_max ( 0.1 | italic_Y start_POSTSUBSCRIPT italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | , 0.1 )
τjsubscript𝜏superscript𝑗\tau_{j^{\prime}}italic_τ start_POSTSUBSCRIPT italic_j start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT max(0.1Yj,0.1)0.1subscript𝑌superscript𝑗0.1\max(0.1Y_{j^{\prime}},0.1)roman_max ( 0.1 italic_Y start_POSTSUBSCRIPT italic_j start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , 0.1 )
τksubscript𝜏superscript𝑘\tau_{k^{\prime}}italic_τ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT max(0.1Yk,0.01)0.1subscript𝑌superscript𝑘0.01\max(0.1Y_{k^{\prime}},0.01)roman_max ( 0.1 italic_Y start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , 0.01 )
τlsubscript𝜏superscript𝑙\tau_{l^{\prime}}italic_τ start_POSTSUBSCRIPT italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT 0.05
Table 1: Table of parameters for the aluminium model

A.5 Assessment of posterior mass balance for the aluminium model

In this section we analyse how well the mass balance conditions are satisfied by the aluminium model, given the parameters chosen in the previous section, in particular the standard deviation parameters τl=0.05subscript𝜏superscript𝑙0.05\tau_{l^{\prime}}=0.05italic_τ start_POSTSUBSCRIPT italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = 0.05 for the mass balance conditions. We examine the posterior distribution of the mass balance conditions to see how well they are centred around 00.

Refer to caption
Refer to caption
Figure 8: posterior HDI lengths for the mass conservation conditions. On the top we have scenario A and on the bottom we have scenario B. The 95%percent9595\%95 % posterior highest density intervals are shown with a red bar and the posterior mean is shown with a blue dot

From the Figure above, it can be seen that the posterior mean of the mass balance conditions is very close to 00 in both cases, having a mean of 0.0010.001-0.001- 0.001 Mt for both scenario A and scenario B when averaged over all processes, which is orders of magnitudes smaller than the average posterior flow magnitude of around 2.32.32.32.3 Mt for both scenario A and B. This ensures that the posterior mean of the flow and stock change variables in the system, which is also the point estimate we report, is very well balanced. The individual posterior samples are somewhat less mass balanced, exhibiting a standard deviation of around 0.050.050.050.05 Mt for all processes. However, this is still a small level of mass imbalance and it is not necessary for every posterior sample to be perfectly mass balanced for the model to produce useful point estimation or uncertainty quantification.

A.6 Zinc cycle simulation study

Refer to caption
Figure 9: Zinc cycle for China, ca. 1994-1998, adapted from Figure 2d of Graedel et al. [2005]. The unit of mass for zinc is displayed in 105superscript10510^{5}10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT metric tons per year. we explicitly define an ‘Unknown’ Process to account for flows with an unknown source or destination.

We conduct a simulation study to investigate the estimation accuracy and uncertainty quantification of our model. For this end we consider the zinc cycle in China, ca. 1994-1998, from Graedel et al. [2005]. The zinc cycle contains no missing data values, so for the purposes of this simulation, we treat the reported data values as the true parameter values of the flow and stock change variables. This allows us to assess our model by withholding some data from it and checking the model output against the true values of the stock change and flows. We examine how our model performs in terms of estimation accuracy and uncertainty quantification under different levels of prior knowledge, as data is incrementally added into the model. The zinc model considered is linear for ease of comparison with other models that require data to be in the form of a design matrix. However, we note that common MFA data types including flows, flow ratios and stock changes can be expressed under a linear model.

To assess estimation accuracy of the model, we calculate how the error of the model evolves as data is randomly added into the model one at a time, starting from no data, until the full dataset becomes available to the model. We consider two types of error, the root mean square error (rmse) and the maximum error, of the model’s predictions compared to the reported values of the flow and change in stocks variables. The posterior mode is taken as the model prediction. To account for the random order in which data is added into the model, we repeat this process 50 times and calculate the average error (for both rmse and the maximum error) over the 50 runs for each possible data size from 00 to 20202020.

For comparison, we consider several alternate models. First, we consider an alternate Bayesian Gaussian model which has the advantage of a closed form solution, but carries the disadvantage of assigning significant probability to negative values for flow variables in the posterior distribution, which should be nonnegative as negative flows are not physically meaningful. Detail of the Gaussian model is provided in Section A.8. In addition, we also investigate the effect of different levels of prior knowledge has on the error. We utilise two different levels of prior knowledge, the first is an ‘uninformative’ prior which sets the prior mode of all variables to the average absolute value of the full data, and the correct sign for change in stock variables. This intuitively represents a prior that knows the average order of magnitude of the system but has little visibility on the individual magnitude of each flow or change in stock. The second is a ‘weakly informative’ prior which sets the prior mode of each flow and change in stock variable to the nearest power of 10101010 of the true value. This intuitively represents a prior that has knowledge of each flow and change in stock variable to the nearest order of magnitude. We also provide results from non Bayesian regression models for comparison, specifically ridge regression and multilayer perceptron (see e.g. Hastie et al. [2001]). For each run, both algorithms are trained on the same datasets as the proposed and Gaussian model, and outputs a prediction on the vector of all flows and change in stock variables. While these models do not have a principled way of incorporating prior information, one can nevertheless train those models on the data and use them to estimate the value of all flow and change in stock variables in a system, which makes a comparison with Bayesian MFA models meaningful.

Figure 10 shows the results of our analysis, where the rmse and maximum error of the model are plotted against the number of data points available to the model, averaged over 50 runs. For ridge regression and multilayer perceptron, the starting rmse at no data is around 5.4, whereas for the uninformative prior it is around 4.0 and for the weakly informative prior around 1.5. As data is added, all models exhibit the trend of monotonically decreasing error, but the proposed and Gaussian models exhibit noticeably lower error until almost all the data becomes available to the model, indicating incorporating prior information leads to more accurate estimates. Notably, it takes around 10 data points for the uninformative prior models and 12 data points for ridge regression to achieve the starting rsme (around 1.5) of the weakly informative prior model. Therefore, one can consider only knowing the order of magnitude of each flow or change in stock to be roughly equivalent to 10101010 data points in terms of rmse in this model, demonstrating the utility of being able to incorporate knowledge via a Bayesian prior in such an underdetermined system. For the Gaussian model, theoretical bounds on the mean square error provided in Section A.8 also support the trend exhibited in Figure 10, and the intuition that having an informative prior can greatly increase the model’s accuracy for MFA problems when there is a shortage of data but available expert domain knowledge.

Refer to caption
Refer to caption
Figure 10: Zinc flow data error analysis: (left) Root mean squared error RMSE of the model plotted against number of data points available to the model, averaged over 50 runs or datasets. (right) Maximum error of the model plotted against the number of data points available to the model, averaged over 50 runs or datasets. For each error type, our model and the Gaussian model are considered under uninformative and informative priors. Ridge regression is provided as a non Bayesian baseline.

To assess uncertainty quantification of the posterior distribution, we examine whether the 95%percent9595\%95 % posterior marginal highest density intervals (HDIs) of our model have similar level of frequentist coverage, i.e. form correct confidence intervals. Unlike credible intervals which have a designated posterior probability of containing the unknown parameter, confidence intervals are constructed such that upon repeated experiments (i.e. generate samples of data), a designated proportion (known as coverage) of the confidence intervals constructed will contain the true value of the unknown parameter. Bayesian credible intervals are in general not guaranteed to have good coverage, particularly in high-dimensional settings that are typical in MFA, but nevertheless it is desirable for them to have good coverage. Coverage is a property of the model procedure and does not require the repeated experiments to be the same experiment or dataset. It is therefore desirable for uncertainty quantification generated by a model procedure to contain the true value of the unknown parameter most of the time, over many runs for possibly different datasets. We investigate the coverage of our model’s credible intervals empirically in order to understand whether they can be used reliably as confidence sets.

The high dimensional nature of MFA models means the dimension of parameter space p𝑝pitalic_p (corresponding to the number of flow and change in stock variables in the model) is usually greater or similar in size compared to the number of available data n𝑛nitalic_n, and n𝑛nitalic_n is typically not large in magnitude. Therefore, results which guarantees asymptotic frequentist coverage of credible regions, such as the Bernstein-von Mises theorem (see e.g. Vaart [1998]), may not apply in MFA problems. We therefore study in simulations the frequentist coverage of our model’s posterior credible intervals, in an effort to understand whether these can be used as reliable confidence intervals. Specifically, we sample 300300300300 sets of the zinc dataset 𝒀1,𝒀2,,𝒀300subscript𝒀1subscript𝒀2subscript𝒀300\bm{Y}_{1},\bm{Y}_{2},\dots,\bm{Y}_{300}bold_italic_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_italic_Y start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , bold_italic_Y start_POSTSUBSCRIPT 300 end_POSTSUBSCRIPT from the model likelihood p(𝒀|𝜽0)𝑝conditional𝒀subscript𝜽0p(\bm{Y}|\bm{\theta}_{0})italic_p ( bold_italic_Y | bold_italic_θ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ), where we take the ‘true values’ 𝜽0subscript𝜽0\bm{\theta}_{0}bold_italic_θ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT to be the originally reported flow and change in stock values reported in Graedel et al. [2005] . For each dataset sampled, we add data to the model in batches of 5555 datapoints at a time (in the order listed from top to bottom in Table 2), and calculate the posterior marginal HDIs to see if they contain the true value for each dataset size n=5,10,15,20𝑛5101520n=5,10,15,20italic_n = 5 , 10 , 15 , 20. The frequencies at which the posterior marginal HDIs contain the true value over the 300300300300 runs, known as the coverage probability, are reported in Table 2.

Coverage probability(%), average width of 95%percent9595\%95 % HDI
Weakly informative prior Uninformative prior
Variable Name 5 data 10 data 15 data 20 data 5 data 10 data 15 data 20 data
ΔΔ\Deltaroman_ΔStock: Use 95.3, 3.60 96.3, 3.44 98.0, 3.16 96.3, 3.15 96.0, 3.76 95.3, 3.63 97.7, 3.23 95.7, 3.21
Production to ImportExport 90.7, 3.56 96.3, 3.40 96.7, 3.41 98.0, 2.92 91.3, 3.78 95.7, 3.65 97.7, 3.60 96.0, 3.15
ΔΔ\Deltaroman_ΔStock: Environment 98.0, 3.50 97.0, 3.35 97.3, 3.35 98.7, 3.18 95.0, 3.66 97.3, 3.44 96.0, 3.46 98.3, 3.25
WM to Production 94.7, 3.39 96.0, 3.15 96.7, 3.12 93.0, 3.02 94.7, 3.54 94.0, 3.47 94.7, 3.35 91.3, 3.24
ImportExport to WM 100.0, 0.83 100.0, 0.82 100.0, 0.82 100.0, 0.79 96.0, 2.29 99.3, 2.16 97.0, 2.12 98.7, 1.82
Unknown to WM 100.0, 5.07 99.0, 2.48 99.3, 2.18 100.0, 2.19 100.0, 8.52 99.0, 2.53 98.0, 2.42 97.7, 2.48
Lithosphere to Production 100.0, 7.73 96.3, 3.37 95.3, 3.38 93.7, 2.87 0.00, 11.3 96.0, 3.63 94.7, 3.55 94.7, 2.97
ΔΔ\Deltaroman_ΔStock: Production 100.0, 0.39 100.0, 0.39 100.0, 0.39 100.0, 0.39 100.0, 16.1 95.0, 3.67 95.3, 3.59 96.0, 3.50
Production to F&M 100.0, 7.69 94.7, 3.32 99.0, 3.19 93.7, 3.06 3.33, 8.25 89.0, 3.55 91.7, 3.39 87.0, 3.21
WM to Environment 100.0, 4.19 98.0, 2.69 98.7, 2.50 99.0, 2.55 100.0, 4.88 96.3, 2.98 97.3, 2.72 95.7, 2.72
F&M to Use 100.0, 7.54 99.7, 7.20 96.3, 3.03 93.0, 2.95 100.0, 9.55 100.0, 9.57 96.3, 3.14 93.3, 3.02
ΔΔ\Deltaroman_ΔStock: Unknown 100.0, 6.58 100.0, 4.62 100.0, 2.97 98.7, 2.97 100.0, 13.3 100.0, 7.20 98.0, 3.16 99.3, 3.16
Production to Unknown 100.0, 0.20 100.0, 0.20 100.0, 0.20 100.0, 0.20 100.0, 9.31 97.7, 4.90 93.7, 2.03 96.7, 2.01
F&M to WM 100.0, 6.44 100.0, 6.52 93.3, 3.30 94.3, 3.12 100.0, 10.6 100.0, 11.7 93.3, 3.45 93.7, 3.21
Use to WM 100.0, 5.91 100.0, 5.93 98.3, 2.60 98.0, 2.54 100.0, 7.32 100.0, 7.71 98.0, 2.63 98.7, 2.57
WM to F&M 100.0, 7.79 100.0, 7.29 100.0, 3.79 98.3, 2.63 95.0, 14.1 100.0, 12.4 100.0, 4.16 97.7, 2.74
ΔΔ\Deltaroman_ΔStock: ImportExport 97.0, 5.21 99.3, 5.10 99.7, 5.10 96.0, 3.09 84.0, 9.63 90.3, 7.30 95.3, 6.55 93.0, 3.26
ΔΔ\Deltaroman_ΔStock: Lithosphere 100.0, 8.21 99.0, 4.95 99.3, 4.95 95.3, 3.04 0.00, 12.2 99.0, 5.24 98.7, 5.18 95.7, 3.11
ImportExport to F&M 100.0, 0.86 100.0, 0.85 100.0, 0.82 100.0, 0.79 100.0, 7.19 100.0, 4.39 100.0, 3.17 98.0, 1.80
Production to Environment 100.0, 3.77 100.0, 3.74 99.3, 3.92 95.0, 2.90 100.0, 4.20 99.0, 3.74 99.7, 4.14 96.0, 2.97
Table 2: Table of coverage probability and average width of 95% highest density intervals (HDI) of the marginal posterior distributions of all flow and change in stock variables in the zinc data under different amounts of data and prior information, computed over 300300300300 runs. Data was added in batches of 5555 to the model in the order that they are listed in this table (from top to bottom).

From Table 2, it can be seen that the 95% HDI possess similar levels of coverage probability in most cases. The only notable exception being the uninformative prior at 5555 data points, where the coverage probability are close to 00 for the flow variables ‘Lithosphere to Production’ and ‘Production to F& M’, and the change in stock variable for ‘Lithosphere’. This is likely caused by the first 5555 data points not containing any data on the ‘Lithosphere’ process, as well as containing little data on ‘Production’, so the model was unable deduce the true magnitude of how much material was taken from the ‘Lithosphere’ with an uninformative prior. On the other hand, the weakly informative prior does not possess this issue as the prior has sufficient coverage over the true value. In addition, the average width of the 95%percent9595\%95 % HDI are generally lower for the weakly informative prior until data for that variable becomes available, which is expected from a more informative prior.

A.7 Detail of parameters for the zinc models

In this section we give detail on the choice of model parameters for all models used for the zinc case study, including how the prior parameters were chosen. Recall that we considered two different priors for the zinc model, an uninformative prior and a weakly informative prior. For the uninformative prior, the prior modes μisubscript𝜇𝑖\mu_{i}italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT of the change in stock variables are chosen to be 3.65753.65753.65753.6575 (measured in 105superscript10510^{5}10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT tonnes/Yr), equal to the mean of the absolute value of all the stock change and flow variables in the data, to represent a prior that only has a rough knowledge of the average order of magnitude of the system, but not the order of magnitude of each individual flow or change in stock. For weakly informative prior, μisubscript𝜇𝑖\mu_{i}italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT are chosen the same way as the aluminium case, where the prior mode is set to the nearest power of 10101010 of the ‘true’ reported value. The prior standard deviation parameters σisubscript𝜎𝑖\sigma_{i}italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT are chosen to be a constant 4040\sqrt{40}square-root start_ARG 40 end_ARG in the uninformative case, and σi=max(min(4|μi|,4),0.1)subscript𝜎𝑖4subscript𝜇𝑖40.1\sigma_{i}=\max(\min(4|\mu_{i}|,4),0.1)italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = roman_max ( roman_min ( 4 | italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | , 4 ) , 0.1 ) in the weakly informative case. Like in the aluminium model, σisubscript𝜎𝑖\sigma_{i}italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT for the weakly informative prior is chosen at a similar scale compared to 4.5|μi|4.5subscript𝜇𝑖4.5|\mu_{i}|4.5 | italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | but somewhat smaller and capped at a maximum of 4.04.04.04.0 to reflect the fact that the size of flows are relatively smaller in the zinc data (when measured in 105superscript10510^{5}10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT tonnes/Yr, compared to the aluminium data which is measured in Mt). The corresponding prior parameters for the flow variables μj,ksubscript𝜇𝑗𝑘\mu_{j,k}italic_μ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT and σj,ksubscript𝜎𝑗𝑘\sigma_{j,k}italic_σ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT are chosen similarly.

The noise standard deviation parameters of the observed data is chosen to be a constant τ=1.0𝜏1.0\tau=1.0italic_τ = 1.0 throughout for all data, and similarly for the mass conservation conditions. Here the noise parameters are chosen to be somewhat larger than the aluminium model to speed up the computation of the No-U-Turn Sampler MCMC algorithm, as the model needed to be computed 2400240024002400 times over the varying degrees of prior knowledge and available data to generate the coverage probability results in Table 1 of the main text. In particular, small values of the noise parameter induces high curvature which makes the sampling algorithm take longer to explore the posterior distribution. Similarly, for assessing point estimation (root mean squared error and maximum error) of the model on the zinc data, the posterior mode of the model is estimated directly via optimisation for faster computational speed, rather than through approximating the entire posterior distribution using MCMC.

Parameter Parameter prior value (weakly informative prior) Parameter prior value (uninformative prior)
μisubscript𝜇𝑖\mu_{i}italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT nearest power of 10101010 of reported value ±3.6575plus-or-minus3.6575\pm 3.6575± 3.6575 (sign depending if the change in stock is positive or negative)
μj,ksubscript𝜇𝑗𝑘\mu_{j,k}italic_μ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT nearest power of 10101010 of reported value 3.65753.65753.65753.6575
σisubscript𝜎𝑖\sigma_{i}italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT max(min(4|μi|,4),0.1)4subscript𝜇𝑖40.1\max(\min(4|\mu_{i}|,4),0.1)roman_max ( roman_min ( 4 | italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | , 4 ) , 0.1 ) 4040\sqrt{40}square-root start_ARG 40 end_ARG
σj,ksubscript𝜎𝑗𝑘\sigma_{j,k}italic_σ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT max(min(4μj,k,4),0.1)4subscript𝜇𝑗𝑘40.1\max(\min(4\mu_{j,k},4),0.1)roman_max ( roman_min ( 4 italic_μ start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT , 4 ) , 0.1 ) 4040\sqrt{40}square-root start_ARG 40 end_ARG
τ𝜏\tauitalic_τ 1.0 1.0
Table 3: Table of parameters for the zinc model

The ridge regression model was trained with a regularization parameter of λ=1.0/40.0=0.025𝜆1.040.00.025\lambda=1.0/40.0=0.025italic_λ = 1.0 / 40.0 = 0.025, which was chosen to match the prior variance and noise standard deviation in the uninformative prior case for the Gaussian model. To see this, note the solution of the ridge regression is equal to the posterior mode of a model with prior 𝜽𝒩(0,σ2I)similar-to𝜽𝒩0superscript𝜎2𝐼\bm{\theta}\sim\mathcal{N}(0,\sigma^{2}I)bold_italic_θ ∼ caligraphic_N ( 0 , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) and likelihood 𝒀|𝜽𝒩(X𝜽,τ2I)similar-toconditional𝒀𝜽𝒩𝑋𝜽superscript𝜏2𝐼\bm{Y}|\bm{\theta}\sim\mathcal{N}(X\bm{\theta},\tau^{2}I)bold_italic_Y | bold_italic_θ ∼ caligraphic_N ( italic_X bold_italic_θ , italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ), and the regularization parameter can be shown to be equal to τ2/σ2superscript𝜏2superscript𝜎2\tau^{2}/\sigma^{2}italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. So effectively the ridge regression solution is similar to the uninformative prior case, except the prior mean is set to 00 instead which is arguably less informative.

The multilayer perceptron regressor was trained with the default settings in the sklearn Python library, except with the maximum number of iterations increased to 1000100010001000. In particular the default loss function is the squared loss and default number of hidden layers equal to 100100100100.

A.8 Gaussian model

For model comparison purposes and to gain intuition into how Bayesian models perform for MFA problems, it is useful to examine a model with simplified assumptions. In particular, we consider a Gaussian model, where both the prior and the likelihood function are assumed to be normally distributed, and the data is assumed to be linear in terms of 𝜽𝜽\bm{\theta}bold_italic_θ (recall from the Methods section in the main text of the paper that the most common forms of MFA data can be expressed linearly in 𝜽𝜽\bm{\theta}bold_italic_θ):

𝜽𝜽\displaystyle\bm{\theta}bold_italic_θ 𝒩(𝝁,Σ)similar-toabsent𝒩𝝁Σ\displaystyle\sim\mathcal{N}(\bm{\mu},\Sigma)∼ caligraphic_N ( bold_italic_μ , roman_Σ ) (24)
Yi|𝜽conditionalsubscript𝑌𝑖𝜽\displaystyle Y_{i}|\bm{\theta}italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | bold_italic_θ 𝒩(𝒙𝒊𝜽,τi2)similar-toabsent𝒩superscriptsubscript𝒙𝒊top𝜽superscriptsubscript𝜏𝑖2\displaystyle\sim\mathcal{N}(\bm{x_{i}}^{\top}\bm{\theta},\tau_{i}^{2})∼ caligraphic_N ( bold_italic_x start_POSTSUBSCRIPT bold_italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_θ , italic_τ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) (25)

where 𝝁𝝁\bm{\mu}bold_italic_μ is the prior mean and ΣΣ\Sigmaroman_Σ is the (positive definite) prior covariance matrix. This gives a mathematically convenient conjugate posterior that is also normally distributed, with its posterior mean 𝝁nsuperscript𝝁𝑛\bm{\mu}^{n}bold_italic_μ start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT and covariance ΣnsuperscriptΣ𝑛\Sigma^{n}roman_Σ start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT available in closed form:

𝝁nsuperscript𝝁𝑛\displaystyle\bm{\mu}^{n}bold_italic_μ start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT =𝝁+ΣX(XΣX+T)1(𝒀X𝝁)absent𝝁Σsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋top𝑇1𝒀𝑋𝝁\displaystyle=\bm{\mu}+\Sigma X^{\top}(X\Sigma X^{\top}+T)^{-1}(\bm{Y}-X\bm{% \mu})= bold_italic_μ + roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_T ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_italic_Y - italic_X bold_italic_μ ) (26)
ΣnsuperscriptΣ𝑛\displaystyle\Sigma^{n}roman_Σ start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT =ΣΣX(XΣX+T)1XΣabsentΣΣsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋top𝑇1𝑋Σ\displaystyle=\Sigma-\Sigma X^{\top}(X\Sigma X^{\top}+T)^{-1}X\Sigma= roman_Σ - roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_T ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_X roman_Σ (27)

where T𝑇Titalic_T is a n𝑛nitalic_n by n𝑛nitalic_n diagonal matrix with entries τi2superscriptsubscript𝜏𝑖2\tau_{i}^{2}italic_τ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT, 1in1𝑖𝑛1\leq i\leq n1 ≤ italic_i ≤ italic_n. An advantage of the Gaussian model is that it allows the posterior mean and covariance to be calculated directly without potentially computationally intensive MCMC algorithms. This makes the Gaussian model an easy to compute baseline to compare with more complicated Bayesian models. Furthermore, owing to the closed form of the posterior, we can bound the mean squared error (MSE) between the posterior mean 𝝁nsuperscript𝝁𝑛\bm{\mu}^{n}bold_italic_μ start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT and the true parameter (stock changes and flows) values 𝜽superscript𝜽\bm{\theta}^{*}bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. A discussion on this is provided below, which gives some intuition why Bayesian methods can perform better than non Bayesian methods in the low data setting npmuch-less-than𝑛𝑝n\ll pitalic_n ≪ italic_p.

While computationally convenient, an disadvantage with the Gaussian conjugate model is that the posterior distributions for flow variables can contain non negligible probability on negative values when there is insufficient data or uninformative priors, which is not physically meaningful as flow quantities should be strictly positive, making it not fully Bayesian in this sense. While the posterior mean of the Gaussian model can perform adequately in terms of point estimation, we do not recommend it for uncertainty quantification without very informative priors.

Nevertheless, the Gaussian model usefully gives some intuition into why Bayesian methods perform better than non Bayesian methods in the low data setting npmuch-less-than𝑛𝑝n\ll pitalic_n ≪ italic_p. Below we provide some theoretical analysis for the mean squared error of the Gaussian model.

Owing to the closed form of the posterior of the Gaussian model described in the Methods section of the main text, we can bound the mean squared error (MSE) between the posterior mean 𝝁nsuperscript𝝁𝑛\bm{\mu}^{n}bold_italic_μ start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT and the true parameter (stocks changes and flows) values 𝜽superscript𝜽\bm{\theta}^{*}bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT as follows:

Theorem 1.

Let τi2=τ2superscriptsubscript𝜏𝑖2superscript𝜏2\tau_{i}^{2}=\tau^{2}italic_τ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT, in other words T=τ2I𝑇superscript𝜏2𝐼T=\tau^{2}Iitalic_T = italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I, I𝐼Iitalic_I the identity matrix. The mean squared error between the posterior mean of the Gaussian posterior 𝛍nsuperscript𝛍𝑛\bm{\mu}^{n}bold_italic_μ start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT and the true stock change and flow values 𝛉superscript𝛉\bm{\theta}^{*}bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT can be bounded above in the following way:

𝔼(𝜽𝝁n22)=𝔼(j=1p(θjμjn)2)𝔼subscriptsuperscriptnormsuperscript𝜽superscript𝝁𝑛22𝔼superscriptsubscript𝑗1𝑝superscriptsuperscriptsubscript𝜃𝑗superscriptsubscript𝜇𝑗𝑛2\displaystyle\mathbb{E}(\|\bm{\theta}^{*}-\bm{\mu}^{n}\|^{2}_{2})=\mathbb{E}(% \sum_{j=1}^{p}(\theta_{j}^{*}-\mu_{j}^{n})^{2})blackboard_E ( ∥ bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_μ start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) = blackboard_E ( ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ( italic_θ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - italic_μ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT )
Tr(Σ1/2(𝜽𝝁)(𝜽𝝁)Σ1/2)λ1(pn+j=1nτ4(dj2+τ2)2)+τ2j=1nλ1dj2(dj2+τ2)2absentTrsuperscriptΣ12superscript𝜽𝝁superscriptsuperscript𝜽𝝁topsuperscriptΣ12subscript𝜆1𝑝𝑛superscriptsubscript𝑗1𝑛superscript𝜏4superscriptsuperscriptsubscript𝑑𝑗2superscript𝜏22superscript𝜏2superscriptsubscript𝑗1𝑛subscript𝜆1superscriptsubscript𝑑𝑗2superscriptsuperscriptsubscript𝑑𝑗2superscript𝜏22\displaystyle\leq\operatorname{Tr}(\Sigma^{-1/2}(\bm{\theta}^{*}-\bm{\mu})(\bm% {\theta}^{*}-\bm{\mu})^{\top}\Sigma^{-1/2})\lambda_{1}(p-n+\sum_{j=1}^{n}\frac% {\tau^{4}}{(d_{j}^{2}+\tau^{2})^{2}})+\tau^{2}\sum_{j=1}^{n}\frac{\lambda_{1}d% _{j}^{2}}{(d_{j}^{2}+\tau^{2})^{2}}≤ roman_Tr ( roman_Σ start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT ( bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_μ ) ( bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_μ ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT ) italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_p - italic_n + ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT divide start_ARG italic_τ start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG start_ARG ( italic_d start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ) + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT divide start_ARG italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_d start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG ( italic_d start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG (28)

where λ1λ2λp>0subscript𝜆1subscript𝜆2subscript𝜆𝑝0\lambda_{1}\geq\lambda_{2}\geq\dots\geq\lambda_{p}>0italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ≥ italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ≥ ⋯ ≥ italic_λ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT > 0 are the eigenvalues of ΣΣ\Sigmaroman_Σ in descending order, and 0d1d2dn0subscript𝑑1subscript𝑑2subscript𝑑𝑛0\leq d_{1}\leq d_{2}\leq\dots\leq d_{n}0 ≤ italic_d start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ≤ italic_d start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ≤ ⋯ ≤ italic_d start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT are the singular values of XΣ1/2𝑋superscriptΣ12X\Sigma^{1/2}italic_X roman_Σ start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT in ascending order.

The bound in Equation 28 helps to give some insight into how the mean squared error of the model evolves as more data is added. This is perhaps easiest to see in the case when τ𝜏\tauitalic_τ is near 00 (meaning the data is almost noiseless) and 0τd10𝜏much-less-thansubscript𝑑10\leq\tau\ll d_{1}0 ≤ italic_τ ≪ italic_d start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , which leaves

Tr(Σ1/2(𝜽𝝁)(𝜽𝝁)Σ1/2)λ1(pn)TrsuperscriptΣ12superscript𝜽𝝁superscriptsuperscript𝜽𝝁topsuperscriptΣ12subscript𝜆1𝑝𝑛\operatorname{Tr}(\Sigma^{-1/2}(\bm{\theta}^{*}-\bm{\mu})(\bm{\theta}^{*}-\bm{% \mu})^{\top}\Sigma^{-1/2})\lambda_{1}(p-n)roman_Tr ( roman_Σ start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT ( bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_μ ) ( bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_μ ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT ) italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_p - italic_n )

as the dominant term. The Tr(Σ1/2(𝜽𝝁)(𝜽𝝁)Σ1/2)TrsuperscriptΣ12superscript𝜽𝝁superscriptsuperscript𝜽𝝁topsuperscriptΣ12\operatorname{Tr}(\Sigma^{-1/2}(\bm{\theta}^{*}-\bm{\mu})(\bm{\theta}^{*}-\bm{% \mu})^{\top}\Sigma^{-1/2})roman_Tr ( roman_Σ start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT ( bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_μ ) ( bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_μ ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT ) part of this term can be interpreted as a measurement of how close the prior mean is to the true values of the parameters (it is 00 when 𝜽=𝝁superscript𝜽𝝁\bm{\theta}^{*}=\bm{\mu}bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = bold_italic_μ) and does not depend on the data, while the factor λ1(pn)subscript𝜆1𝑝𝑛\lambda_{1}(p-n)italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_p - italic_n ) decreases as n𝑛nitalic_n increases, meaning as more data is added. This indicates that in a high dimensional setting npmuch-less-than𝑛𝑝n\ll pitalic_n ≪ italic_p which is inherent in MFA studies, the availability of informative priors is key to obtaining more accurate estimates of the true stock change and flow values.

Furthermore, in the case when Σ=λ1IΣsubscript𝜆1𝐼\Sigma=\lambda_{1}Iroman_Σ = italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_I, Equation 28 can be directly compared with the ridge regression case. Since ridge regression is equivalent to the case when 𝝁=𝟎𝝁0\bm{\mu}=\bm{0}bold_italic_μ = bold_0 and Σ=λ1IΣsubscript𝜆1𝐼\Sigma=\lambda_{1}Iroman_Σ = italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_I, Equation 28 is smaller for the Gaussian model compared to ridge regression whenever Tr((𝜽𝝁)(𝜽𝝁))=𝜽𝝁22Trsuperscript𝜽𝝁superscriptsuperscript𝜽𝝁topsubscriptsuperscriptnormsuperscript𝜽𝝁22\operatorname{Tr}((\bm{\theta}^{*}-\bm{\mu})(\bm{\theta}^{*}-\bm{\mu})^{\top})% =\|\bm{\theta}^{*}-\bm{\mu}\|^{2}_{2}roman_Tr ( ( bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_μ ) ( bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_μ ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) = ∥ bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_μ ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT is smaller than 𝜽22subscriptsuperscriptnormsuperscript𝜽22\|\bm{\theta}^{*}\|^{2}_{2}∥ bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. In other words whenever the prior mean 𝝁𝝁\bm{\mu}bold_italic_μ is closer to the true value in (L2) distance than the zero vector 𝟎0\bm{0}bold_0, which is consistent with the simulations presented in Section A.6.

Proof of Theorem 1.

Recall that the posterior mean of the conjugate Gaussian model can be written as:

𝝁n=𝝁+ΣX(XΣX+τ2I)1(X(𝜽𝝁)+ϵ)superscript𝝁𝑛𝝁Σsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1𝑋superscript𝜽𝝁bold-italic-ϵ\displaystyle\bm{\mu}^{n}=\bm{\mu}+\Sigma X^{\top}(X\Sigma X^{\top}+\tau^{2}I)% ^{-1}(X(\bm{\theta}^{*}-\bm{\mu})+\bm{\epsilon})bold_italic_μ start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT = bold_italic_μ + roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_X ( bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_μ ) + bold_italic_ϵ )

This can be derived from looking at the joint distribution [𝜽,X𝜽+ϵ]superscript𝜽𝑋𝜽bold-italic-ϵtop[\bm{\theta},X\bm{\theta}+\bm{\epsilon}]^{\top}[ bold_italic_θ , italic_X bold_italic_θ + bold_italic_ϵ ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT and using standard Gaussian conditioning formulae:

[𝜽X𝜽+ϵ]𝒩([𝝁X𝝁],[ΣΣXXΣXΣX+τ2I])similar-tomatrix𝜽𝑋𝜽bold-italic-ϵ𝒩matrix𝝁𝑋𝝁matrixΣΣsuperscript𝑋top𝑋Σ𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼\displaystyle\begin{bmatrix}\bm{\theta}\\ X\bm{\theta}+\bm{\epsilon}\end{bmatrix}\sim\mathcal{N}\left(\begin{bmatrix}\bm% {\mu}\\ X\bm{\mu}\end{bmatrix},\begin{bmatrix}\Sigma&\Sigma X^{\top}\\ X\Sigma&X\Sigma X^{\top}+\tau^{2}I\end{bmatrix}\right)[ start_ARG start_ROW start_CELL bold_italic_θ end_CELL end_ROW start_ROW start_CELL italic_X bold_italic_θ + bold_italic_ϵ end_CELL end_ROW end_ARG ] ∼ caligraphic_N ( [ start_ARG start_ROW start_CELL bold_italic_μ end_CELL end_ROW start_ROW start_CELL italic_X bold_italic_μ end_CELL end_ROW end_ARG ] , [ start_ARG start_ROW start_CELL roman_Σ end_CELL start_CELL roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_X roman_Σ end_CELL start_CELL italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I end_CELL end_ROW end_ARG ] ) (29)

Suppose dim(X)=(n,p)𝑑𝑖𝑚𝑋𝑛𝑝dim(X)=(n,p)italic_d italic_i italic_m ( italic_X ) = ( italic_n , italic_p ), dim(Σ)=(p,p)𝑑𝑖𝑚Σ𝑝𝑝dim(\Sigma)=(p,p)italic_d italic_i italic_m ( roman_Σ ) = ( italic_p , italic_p ), dim(𝝁)=dim(𝜽)=(p,1)𝑑𝑖𝑚𝝁𝑑𝑖𝑚superscript𝜽𝑝1dim(\bm{\mu})=dim(\bm{\theta}^{*})=(p,1)italic_d italic_i italic_m ( bold_italic_μ ) = italic_d italic_i italic_m ( bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) = ( italic_p , 1 ), dim(ϵ)=(n,1)𝑑𝑖𝑚bold-italic-ϵ𝑛1dim(\bm{\epsilon})=(n,1)italic_d italic_i italic_m ( bold_italic_ϵ ) = ( italic_n , 1 ), and we’ll assume n<p𝑛𝑝n<pitalic_n < italic_p, since material flow analysis typically has less data than parameters. Assume ΣΣ\Sigmaroman_Σ is symmetric positive definite and X𝑋Xitalic_X has full rank. The mean squared error (MSE) between the posterior mean 𝝁nsuperscript𝝁𝑛\bm{\mu}^{n}bold_italic_μ start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT and the true value 𝜽superscript𝜽\bm{\theta}^{*}bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT is equal to:

MSE𝑀𝑆𝐸\displaystyle MSEitalic_M italic_S italic_E =𝔼𝜽𝝁ΣX(XΣX+τ2I)1(X(𝜽𝝁)+ϵ)22absent𝔼subscriptsuperscriptnormsuperscript𝜽𝝁Σsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1𝑋superscript𝜽𝝁bold-italic-ϵ22\displaystyle=\mathbb{E}\|\bm{\theta}^{*}-\bm{\mu}-\Sigma X^{\top}(X\Sigma X^{% \top}+\tau^{2}I)^{-1}(X(\bm{\theta}^{*}-\bm{\mu})+\bm{\epsilon})\|^{2}_{2}= blackboard_E ∥ bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_μ - roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_X ( bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_μ ) + bold_italic_ϵ ) ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT
=𝔼(𝜷ΣX(XΣX+τ2I)1(X𝜷+ϵ))(𝜷ΣX(XΣX+τ2I)1(X𝜷+ϵ))absent𝔼superscript𝜷Σsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1𝑋𝜷bold-italic-ϵtop𝜷Σsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1𝑋𝜷bold-italic-ϵ\displaystyle=\mathbb{E}(\bm{\beta}-\Sigma X^{\top}(X\Sigma X^{\top}+\tau^{2}I% )^{-1}(X\bm{\beta}+\bm{\epsilon}))^{\top}(\bm{\beta}-\Sigma X^{\top}(X\Sigma X% ^{\top}+\tau^{2}I)^{-1}(X\bm{\beta}+\bm{\epsilon}))= blackboard_E ( bold_italic_β - roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_X bold_italic_β + bold_italic_ϵ ) ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( bold_italic_β - roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_X bold_italic_β + bold_italic_ϵ ) )
=𝔼(𝜷(IΣX(XΣX+τ2I)1X)(IΣX(XΣX+τ2I)1X)𝜷)absent𝔼superscript𝜷topsuperscript𝐼Σsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1𝑋top𝐼Σsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1𝑋𝜷\displaystyle=\mathbb{E}(\bm{\beta}^{\top}(I-\Sigma X^{\top}(X\Sigma X^{\top}+% \tau^{2}I)^{-1}X)^{\top}(I-\Sigma X^{\top}(X\Sigma X^{\top}+\tau^{2}I)^{-1}X)% \bm{\beta})= blackboard_E ( bold_italic_β start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_I - roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_X ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_I - roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_X ) bold_italic_β )
+𝔼(ϵ(ΣX(XΣX+τ2I)1)ΣX(XΣX+τ2I)1ϵ)𝔼superscriptbold-italic-ϵtopsuperscriptΣsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1topΣsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1bold-italic-ϵ\displaystyle+\mathbb{E}(\bm{\epsilon}^{\top}(\Sigma X^{\top}(X\Sigma X^{\top}% +\tau^{2}I)^{-1})^{\top}\Sigma X^{\top}(X\Sigma X^{\top}+\tau^{2}I)^{-1}\bm{% \epsilon})+ blackboard_E ( bold_italic_ϵ start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_italic_ϵ )

Where 𝜷=𝜽𝝁𝜷superscript𝜽𝝁\bm{\beta}=\bm{\theta}^{*}-\bm{\mu}bold_italic_β = bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_μ. At this point we decompose the MSE into the bias and variance terms. Let

MSE1=𝔼(𝜷(IΣX(XΣX+τ2I)1X)(IΣX(XΣX+τ2I)1X)𝜷)𝑀𝑆subscript𝐸1𝔼superscript𝜷topsuperscript𝐼Σsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1𝑋top𝐼Σsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1𝑋𝜷MSE_{1}=\mathbb{E}(\bm{\beta}^{\top}(I-\Sigma X^{\top}(X\Sigma X^{\top}+\tau^{% 2}I)^{-1}X)^{\top}(I-\Sigma X^{\top}(X\Sigma X^{\top}+\tau^{2}I)^{-1}X)\bm{% \beta})italic_M italic_S italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = blackboard_E ( bold_italic_β start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_I - roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_X ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_I - roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_X ) bold_italic_β )

denote the bias term. and

MSE2=𝔼(ϵ(ΣX(XΣX+τ2I)1)ΣX(XΣX+τ2I)1ϵ)𝑀𝑆subscript𝐸2𝔼superscriptbold-italic-ϵtopsuperscriptΣsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1topΣsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1bold-italic-ϵMSE_{2}=\mathbb{E}(\bm{\epsilon}^{\top}(\Sigma X^{\top}(X\Sigma X^{\top}+\tau^% {2}I)^{-1})^{\top}\Sigma X^{\top}(X\Sigma X^{\top}+\tau^{2}I)^{-1}\bm{\epsilon})italic_M italic_S italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = blackboard_E ( bold_italic_ϵ start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_italic_ϵ )

denote the variance term.

For the bias term, using the fact that the trace of a scalar is equal to itself and the cyclic property of trace, we have:

MSE1=Tr(𝜷(IΣX(XΣX+τ2I)1X)(IΣX(XΣX+τ2I)1X)𝜷)𝑀𝑆subscript𝐸1Trsuperscript𝜷topsuperscript𝐼Σsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1𝑋top𝐼Σsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1𝑋𝜷\displaystyle MSE_{1}=\operatorname{Tr}(\bm{\beta}^{\top}(I-\Sigma X^{\top}(X% \Sigma X^{\top}+\tau^{2}I)^{-1}X)^{\top}(I-\Sigma X^{\top}(X\Sigma X^{\top}+% \tau^{2}I)^{-1}X)\bm{\beta})italic_M italic_S italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = roman_Tr ( bold_italic_β start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_I - roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_X ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_I - roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_X ) bold_italic_β )
=Tr(𝜷𝜷(IΣX(XΣX+τ2I)1X)(IΣX(XΣX+τ2I)1X))absentTr𝜷superscript𝜷topsuperscript𝐼Σsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1𝑋top𝐼Σsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1𝑋\displaystyle=\operatorname{Tr}(\bm{\beta}\bm{\beta}^{\top}(I-\Sigma X^{\top}(% X\Sigma X^{\top}+\tau^{2}I)^{-1}X)^{\top}(I-\Sigma X^{\top}(X\Sigma X^{\top}+% \tau^{2}I)^{-1}X))= roman_Tr ( bold_italic_β bold_italic_β start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_I - roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_X ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_I - roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_X ) )
=Tr(𝜷𝜷Σ1/2Σ1/2(IX(XΣX+τ2I)1XΣ)Σ1/2ΣΣ1/2(IΣX(XΣX+τ2I)1X)Σ1/2Σ1/2)absentTr𝜷superscript𝜷topsuperscriptΣ12superscriptΣ12𝐼superscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1𝑋ΣsuperscriptΣ12ΣsuperscriptΣ12𝐼Σsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1𝑋superscriptΣ12superscriptΣ12\displaystyle=\operatorname{Tr}(\bm{\beta}\bm{\beta}^{\top}\Sigma^{-1/2}\Sigma% ^{1/2}(I-X^{\top}(X\Sigma X^{\top}+\tau^{2}I)^{-1}X\Sigma)\Sigma^{-1/2}\Sigma% \Sigma^{-1/2}(I-\Sigma X^{\top}(X\Sigma X^{\top}+\tau^{2}I)^{-1}X)\Sigma^{1/2}% \Sigma^{-1/2})= roman_Tr ( bold_italic_β bold_italic_β start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT roman_Σ start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT ( italic_I - italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_X roman_Σ ) roman_Σ start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT roman_Σ roman_Σ start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT ( italic_I - roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_X ) roman_Σ start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT roman_Σ start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT )
=Tr(Σ1/2𝜷𝜷Σ1/2(IΣ1/2X(XΣX+τ2I)1XΣ1/2)Σ(IΣ1/2X(XΣX+τ2I)1XΣ1/2))absentTrsuperscriptΣ12𝜷superscript𝜷topsuperscriptΣ12𝐼superscriptΣ12superscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1𝑋superscriptΣ12Σ𝐼superscriptΣ12superscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1𝑋superscriptΣ12\displaystyle=\operatorname{Tr}(\Sigma^{-1/2}\bm{\beta}\bm{\beta}^{\top}\Sigma% ^{-1/2}(I-\Sigma^{1/2}X^{\top}(X\Sigma X^{\top}+\tau^{2}I)^{-1}X\Sigma^{1/2})% \Sigma(I-\Sigma^{1/2}X^{\top}(X\Sigma X^{\top}+\tau^{2}I)^{-1}X\Sigma^{1/2}))= roman_Tr ( roman_Σ start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT bold_italic_β bold_italic_β start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT ( italic_I - roman_Σ start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_X roman_Σ start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT ) roman_Σ ( italic_I - roman_Σ start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_X roman_Σ start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT ) )

Notice Πτ=Σ1/2X(XΣX+τ2I)1XΣ1/2subscriptΠ𝜏superscriptΣ12superscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1𝑋superscriptΣ12\Pi_{\tau}=\Sigma^{1/2}X^{\top}(X\Sigma X^{\top}+\tau^{2}I)^{-1}X\Sigma^{1/2}roman_Π start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT = roman_Σ start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_X roman_Σ start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT is symmetric. Similarly IΠτ𝐼subscriptΠ𝜏I-\Pi_{\tau}italic_I - roman_Π start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT is symmetric and so (IΠτ)2superscript𝐼subscriptΠ𝜏2(I-\Pi_{\tau})^{2}( italic_I - roman_Π start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT is positive semidefinite.

Let XΣ1/2=UDV𝑋superscriptΣ12𝑈𝐷superscript𝑉topX\Sigma^{1/2}=UDV^{\top}italic_X roman_Σ start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT = italic_U italic_D italic_V start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT be the singular value decomposition of XΣ1/2𝑋superscriptΣ12X\Sigma^{1/2}italic_X roman_Σ start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT, where d1d2dnsubscript𝑑1subscript𝑑2subscript𝑑𝑛d_{1}\leq d_{2}\leq\dots\leq d_{n}italic_d start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ≤ italic_d start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ≤ ⋯ ≤ italic_d start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT are the diagonal elements of D𝐷Ditalic_D (also known as the singular values of XΣ1/2𝑋superscriptΣ12X\Sigma^{1/2}italic_X roman_Σ start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT) arranged in ascending order, then we have:

ΠτsubscriptΠ𝜏\displaystyle\Pi_{\tau}roman_Π start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT =VDU(UDDU+τ2I)1UDV=VD(DD+τ2I)1DVabsent𝑉superscript𝐷topsuperscript𝑈topsuperscript𝑈𝐷superscript𝐷topsuperscript𝑈topsuperscript𝜏2𝐼1𝑈𝐷superscript𝑉top𝑉superscript𝐷topsuperscript𝐷superscript𝐷topsuperscript𝜏2𝐼1𝐷superscript𝑉top\displaystyle=VD^{\top}U^{\top}(UDD^{\top}U^{\top}+\tau^{2}I)^{-1}UDV^{\top}=% VD^{\top}(DD^{\top}+\tau^{2}I)^{-1}DV^{\top}= italic_V italic_D start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_U start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_U italic_D italic_D start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_U start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_U italic_D italic_V start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT = italic_V italic_D start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_D italic_D start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_D italic_V start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT

which has the same eigenvalues as D(DD+τ2I)1Dsuperscript𝐷topsuperscript𝐷superscript𝐷topsuperscript𝜏2𝐼1𝐷D^{\top}(DD^{\top}+\tau^{2}I)^{-1}Ditalic_D start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_D italic_D start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_D by similarity (Horn and Johnson [1985]), which are di2/(di2+τ2)superscriptsubscript𝑑𝑖2superscriptsubscript𝑑𝑖2superscript𝜏2d_{i}^{2}/(d_{i}^{2}+\tau^{2})italic_d start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / ( italic_d start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) and 00 (repeated pn𝑝𝑛p-nitalic_p - italic_n times). Likewise the eigenvalues of IΠτ𝐼subscriptΠ𝜏I-\Pi_{\tau}italic_I - roman_Π start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT are therefore 1111 (repeated pn𝑝𝑛p-nitalic_p - italic_n times) and τ2/(di2+τ2)superscript𝜏2superscriptsubscript𝑑𝑖2superscript𝜏2\tau^{2}/(d_{i}^{2}+\tau^{2})italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / ( italic_d start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT )

returning to the bias term, we have:

MSE1𝑀𝑆subscript𝐸1\displaystyle MSE_{1}italic_M italic_S italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT =Tr(Σ1/2𝜷𝜷Σ1/2(IΠτ)Σ(IΠτ))absentTrsuperscriptΣ12𝜷superscript𝜷topsuperscriptΣ12𝐼subscriptΠ𝜏Σ𝐼subscriptΠ𝜏\displaystyle=\operatorname{Tr}(\Sigma^{-1/2}\bm{\beta}\bm{\beta}^{\top}\Sigma% ^{-1/2}(I-\Pi_{\tau})\Sigma(I-\Pi_{\tau}))= roman_Tr ( roman_Σ start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT bold_italic_β bold_italic_β start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT ( italic_I - roman_Π start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ) roman_Σ ( italic_I - roman_Π start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ) )
Tr(Σ1/2𝜷𝜷Σ1/2)Tr((IΠτ)2Σ)absentTrsuperscriptΣ12𝜷superscript𝜷topsuperscriptΣ12Trsuperscript𝐼subscriptΠ𝜏2Σ\displaystyle\leq\operatorname{Tr}(\Sigma^{-1/2}\bm{\beta}\bm{\beta}^{\top}% \Sigma^{-1/2})\operatorname{Tr}((I-\Pi_{\tau})^{2}\Sigma)≤ roman_Tr ( roman_Σ start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT bold_italic_β bold_italic_β start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT ) roman_Tr ( ( italic_I - roman_Π start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_Σ )
Tr(Σ1/2𝜷𝜷Σ1/2)(j=1pnλj+j=1nλpn+jτ4/(dj2+τ2)2)absentTrsuperscriptΣ12𝜷superscript𝜷topsuperscriptΣ12superscriptsubscript𝑗1𝑝𝑛subscript𝜆𝑗superscriptsubscript𝑗1𝑛subscript𝜆𝑝𝑛𝑗superscript𝜏4superscriptsuperscriptsubscript𝑑𝑗2superscript𝜏22\displaystyle\leq\operatorname{Tr}(\Sigma^{-1/2}\bm{\beta}\bm{\beta}^{\top}% \Sigma^{-1/2})(\sum_{j=1}^{p-n}\lambda_{j}+\sum_{j=1}^{n}\lambda_{p-n+j}\tau^{% 4}/(d_{j}^{2}+\tau^{2})^{2})≤ roman_Tr ( roman_Σ start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT bold_italic_β bold_italic_β start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT ) ( ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p - italic_n end_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT italic_p - italic_n + italic_j end_POSTSUBSCRIPT italic_τ start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT / ( italic_d start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT )
Tr(Σ1/2𝜷𝜷Σ1/2)λ1(pn+j=1nτ4/(dj2+τ2)2)absentTrsuperscriptΣ12𝜷superscript𝜷topsuperscriptΣ12subscript𝜆1𝑝𝑛superscriptsubscript𝑗1𝑛superscript𝜏4superscriptsuperscriptsubscript𝑑𝑗2superscript𝜏22\displaystyle\leq\operatorname{Tr}(\Sigma^{-1/2}\bm{\beta}\bm{\beta}^{\top}% \Sigma^{-1/2})\lambda_{1}(p-n+\sum_{j=1}^{n}\tau^{4}/(d_{j}^{2}+\tau^{2})^{2})≤ roman_Tr ( roman_Σ start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT bold_italic_β bold_italic_β start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT ) italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_p - italic_n + ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_τ start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT / ( italic_d start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) (30)

Where we used Von Neumann’s trace inequality (Mirsky [1975]) in the penultimate line, on the matrices (IΠτ)2superscript𝐼subscriptΠ𝜏2(I-\Pi_{\tau})^{2}( italic_I - roman_Π start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT and ΣΣ\Sigmaroman_Σ. So provided τ2superscript𝜏2\tau^{2}italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT is small compared to the dj2superscriptsubscript𝑑𝑗2d_{j}^{2}italic_d start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT, MSE1𝑀𝑆subscript𝐸1MSE_{1}italic_M italic_S italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT decreases approximately linearly in n𝑛nitalic_n, as n𝑛nitalic_n increases to p𝑝pitalic_p.

For the variance term MSE2𝑀𝑆subscript𝐸2MSE_{2}italic_M italic_S italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, we once again use the singular value decomposition XΣ1/2=UDV𝑋superscriptΣ12𝑈𝐷superscript𝑉topX\Sigma^{1/2}=UDV^{\top}italic_X roman_Σ start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT = italic_U italic_D italic_V start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT:

MSE2𝑀𝑆subscript𝐸2\displaystyle MSE_{2}italic_M italic_S italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT =Tr(𝔼(ϵ(ΣX(XΣX+τ2I)1)ΣX(XΣX+τ2I)1ϵ))absentTr𝔼superscriptbold-italic-ϵtopsuperscriptΣsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1topΣsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1bold-italic-ϵ\displaystyle=\operatorname{Tr}(\mathbb{E}(\bm{\epsilon}^{\top}(\Sigma X^{\top% }(X\Sigma X^{\top}+\tau^{2}I)^{-1})^{\top}\Sigma X^{\top}(X\Sigma X^{\top}+% \tau^{2}I)^{-1}\bm{\epsilon}))= roman_Tr ( blackboard_E ( bold_italic_ϵ start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_italic_ϵ ) )
=Tr(𝔼(ϵϵ(ΣX(XΣX+τ2I)1)ΣX(XΣX+τ2I)1))absentTr𝔼bold-italic-ϵsuperscriptbold-italic-ϵtopsuperscriptΣsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1topΣsuperscript𝑋topsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼1\displaystyle=\operatorname{Tr}(\mathbb{E}(\bm{\epsilon}\bm{\epsilon}^{\top}(% \Sigma X^{\top}(X\Sigma X^{\top}+\tau^{2}I)^{-1})^{\top}\Sigma X^{\top}(X% \Sigma X^{\top}+\tau^{2}I)^{-1}))= roman_Tr ( blackboard_E ( bold_italic_ϵ bold_italic_ϵ start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) )
=τ2Tr((XΣX+τ2I)2XΣ2X)absentsuperscript𝜏2Trsuperscript𝑋Σsuperscript𝑋topsuperscript𝜏2𝐼2𝑋superscriptΣ2superscript𝑋top\displaystyle=\tau^{2}\operatorname{Tr}((X\Sigma X^{\top}+\tau^{2}I)^{-2}X% \Sigma^{2}X^{\top})= italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_Tr ( ( italic_X roman_Σ italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT italic_X roman_Σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_X start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT )
=τ2Tr((UDDU+τ2I)2UDVΣVDU)absentsuperscript𝜏2Trsuperscript𝑈𝐷superscript𝐷topsuperscript𝑈topsuperscript𝜏2𝐼2𝑈𝐷superscript𝑉topΣ𝑉superscript𝐷topsuperscript𝑈top\displaystyle=\tau^{2}\operatorname{Tr}((UDD^{\top}U^{\top}+\tau^{2}I)^{-2}UDV% ^{\top}\Sigma VD^{\top}U^{\top})= italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_Tr ( ( italic_U italic_D italic_D start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_U start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT italic_U italic_D italic_V start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ italic_V italic_D start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_U start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT )
=τ2Tr(U(UDDU+τ2I)1UU(UDDU+τ2I)1UDVΣVD)absentsuperscript𝜏2Trsuperscript𝑈topsuperscript𝑈𝐷superscript𝐷topsuperscript𝑈topsuperscript𝜏2𝐼1𝑈superscript𝑈topsuperscript𝑈𝐷superscript𝐷topsuperscript𝑈topsuperscript𝜏2𝐼1𝑈𝐷superscript𝑉topΣ𝑉superscript𝐷top\displaystyle=\tau^{2}\operatorname{Tr}(U^{\top}(UDD^{\top}U^{\top}+\tau^{2}I)% ^{-1}UU^{\top}(UDD^{\top}U^{\top}+\tau^{2}I)^{-1}UDV^{\top}\Sigma VD^{\top})= italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_Tr ( italic_U start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_U italic_D italic_D start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_U start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_U italic_U start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_U italic_D italic_D start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_U start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_U italic_D italic_V start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ italic_V italic_D start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT )
=τ2Tr(VD(DD+τ2I)2DVΣ)absentsuperscript𝜏2Tr𝑉superscript𝐷topsuperscript𝐷superscript𝐷topsuperscript𝜏2𝐼2𝐷superscript𝑉topΣ\displaystyle=\tau^{2}\operatorname{Tr}(VD^{\top}(DD^{\top}+\tau^{2}I)^{-2}DV^% {\top}\Sigma)= italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_Tr ( italic_V italic_D start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_D italic_D start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT italic_D italic_V start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ )
τ2j=1nλ1dj2/(dj2+τ2)2absentsuperscript𝜏2superscriptsubscript𝑗1𝑛subscript𝜆1superscriptsubscript𝑑𝑗2superscriptsuperscriptsubscript𝑑𝑗2superscript𝜏22\displaystyle\leq\tau^{2}\sum_{j=1}^{n}\lambda_{1}d_{j}^{2}/(d_{j}^{2}+\tau^{2% })^{2}≤ italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_d start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / ( italic_d start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT

Where the last line again follows from Von Neumann’s trace inequality, on the matrices VD(DD+τ2I)2DV𝑉superscript𝐷topsuperscript𝐷superscript𝐷topsuperscript𝜏2𝐼2𝐷superscript𝑉topVD^{\top}(DD^{\top}+\tau^{2}I)^{-2}DV^{\top}italic_V italic_D start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_D italic_D start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I ) start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT italic_D italic_V start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT and ΣΣ\Sigmaroman_Σ.

\blacksquare

A.9 Discussion of model parametrisation

In this section we discuss some additional limitations with the parametrisation of Gottschalk et al. [2010] which led us to use a mass based parametrisation. These limitations do not apply in all MFA systems but could potentially be problematic in some. In order to discuss them we provide a summary of the parametrisation: let Ui,jsubscript𝑈𝑖𝑗U_{i,j}italic_U start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT denote the value of the flow from process i𝑖iitalic_i to process j𝑗jitalic_j. Let zi=kUi,ksubscript𝑧𝑖subscript𝑘subscript𝑈𝑖𝑘z_{i}=\sum_{k}U_{i,k}italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_U start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT denote the total outflow of process i𝑖iitalic_i, and ϕi,j=Ui,j/zisubscriptitalic-ϕ𝑖𝑗subscript𝑈𝑖𝑗subscript𝑧𝑖\phi_{i,j}=U_{i,j}/z_{i}italic_ϕ start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT = italic_U start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT / italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT be the transfer coefficient for the flow from process i𝑖iitalic_i to process j𝑗jitalic_j, equal to the fraction of the total outflow zisubscript𝑧𝑖z_{i}italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT of process i𝑖iitalic_i. In addition, an optional external flow qisubscript𝑞𝑖q_{i}italic_q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is allowed for each process i𝑖iitalic_i. Applying conservation of mass at process i𝑖iitalic_i, we obtain:

qi+jUj,i=kUi,ksubscript𝑞𝑖subscript𝑗subscript𝑈𝑗𝑖subscript𝑘subscript𝑈𝑖𝑘q_{i}+\sum_{j}U_{j,i}=\sum_{k}U_{i,k}italic_q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_U start_POSTSUBSCRIPT italic_j , italic_i end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_U start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT (31)

which can be rewritten as

qi+jzjϕj,i=zisubscript𝑞𝑖subscript𝑗subscript𝑧𝑗subscriptitalic-ϕ𝑗𝑖subscript𝑧𝑖q_{i}+\sum_{j}z_{j}\phi_{j,i}=z_{i}italic_q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_ϕ start_POSTSUBSCRIPT italic_j , italic_i end_POSTSUBSCRIPT = italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT (32)

the conservation of mass equations 32 can therefore be written in the matrix form

(IΦ)𝒛=𝒒𝐼superscriptΦtop𝒛𝒒(I-\Phi^{\top})\bm{z}=\bm{q}( italic_I - roman_Φ start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) bold_italic_z = bold_italic_q (33)

where ΦΦ\Phiroman_Φ is a matrix with elements ϕi,jsubscriptitalic-ϕ𝑖𝑗\phi_{i,j}italic_ϕ start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT. Lupton and Allwood [2018] and Gottschalk et al. [2010] then notes that this equation can be inverted to give (IΦ)1𝒒=𝒛superscript𝐼superscriptΦtop1𝒒𝒛(I-\Phi^{\top})^{-1}\bm{q}=\bm{z}( italic_I - roman_Φ start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_italic_q = bold_italic_z, the vector of the total outflow of every process, which can then be used to retrieve each individual flow variables Ui,jsubscript𝑈𝑖𝑗U_{i,j}italic_U start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT via the relationship Ui,j=ϕi,jzisubscript𝑈𝑖𝑗subscriptitalic-ϕ𝑖𝑗subscript𝑧𝑖U_{i,j}=\phi_{i,j}z_{i}italic_U start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT = italic_ϕ start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. Using this, Lupton and Allwood [2018] and Gottschalk et al. [2010] assigns priors on the transfer coefficients ϕi,jsubscriptitalic-ϕ𝑖𝑗\phi_{i,j}italic_ϕ start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT and the external flow variables qisubscript𝑞𝑖q_{i}italic_q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, which gives an implicit prior over the flow variables Ui,jsubscript𝑈𝑖𝑗U_{i,j}italic_U start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT.

The first limitation is that it is unclear if IΦ𝐼superscriptΦtopI-\Phi^{\top}italic_I - roman_Φ start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT is always invertible. From ϕi,j=Ui,j/kUi,ksubscriptitalic-ϕ𝑖𝑗subscript𝑈𝑖𝑗subscript𝑘subscript𝑈𝑖𝑘\phi_{i,j}=U_{i,j}/\sum_{k}U_{i,k}italic_ϕ start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT = italic_U start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT / ∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_U start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT we must have kϕi,k=1subscript𝑘subscriptitalic-ϕ𝑖𝑘1\sum_{k}\phi_{i,k}=1∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_ϕ start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT = 1, which means the matrix ΦΦ\Phiroman_Φ has each row summing to 1111 (making it a stochastic matrix), which implies it must have eigenvalue 1111 (this can be easily shown by checking the vector with entries all equal to 1111 is an eigenvector). IΦ𝐼superscriptΦtopI-\Phi^{\top}italic_I - roman_Φ start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT therefore must have 00 as an eigenvalue and is not invertible.

It appears the way this issue is currently handled is to have at least one process in the system that has no outflows (see for example Figure 1 of Dong et al. [2023], which has processes 6,7,8 and 9 without outflows), essentially fixing one or more rows of ΦΦ\Phiroman_Φ to equal to the 𝟎0\bm{0}bold_0 vector so that it is no longer a stochastic matrix. However, it is unclear if a process that has no outflows always exists (for example Figure 2 of Mannan and Al-Ghamdi [2020]) and such processes must be interpretable as not mass balanced (e.g. a stock or outflow of some kind).

Additionally, if 𝒒𝒒\bm{q}bold_italic_q is no longer nonnegative (e.g. if a stock change, external outflow, or mass balance relaxation term is added to the right hand side of 33 ) and assuming IΦ𝐼superscriptΦtopI-\Phi^{\top}italic_I - roman_Φ start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT is invertible, it will cause the total flow output variables 𝒛=(IΦ)1𝒒𝒛superscript𝐼superscriptΦtop1𝒒\bm{z}=(I-\Phi^{\top})^{-1}\bm{q}bold_italic_z = ( italic_I - roman_Φ start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_italic_q and by extension the flow variables Ui,jsubscript𝑈𝑖𝑗U_{i,j}italic_U start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT to no longer be nonnegative. However, flow variables should be nonnegative to be physically meaningful. To see this, note that (IΦ)1superscript𝐼superscriptΦtop1(I-\Phi^{\top})^{-1}( italic_I - roman_Φ start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT has Taylor expansion m=0(Φ)msuperscriptsubscript𝑚0superscriptsuperscriptΦtop𝑚\sum_{m=0}^{\infty}(\Phi^{\top})^{m}∑ start_POSTSUBSCRIPT italic_m = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ( roman_Φ start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT, which is a limit of a sum of matrices with nonnegative entries, so the entries of (IΦ)1superscript𝐼superscriptΦtop1(I-\Phi^{\top})^{-1}( italic_I - roman_Φ start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT must all be nonnegative. Therefore in general 𝒛=(IΦ)1𝒒𝒛superscript𝐼superscriptΦtop1𝒒\bm{z}=(I-\Phi^{\top})^{-1}\bm{q}bold_italic_z = ( italic_I - roman_Φ start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_italic_q is nonnegative if and only if 𝒒𝒒\bm{q}bold_italic_q is nonnegative. This leaves how to incorporate mass balance relaxations (in order to take into account epistemic uncertainty) and other nonnegative terms under Gottschalk et al. [2010]’s parametrisation a possible direction for future work.