
Lecture-03: Uncertain Models and

Modelling Uncertainty
Dr. Md. Nazrul Islam
Ph.D. (University of Tokyo, Japan) (MEXT Fellow)
JSPS Postdoctoral Research Fellow (2012-2014)
NFP Fellow for M.Sc. ESA Credits (Netherlands)
ISEM Fellow for Ecological Modeling (USA)
Associate Professor
Department of Geography and Environment
Jahangirnagar University, Savar, Dhaka, Bangladesh

Executive Editor-in-Chief
Modeling Earth Systems and Environment
Outline of presentation
 Model building and testing: is the environment special?
 Statistical models vs physical/process-based models
 What is sensitivity/uncertainty analysis?
 Quantifying and apportioning variation in model and data
 General comments: relevance and implementation
 Sensitivity analysis (SA)
 Uncertainty analysis (UA)
All models are wrong, but some are useful
(and some are more useful than others).

(All data are useful, but some are more varied than others.)
Questions we ask about models
 Is the model valid?
 Are the assumptions reasonable?
 Does the model make sense based on the best scientific knowledge?
 Is the model credible?
 Do the model predictions match the observed data?
 How uncertain are the results?

What is a good model?
Simple, realistic, efficient, useful, reliable, valid, etc.
Statistical models
 Always include an ε term to describe random variation
 Empirical
 Descriptive and predictive
 Model-building goal: the simplest model that is adequate
 Used for inference
Physical/process-based models
 Use the best scientific knowledge
 May not explicitly include ε, or any random variation
 Descriptive and predictive
 Goal may not be the simplest model
 Not used for inference
Models
Mathematical (deterministic/process-based) models tend
 to be complex
 to ignore important sources of uncertainty
Statistical models tend
 to be empirical
 to ignore much of the biological/physical/chemical knowledge
Stages in modelling
 Design and conceptualisation:
– Visualisation of structure
– Identification of processes (variable selection)
– Choice of parameterisation
 Fitting and assessment:
– Parameter estimation (calibration)
– Goodness of fit
Model evaluation tools
 Graphical procedures
 % variation explained in the response
 Statistical model comparisons (F-tests, ANOVA, GLRT)
 Comparability to measurements
 These tools are well designed for statistical models, but what of the physical, process-driven models?
The story of randomness and uncertainty
 Randomness as the source of variability
– a source of variation: different animals range over different territory, eat different sources of ….
 The effect is that we cannot be certain
 Uncertainty due to lack of knowledge
– conflicting evidence
– ignorance
– effects of scale
– lack of observations
 Uncertainty due to variability
– natural randomness
– behavioural variability
Effect of uncertainties
 Uncertainty in model quantities/parameters/inputs
 Uncertainty about model form
 Uncertainty about model completeness
 Lack of observations contributes to
– uncertainties in input data
– parameter uncertainties
 Conflicting evidence contributes to
– uncertainty about model form
– uncertainty about the validity of assumptions
All of which makes it difficult to judge how good a model is!
Modelling tools - SA/UA
 Sensitivity analysis (SA)
determining the amount and kind of change produced in the model predictions by a change in a model parameter
 Uncertainty analysis (UA)
an assessment/quantification of the uncertainties associated with the parameters, the data and the model structure
Modellers conduct SA to determine
 (a) whether the model resembles the system or processes under study,
 (b) the factors that contribute most to the output variability,
 (c) the model parameters (or parts of the model itself) that are insignificant,
 (d) whether there is some region in the space of input factors for which the model variation is maximum,
 (e) whether, and which, (groups of) factors interact with each other.
SA flow chart (Saltelli, Chan and Scott,
2000)
Design of the SA experiment
 Simple factorial designs (one at a time)
 Factorial designs (including potential interaction terms)
 Fractional factorial designs
 Important difference: in the context of computer code experiments, random variation due to variation in experimental units does not exist.
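The contrast between the one-at-a-time and full factorial designs above can be sketched in a few lines of Python (a hypothetical illustration, not code from the lecture; the function names are my own):

```python
from itertools import product

# A two-level full factorial design enumerates every combination of
# low (-1) and high (+1) settings for k factors, so runs grow as 2**k.
def full_factorial(k):
    return list(product([-1, 1], repeat=k))

# A one-at-a-time design perturbs each factor in turn from a baseline,
# holding the others fixed: only k + 1 runs, but it cannot detect
# interactions between factors.
def one_at_a_time(k):
    baseline = [-1] * k
    runs = [baseline[:]]
    for j in range(k):
        run = baseline[:]
        run[j] = 1
        runs.append(run)
    return runs

ff = full_factorial(3)    # 2**3 = 8 runs
oat = one_at_a_time(3)    # 3 + 1 = 4 runs
```

Fractional factorial designs sit between the two, keeping a carefully chosen subset of the full factorial runs.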
SA techniques
 Screening techniques
– O(ne) A(t) T(ime), factorial and fractional factorial designs used to isolate a set of important factors
 Local/differential analysis
 Sampling-based (Monte Carlo) methods
 Variance-based methods
– variance decomposition of the output to compute sensitivity indices
Screening
 Screening experiments can be used to identify the parameter subset that controls most of the output variability with low computational effort.
Screening methods
 Vary one factor at a time (NOT particularly recommended)
 Morris OAT design (global)
– Estimate the main effect of a factor by computing a number r of local measures at different points x1, …, xr in the input space, and then average them.
– Order the input factors by importance.
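The Morris idea of averaging r local measures can be sketched as follows. This is a simplified stand-in, not the full Morris trajectory design: it computes one elementary effect per factor at r random base points and averages the absolute values (the toy model and all names here are hypothetical):

```python
import random

def morris_mu(model, k, r=20, delta=0.1, seed=0):
    # For r random base points in the unit cube, compute one elementary
    # effect per factor, (f(x + delta*e_j) - f(x)) / delta, and average
    # the absolute values to rank the factors.
    rng = random.Random(seed)
    mu = [0.0] * k
    for _ in range(r):
        x = [rng.random() for _ in range(k)]
        y0 = model(x)
        for j in range(k):
            xp = x[:]
            xp[j] = x[j] + delta
            mu[j] += abs((model(xp) - y0) / delta)
    return [m / r for m in mu]

# Toy linear model: x0 matters a lot, x1 a little, x2 not at all.
f = lambda x: 10 * x[0] + 0.5 * x[1] + 0.0 * x[2]
mu = morris_mu(f, k=3)
```

For this linear toy model every elementary effect equals the coefficient, so the averaged measures recover the factor ordering exactly.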
Local SA
 Local SA concentrates on the local impact of the
factors on the model. Local SA is usually carried out by
computing partial derivatives of the output functions
with respect to the input variables.
 The input parameters are varied in a small interval
around a nominal value. The interval is usually the
same for all of the variables and is not related to the
degree of knowledge of the variables.
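The local SA just described, partial derivatives around a nominal value, can be approximated numerically with central finite differences (a minimal sketch with a hypothetical toy model):

```python
def local_sensitivities(model, x0, h=1e-6):
    # Approximate each partial derivative dY/dx_j at the nominal point
    # x0 by a central finite difference with a small, common interval h.
    grads = []
    for j in range(len(x0)):
        xp, xm = list(x0), list(x0)
        xp[j] += h
        xm[j] -= h
        grads.append((model(xp) - model(xm)) / (2 * h))
    return grads

# Toy model with known derivatives at x0 = (1, 2):
# dY/dx0 = 2*x0 = 2 and dY/dx1 = 3.
g = lambda x: x[0] ** 2 + 3 * x[1]
sens = local_sensitivities(g, [1.0, 2.0])
```

Note the limitation flagged above: the same small interval h is used for every variable, regardless of how well each variable is known.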
Global SA
 Global SA apportions the output uncertainty to the uncertainty in the input factors, covering their entire range space.
 A global method evaluates the effect of xj while all the other xi, i ≠ j, are varied as well.
How is a sampling-based (global) SA implemented?
Step 1: define the model, input factors and outputs.
Step 2: assign distributions to the input parameters/factors and, if necessary, a covariance structure. DIFFICULT!
Step 3: simulate realisations from the parameter pdfs to generate a set of model runs, giving the set of output values.
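The three steps can be sketched directly (a hypothetical toy model and made-up input distributions, standing in for a real process model):

```python
import random
import statistics

def run_mc(model, input_dists, n=2000, seed=1):
    # Step 3: simulate realisations from the parameter pdfs and record
    # the resulting set of (inputs, output) pairs.
    rng = random.Random(seed)
    runs = []
    for _ in range(n):
        x = [draw(rng) for draw in input_dists]
        runs.append((x, model(x)))
    return runs

# Step 1: a toy model standing in for the real one.
model = lambda x: x[0] + 2 * x[1]
# Step 2: independent input pdfs, one normal and one uniform
# (assigning these, and any covariance structure, is the hard part).
dists = [lambda r: r.gauss(0.0, 1.0), lambda r: r.uniform(0.0, 1.0)]

runs = run_mc(model, dists)
outputs = [y for _, y in runs]
mean_y = statistics.fmean(outputs)   # should be near E[Y] = 0 + 2*0.5 = 1
```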
Choice of sampling method
 S(imple) or Stratified R(andom) S(ampling)
– Each input factor is sampled independently many times from its marginal distribution to create the set of input values (or sampled randomly from the joint distribution).
– Relatively expensive in computational effort if the model has many input factors, and may not give good coverage of the entire range space.
 L(atin) H(ypercube) S(ampling)
– The range of each input factor is categorised into N equal-probability intervals, and one observation of each input factor is made in each interval.
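Latin hypercube sampling on the unit cube can be sketched as below: each factor's range is split into n equal-probability intervals, exactly one point is drawn per interval, and intervals are paired across factors by independent random shuffles (a minimal sketch; the function name is my own):

```python
import random

def latin_hypercube(n, k, seed=0):
    # Build one column per factor: one point per stratum [i/n, (i+1)/n),
    # then shuffle the strata order so pairings across factors are random.
    rng = random.Random(seed)
    columns = []
    for _ in range(k):
        col = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(col)
        columns.append(col)
    # Row i pairs the i-th entry of each shuffled column into one sample.
    return [tuple(col[i] for col in columns) for i in range(n)]

pts = latin_hypercube(n=10, k=2)
```

By construction every factor has exactly one sample in each of its n strata, which is what gives LHS better coverage than simple random sampling for the same number of runs.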
SA - analysis
 At the end of the computer experiment, the data are of the form (yi, x1i, x2i, …, xni), where x1, …, xn are the realisations of the input factors.
 Analysis includes regression analysis (on raw and ranked values), standard hypothesis tests of distribution (mean and variance) for sub-samples corresponding to given percentiles of x, and Analysis of Variance.
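One common regression-based summary is the standardised regression coefficient, bj·sd(xj)/sd(y), computed from the Monte Carlo output. A sketch for independently sampled inputs, regressing the output on each factor in turn (toy data; all names here are hypothetical):

```python
import random
import statistics

def src(xs, ys):
    # Ordinary least-squares slope of y on x, rescaled by sd(x)/sd(y)
    # to give a dimensionless standardised regression coefficient.
    n = len(ys)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    slope = cov / statistics.variance(xs)
    return slope * statistics.stdev(xs) / statistics.stdev(ys)

rng = random.Random(2)
x1 = [rng.gauss(0, 1) for _ in range(3000)]
x2 = [rng.gauss(0, 1) for _ in range(3000)]
y = [4 * a + b for a, b in zip(x1, x2)]   # x1 four times as influential

src1, src2 = src(x1, y), src(x2, y)
```

For this linear toy model the SRCs should come out near 4/√17 ≈ 0.97 and 1/√17 ≈ 0.24, recovering the relative influence of the two factors.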
Some ‘new’ methods of analysis
 Measures of importance
– VarXj(E(Y|Xj = xj)) / Var(Y)
– HIM(Xj) = Σ yi yi′ / N
 Sobol′ sensitivity indices
 Fourier Amplitude Sensitivity Test (FAST)
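The importance measure Var(E(Y|Xj))/Var(Y) can be estimated crudely by slicing on Xj: bin the sampled Xj values and take the variance of the within-bin output means. This is a hypothetical sketch, much less efficient than the Sobol′ or FAST estimators named above:

```python
import random
import statistics

def first_order_index(xj, y, bins=20):
    # Estimate Var(E(Y|Xj)) / Var(Y): slice the range of Xj into
    # equal-width bins, average Y within each bin, then take the
    # variance of those conditional means over the total variance.
    lo, hi = min(xj), max(xj)
    sums = [0.0] * bins
    counts = [0] * bins
    for x, yv in zip(xj, y):
        b = min(int((x - lo) / (hi - lo) * bins), bins - 1)
        sums[b] += yv
        counts[b] += 1
    means = [s / c for s, c in zip(sums, counts) if c > 0]
    return statistics.pvariance(means) / statistics.pvariance(y)

rng = random.Random(3)
n = 5000
x1 = [rng.uniform(0, 1) for _ in range(n)]
x2 = [rng.uniform(0, 1) for _ in range(n)]
y = [3 * a + b for a, b in zip(x1, x2)]

s1 = first_order_index(x1, y)   # should be near 9/10
s2 = first_order_index(x2, y)   # should be near 1/10
```

For this additive toy model the two first-order indices should sum to roughly one; interactions between factors would leave a gap.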
So far so good
 but how useful are these techniques in some real-life problems?
 Are there other complicating factors?
 Do statisticians have too simple/complex a view of the world?
Common features of environmental modelling and observations
 Knowledge of the processes creating the observational record may be incomplete
 The observational records may be incomplete (observed often irregularly in space and time)
 They may involve extreme events
 They may involve quantification of risk
Issues and purpose of analysis
 Global and local pollutant mapping from Chernobyl – decision making: which areas should be restricted?
 Global carbon cycle (greenhouse gases, CO2 levels and global warming) – prediction: what is the trend in temperature, and what level will it reach in 2050?
 Ocean modelling – decision making: is it safe to eat fish?
 Air pollution modelling (local and regional scale) – regulatory: have emission control agreements reduced air pollutants?
 Chronologies for past environment studies – understanding: when did things happen in the past?
Questions we ask about
observations
 Do they result from observational or designed;
laboratory or field experiments?
 What scale are they collected over (time and space)?
 Are they representative?
 Are they qualitative or quantitative?
 How are they connected to processes, how well
understood are these connections?
 How varied are they?
Example 1: are atmospheric SO2 concentrations declining?
 Measurements made at a monitoring station over a 20-year period: the processes involve meteorology (local and long-range), source distribution, and the chemistry of sulphur
 A complex statistical model was developed to describe the pattern; the model apportions the variation to ‘trend’, seasonality and residual variation
 Main objective
[Figure: SO2 monitored in GB02 – time series over ~250 observations, SO2 ranging 0–10]
[Figure: SO2 against time, monitored in GB02, 1980–1995; lines show fitted Model 3]
[Figure: SO4 in air, monitored at Lough Navar (GB06) – ~150 observations, values 0.0–2.5]
Example 2
 Discovery of radioactive particles on the foreshore of a
nuclear facility since 1983
 Is the rate of finds falling off?
 Are the particle characteristics changing with time?
 Processes: transport in the marine environment,
chemistry of the particles in the sea, interaction with
source
 What can we infer about the size of the source and its
distribution?
Log activity and trend
[Figure: trend analysis plot for log activity. Linear trend model Yt = 14.9899 − 0.00712072·t; accuracy measures MAPE 11.8851, MAD 1.4229, MSD 3.8787]
Trend in number of finds
[Figure: trend analysis plot for number of finds, 1984–2002. Linear trend model Yt = 14.7476 − 0.401299·t; accuracy measures MAPE 108.951, MAD 4.025, MSD 28.222]
Cumulative number of finds
[Figure: scatterplot of cumulative finds, pre-1998 and post-1997]
Example 3: how well should models agree?
 6 ocean models (process-based: transport, sedimentary processes, numerical solution scheme, grid size) used to predict the dispersal of a pollutant
 Results to be used to determine a remediation policy
 The models differ in their detail and also in their spatial scale
Model agreement
 Three different sites (local, regional and global relative to a source)
 6 different models
 Level of agreement (high values are poor)
[Figure: sensitivity measures for each model – level of agreement (0–6) at sites 1–3 for models 1–6]
Predictions of levels of cobalt-60
 Different models, same input data
 Predictions vary by considerable margins
 Magnitude of variation is a function of the spatial distribution of sites
[Figure: CV(%) for locations 7–11 under simulation conditions bcs, bcw, bis, biw, tcs, tcw, tis, tiw; CV roughly 50–250%]
Environmental modelling
 Modelling may involve
– understanding and handling variation
– dealing with unusual observations
– dealing with missing observations
– evaluating uncertainties
How well should the model reproduce the data?
 Anecdotal comments: ‘agreement between model and measurement better than 1 (2?) orders of magnitude is acceptable’
 But this needs to be moderated by the measurement variation and uncertainties
 It also depends on the purpose (the model must be fit for purpose)
How can SA/UA help?
SA/UA have a role to play at all modelling stages:
– we learn about model behaviour and ‘robustness’ to change;
– we can generate an envelope of ‘outcomes’ and see whether the observations fall within the envelope;
– we can ‘tune’ the model and identify reasons/causes for differences between model and observations.
On the other hand - Uncertainty analysis
 Parameter uncertainty
– usually quantified in the form of a distribution
 Model structural uncertainty
– more than one model may be fit; expressed as a prior on model structure
 Scenario uncertainty
– uncertainty about future conditions
Tools for handling uncertainty
 Parameter uncertainty
– probability distributions and sensitivity analysis
 Structural uncertainty
– Bayesian framework: one possibility is to define a discrete set of models, another is to use a Gaussian process
Conclusions
 The world is rich and varied in its complexity
 Modelling is an uncertain activity
 Model assessment is a difficult process
 SA/UA are important tools in model assessment
 Setting the problem in a unified Bayesian framework allows all the sources of uncertainty to be quantified, so a fuller assessment can be performed
Challenges
Some challenges:
 different terminologies in different subject areas
 the need for more sophisticated tools to deal with the multivariate nature of the problem
 describing the distributions of the input parameters
 dealing with the Bayesian formulation of structural uncertainty for complex models
 computational demands of simulations for large and complex computer models with many factors
