Statistical Procedures For Developing Earthquake Damage Fragility Curves
SUMMARY
This paper describes statistical procedures for developing earthquake damage fragility functions. Although
fragility curves abound in earthquake engineering and risk assessment literature, the focus has generally
been on the methods for obtaining the damage data (i.e., the analysis of structures), and little emphasis is
placed on the process for fitting fragility curves to this data. This paper provides a synthesis of the most com-
monly used methods for fitting fragility curves and highlights some of their significant limitations. More
novel methods are described for parametric fragility curve development (generalized linear models and
cumulative link models) and non-parametric curves (generalized additive model and Gaussian kernel
smoothing). An extensive discussion of the advantages and disadvantages of each method is provided, as
well as examples using both empirical and analytical data. The paper further proposes methods for treating
the uncertainty in intensity measure, an issue common with empirical data. Finally, the paper describes
approaches for choosing among various fragility models, based on an evaluation of prediction error for a
user-defined loss function.
KEY WORDS: fragility curves; maximum likelihood estimation; generalized linear model; generalized
additive model; kernel smoothing
1. INTRODUCTION
Relationships between earthquake damage and ground motion are a key component of earthquake loss estimation
and the performance-based seismic risk analysis of structures. Also called fragility curves, they describe
the probability of experiencing or exceeding a particular level of damage as a function of ground-
shaking intensity. Depending on the source of data used to develop these relationships, they can be
categorized as (i) empirical fragility curves, based on post-earthquake damage evaluation data [1–4];
(ii) analytical fragility curves, based on structural modeling and response simulations [5–8]; or (iii)
heuristic fragility curves, based on expert opinion [9, 10]. The resulting fragility relationships are
represented either in discrete form, known as damage probability matrices [2], or continuous
fragility functions.
This study describes several statistical procedures for developing continuous earthquake damage
fragility functions. Both empirical and analytical data are used to demonstrate the methods. The
fragility curves describe the probability of exceeding a particular damage state as a function of
ground motion intensity. Although real data are used, the examples shown are meant only to
illustrate options for fitting fragility curves to data, as well as approaches to choosing among them.
*Correspondence to: David Lallemant, Department of Civil and Environmental Engineering, Stanford University, Stanford, CA 94305, USA. E-mail: davidcbl@stanford.edu
Two data sets are used in this study. The first consists of field assessments from the main shock of the
January 12, 2010 earthquake in Haiti, conducted by the Haitian Ministry of Public Works [11]. Over
47,000 infill frame buildings are included in this data set, each categorized into one of seven damage states
following the Applied Technology Council-13 nomenclature [10]: none, slight, light, moderate, heavy, major,
and destroyed. These buildings are distributed across nearly 1000 sites, each with 500 m × 500 m resolution.
The ground motion intensity measure (IM)
used is the peak ground acceleration (PGA). Because no ground motion recordings were made in
Haiti at the time of the earthquake, the ground motion intensity was estimated at each site using
the Boore–Atkinson 2008 ground motion prediction equation (GMPE), appropriate for shallow
crustal earthquakes in active tectonic regions [12]. The source model for the January 12, 2010
event was produced by the United States Geological Survey [13]. The ground motion intensity at
each site is estimated and therefore uncertain. The uncertainty in IM is addressed later in the paper.
For each IM in the database, we can obtain the fraction of buildings exceeding a particular damage
state, as shown in Figure 1(a). The size of the dots in the figure corresponds to the number of
observations (buildings) at that IM. The fitted fragility curves attempt to predict the probabilities of
exceeding a particular damage state in a way that is most consistent with the observed fraction of
buildings above that damage state.
The second data set is obtained from the collapse performance assessment of an eight-story infill
frame building using the Incremental Dynamic Analysis (IDA) technique [14]. Nonlinear dynamic
analyses are conducted on a two-dimensional (planar) model developed in OpenSees [15] using the
22 far-field ground motion set and scaling method outlined in Federal Emergency Management
Agency (FEMA) P695 [16]. The first mode spectral acceleration (Sa) is used as the ground motion
IM. The overall analysis approach is based on the methodology developed by Burton and Deierlein
[17] for simulating seismic collapse in non-ductile reinforced concrete frame buildings with infill.
The results from this analytical model take the form of a set of Sa values corresponding to the onset of
collapse (one Sa data point for each ground motion time history). These results are listed in
Table I so that other researchers can reproduce the results or test further methods for comparison.
Both data sets are for the collapse damage state, but the methods apply equally to any other damage state,
which can simply be substituted for collapse in the development of the fragility functions presented in this
paper. Fragility functions for damage states other than collapse, however, represent the probability of being
in or exceeding that damage state as a function of the ground motion.
2. FITTING LOGNORMAL CDF FRAGILITY CURVES
The lognormal CDF has often been used to model earthquake damage fragility (Equation 1). It is a
simple parametric model that historically has been found to provide a good representation of
Figure 1. Damage data used for fitting fragility models: (a) real data from the 2010 Haiti earthquake (the size of points reflects the number of buildings at that IM level) and (b) incremental dynamic analysis results for an eight-story building.
Table I. Analytical data from incremental dynamic analysis of an eight-story concrete frame building with
infill.
Ground motion 1 2 3 4 5 6 7 8 9 10 11
Sa at collapse 0.53 0.58 1.00 1.00 0.79 0.79 0.74 0.47 1.05 0.53 0.79
Ground motion 12 13 14 15 16 17 18 19 20 21 22
Sa at collapse 1.16 1.05 1.21 0.79 1.05 1.00 0.63 2.21 1.26 0.63 0.89
Ground motion 23 24 25 26 27 28 29 30 31 32 33
Sa at collapse 1.58 0.79 0.53 0.95 1.94 0.63 1.21 0.79 0.74 0.84 0.84
Ground motion 34 35 36 37 38 39 40 41 42 43 44
Sa at collapse 0.95 0.74 1.31 1.89 1.05 1.52 1.05 0.58 1.26 0.84 1.68
Sa, spectral acceleration.
earthquake damage fragility [5, 7, 18–20], and has significant precedent for seismic risk modeling [21–
23]. The lognormal cumulative distribution further has numerous convenient characteristics for
modeling fragility. As is the case with any CDF, it is bounded between 0 and 1 on the y-axis,
satisfying the constraint that the probability of collapse (or any other damage state) is likewise
bounded between 0 and 1. It has a lower bound of 0 on the x-axis, which satisfies the expectation of
non-negative IMs and of no damage at an IM of 0. It also has the mathematical convenience that
when a lognormally distributed random variable is multiplied or divided by factors (factors of safety
for instance), which are themselves uncertain and lognormally distributed, the resulting fragility
curve is still lognormally distributed [24]. This characteristic of multiplicative reproducibility of the
lognormal distribution has made it a powerful tool for developing code-oriented reliability metrics,
including nuclear power plant design [25].
$$P(\mathrm{Collapse} \mid IM = x) = \Phi\!\left(\frac{\ln x - \mu}{\beta}\right) \qquad (1)$$

where $\Phi$ is the standard normal cumulative distribution function, and the sample logarithmic mean and standard deviation are calculated as $\mu = E[\ln(IM_{\mathrm{collapse}})]$ and $\beta = \sqrt{\mathrm{Var}[\ln(IM_{\mathrm{collapse}})]}$. Estimating $\mu$ and $\beta$ directly from these sample moments constitutes the method of moments (MM) approach.
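As a concrete illustration, the MM estimates for the IDA data of Table I can be computed in a few lines. The following is a minimal sketch in Python with NumPy and SciPy (one option among the standard packages mentioned later in this section):

```python
import numpy as np
from scipy.stats import norm

# Sa at collapse for the 44 ground motions of Table I
sa = np.array([0.53, 0.58, 1.00, 1.00, 0.79, 0.79, 0.74, 0.47, 1.05, 0.53,
               0.79, 1.16, 1.05, 1.21, 0.79, 1.05, 1.00, 0.63, 2.21, 1.26,
               0.63, 0.89, 1.58, 0.79, 0.53, 0.95, 1.94, 0.63, 1.21, 0.79,
               0.74, 0.84, 0.84, 0.95, 0.74, 1.31, 1.89, 1.05, 1.52, 1.05,
               0.58, 1.26, 0.84, 1.68])

mu = np.log(sa).mean()           # logarithmic mean
beta = np.log(sa).std(ddof=1)    # logarithmic standard deviation
p_collapse = norm.cdf((np.log(1.0) - mu) / beta)  # Equation 1 at Sa = 1.0 g
```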
This same procedure can also be used for experimental data if all specimens are tested until failure (or
another threshold) [26]. Note, however, that the MM approach can be used only for limited types of data: it
cannot be used for empirical data, for truncated IDA (where a limit is imposed on the scaling of ground
motions), or for multiple stripes analysis (where dynamic analysis is conducted at only a few IMs) [27].
Weighted least squares regression is a common method to treat heteroscedastic data, where data points have
different levels of uncertainty. Conducting weighted least squares regression allows for the larger
points in Figure 1(a) to have greater influence on the resulting curve, because they represent more
buildings. In this approach, a lognormal CDF can be fitted to the damage data using parameters
estimated from the following weighted least-squared optimization procedure:
$$\{\hat{\mu}, \hat{\beta}\} = \arg\min_{\mu,\beta} \sum_{i=1}^{N} N_i \left[ \frac{n_i}{N_i} - \Phi\!\left(\frac{\ln(IM_i) - \mu}{\beta}\right) \right]^2 \qquad (2)$$

where the sum is over the $N$ IM levels in the data set, $N_i$ is the number of buildings at IM level $IM_i$, and $n_i$ is the number of those buildings exceeding the damage state.
In the maximum likelihood estimation (MLE) approach, the $n_i$ exceedances among the $N_i$ buildings at each IM level are treated as the outcome of $N_i$ independent binomial trials with exceedance probability $p_i$, so the likelihood is

$$\mathcal{L} = \prod_{i=1}^{N} \binom{N_i}{n_i} p_i^{n_i} (1 - p_i)^{N_i - n_i}$$

For a fragility function following the form of a lognormal CDF, $p_i$ is replaced by the equation of the
lognormal CDF, and the parameter estimates $\hat{\mu}$ and $\hat{\beta}$ (logarithmic mean and standard deviation) of the
lognormal CDF are chosen to maximize the likelihood function. Note that
any other functional form can be substituted for $p_i$ if preferred to the lognormal CDF.
The parameters are estimated by maximizing the logarithm of
the likelihood, which is equivalent to and computationally more efficient than maximizing the likelihood
function itself:
$$\{\hat{\mu}, \hat{\beta}\} = \arg\max_{\mu,\beta} \sum_{i=1}^{N} \left[ n_i \ln \Phi\!\left(\frac{\ln(IM_i) - \mu}{\beta}\right) + (N_i - n_i) \ln\left(1 - \Phi\!\left(\frac{\ln(IM_i) - \mu}{\beta}\right)\right) \right] \qquad (5)$$

where $\hat{\mu}$ and $\hat{\beta}$ are the estimates of $\mu$ and $\beta$, and the $\ln\binom{N_i}{n_i}$ term has been dropped because it is a constant, therefore having no impact on the maximization.
Standard software packages such as Microsoft Excel, MATLAB, or R can be used to obtain the
parameters $\hat{\mu}$ and $\hat{\beta}$ that maximize Equation 5. These parameters are then used in Equation 1 to obtain
the fitted lognormal CDF fragility function.
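As a sketch of this optimization, the following Python code maximizes the log-likelihood of Equation 5 directly (the names im, n_exceed, and n_total are illustrative placeholders for the binned damage data):

```python
import numpy as np
from scipy import optimize, stats

def fit_lognormal_mle(im, n_exceed, n_total):
    """Estimate (mu, beta) of Equation 1 by maximizing the binomial
    log-likelihood of Equation 5."""
    im, n_exceed, n_total = map(np.asarray, (im, n_exceed, n_total))

    def neg_log_lik(params):
        mu, beta = params
        p = stats.norm.cdf((np.log(im) - mu) / beta)
        p = np.clip(p, 1e-12, 1 - 1e-12)        # guard against log(0)
        return -np.sum(n_exceed * np.log(p)
                       + (n_total - n_exceed) * np.log(1 - p))

    # starting values from the sample moments of the observed IMs
    start = [np.mean(np.log(im)), max(np.std(np.log(im)), 0.1)]
    return optimize.minimize(neg_log_lik, start, method="Nelder-Mead").x
```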
Although simplest to implement given the required data, the MM often provides a poor fit to the data. As
seen in Figure 2(b), it underestimates the probability of collapse for most IMs. This is a typical finding,
because the MM matches only the central characteristics of the distribution (median and variance) rather
than the entire distribution.
Figure 2. Lognormal CDF fragility curves fitted with MM, MLE, and SSE optimization to (a) Haiti damage survey data and (b) analytical IDA results.
3. GENERALIZED LINEAR MODELS
Generalized linear models (GLMs) are commonly used for regression analysis of dichotomous data (zeros and
ones, such as collapsed or non-collapsed structures). GLMs are a variation of ordinary linear
regression, in which the predictor variables are linearly related to the response via a link function.
The GLM is made up of three parts: (i) a conditional probability distribution of the exponential
family; (ii) a linear predictor; and (iii) a link function, through which the linear predictor is related to
the response [30]. GLMs can be written as

$$g(\mu) = \eta = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n \qquad (6)$$

where $\mu$ is the expected response given predictor variables $X_1, X_2, \dots, X_n$. The term $\eta$ is the linear
predictor, which is related to the expected response through the link function $g(\cdot)$.
For developing fragility curves, Equation 6 reduces to a single independent variable (typically the
logarithm of IM) and two linear coefficients (intercept $\alpha$ and a single coefficient $\beta$), and $\mu$ is the
expected probability of exceeding a particular damage state (DS) threshold:

$$g\left(P(DS \geq ds \mid IM)\right) = \alpha + \beta \ln(IM) \qquad (7)$$
The process of fitting a GLM then involves finding the coefficients that maximize the likelihood
function based on assumptions of a conditional distribution of the exponential family. In the case of
fragility, the binomial distribution is the most natural choice.
One advantage of GLMs is that they all use MLE for model fitting. The MLE is found by
solving the score function by means of iteratively re-weighted least squares, which, as described
earlier, assigns weights to the errors that are inversely proportional to the conditional variance. This
addresses the issue of error variance that depends on the response. In this section, we describe GLMs for
fitting fragility curves using the probit and logit link functions, which are the most commonly used
for dichotomous response data. We also explore two assumptions on the conditional distribution
(binomial and Gaussian).
3.1. Probit GLM model with binomial distribution and log input variable (IM)
An exactly equivalent result to fitting a lognormal CDF by MLE (as described in Section 2) can be
obtained by fitting a GLM assuming a binomial response and using a probit link function with the
logarithm of the predictor variable. This is because the probit model fits a standard normal
cumulative distribution, and therefore, a probit model fit to the logarithm of the input variable
describes a lognormal cumulative distribution [27, 31]. The optimization of GLMs is furthermore
conducted through MLE, and therefore, the models are equivalent. The GLM takes the form

$$P(DS \geq ds \mid IM) = \Phi\left(\alpha + \beta \ln(IM)\right) \qquad (8)$$

which is the lognormal CDF of Equation 1 with $\mu = -\alpha/\beta$ and logarithmic standard deviation $1/\beta$.
This model can therefore be considered simply a different formulation of the MLE-based lognormal
CDF curve fitting in Section 2. An advantage of the GLM formulation is that these models are readily
available in common software packages (such as R or MATLAB), which also automatically
compute useful outputs for evaluating goodness of fit (Akaike information criterion, R-squared
measure, etc.) and confidence intervals on the estimates.
In addition to the probit and logit link functions for binomial data, the complementary log–log
(commonly known as ‘clog-log’) link can also be used.
4. GENERALIZED ADDITIVE MODELS
Generalized additive models (GAMs) are an extension of GLMs in which the requirement of a linear combination
of parameters is relaxed. They are often used to identify non-linear effects. These methods are more
complex and require more background but can often provide very good results [34, 35]. GAM
models were recently introduced for modeling earthquake fragility by Rossetto et al. [36]. Recalling
that generalized linear models relate the mean response to predictors through a linear regression
model and a link function, generalized additive models simply replace the linear coefficients with
unspecified smoothing functions
$$g(\mu) = \eta = \alpha + f_1(X_1) + f_2(X_2) + \dots + f_n(X_n) \qquad (9)$$
where f1, f2, …, fn are non-parametric scatterplot smoothers, usually cubic smoothing splines. These
splines fit cubic polynomial curves within sections, connected at points (called knots), creating a
continuous curve. The general effect of the smoothers is to relax the constraints of the standard
GLM models. During the fitting process for additive models, a smoothing parameter λ can be specified.
Generalized additive models are semi-parametric and therefore cannot be summarized by linear
coefficient parameters and a link function. Compared with GLM models, GAM models are more
flexible and provide very good fit to data when significant nonlinearities in predictor variables are
present. The smoothing parameter is used to balance between bias and variance (over-smoothing
and over-fitting), as seen in Figure 3. As the smoothing parameter λ becomes infinite, the splines are forced toward straight lines and the GAM reduces to a GLM.
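As a sketch of GAM fitting in practice, the following uses the pygam Python library, which implements a logit-link binomial GAM (LogisticGAM) rather than the probit link; the variable names are illustrative placeholders for per-building data:

```python
import numpy as np
from pygam import LogisticGAM, s

# binary response per building: 1 if the damage state was exceeded
log_im = np.log(im_per_building).reshape(-1, 1)
exceeded = (damage_per_building >= ds_threshold).astype(int)

# s(0) places a spline smoother on the single feature; lam is the smoothing
# parameter trading over-fitting against over-smoothing
gam = LogisticGAM(s(0, lam=1.0)).fit(log_im, exceeded)
fragility = gam.predict_proba(np.log(im_grid).reshape(-1, 1))
```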
5. GAUSSIAN KERNEL SMOOTHING
Gaussian kernel smoothing (GKS) is a non-parametric regression method that smooths otherwise
erratic data and can be used to develop non-parametric fragility functions [37, 38]. It uses a kernel,
which assigns to each data point a weight that is inversely related to the distance between that data value
and the input of the fragility function of interest. In GKS, the weights are obtained by evaluating, at
all data points, the Gaussian density centered at the IM value of interest and with
variance chosen depending on the smoothing requirements. The probability of exceeding damage
state ds given IM = imi can therefore be computed as follows:
$$P(DS \geq ds \mid IM = im_i) = \frac{\sum_{p=1}^{n} I(ds_p \geq ds)\,\phi\!\left(\frac{im_i - im_p}{h}\right)}{\sum_{p=1}^{n} \phi\!\left(\frac{im_i - im_p}{h}\right)} \qquad (10)$$

where $I(ds_p \geq ds)$ is the indicator function, equal to 1 for $ds_p \geq ds$ and 0 otherwise; $\phi$ is the standard Gaussian density function; $h$ is the kernel bandwidth (standard deviation of the Gaussian kernel); subscript $p$ refers to observed data; and subscript $i$ refers to the IM level of interest (Figure 3).
Gaussian kernel smoothing has the advantage of avoiding any assumption on the shape of the
fragility function. Because the fragility curve is no longer constrained to a particular functional
shape, it will tend to more closely fit the data, with the possible danger of over-fitting. The kernel
bandwidth is used to balance between variance (over-fitting) and bias (over-smoothing), as
demonstrated in Figure 4(a). GKS can also be conducted on the logarithm of IM, often producing
better results.
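A direct implementation of Equation 10 is straightforward; the following Python sketch evaluates the kernel-smoothed exceedance probability at arbitrary IM levels:

```python
import numpy as np
from scipy.stats import norm

def gks_fragility(im_query, im_obs, exceed_obs, h):
    """Gaussian-kernel-smoothed exceedance probability (Equation 10).
    im_query: IM levels at which to evaluate; im_obs: observed IMs;
    exceed_obs: 1/0 indicator of exceeding the damage state; h: bandwidth."""
    im_query = np.atleast_1d(im_query)
    # weight of each observation for each query point: Gaussian density
    w = norm.pdf((im_query[:, None] - im_obs[None, :]) / h)
    return (w * exceed_obs).sum(axis=1) / w.sum(axis=1)
```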
A common issue with kernel-weighted smoothing is that the resulting functions exhibit bias at
the boundaries of the domain, because the kernel becomes asymmetric at the boundary (data exist only
on one side of the kernel). The issue of localized bias at the boundary results in non-zero probabilities
Figure 3. Generalized additive model-based fragility curve with probit link function, binomial distribution, and logarithmic IM variables, and lognormal CDF fragility fitted with MLE, for (a) Haiti damage survey data and (b) analytical IDA results.
of damage at zero IM intensity as shown in Figure 4(b), which clearly does not reflect underlying
physical constraints. It can therefore be forced to zero through zero-padding at the boundary. This
process simply adds artificial values of zero at the zero IM level. Because of the kernel weighting of
every data point, the effect of zero-padding is localized. Another approach to address the localized
bias is through kernel-weighted local linear regression. This method fits straight lines rather than
constants locally. Finally, it is noted that GKS can also be used to develop conditional probabilities
of collapse given IM, through two-dimensional GKS [38]. The development and application of
kernel-weighted linear regression and two-dimensional kernel smoothing for fragility modeling
are described by Noh et al. [38].
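Zero-padding can be grafted onto the gks_fragility sketch above in a few lines; the padding count n_pad below is an arbitrary illustrative choice:

```python
import numpy as np

# prepend n_pad artificial (IM = 0, no damage) points, then re-run GKS;
# the kernel weighting keeps the effect localized near zero IM
n_pad = 50
im_padded = np.concatenate([np.zeros(n_pad), im_obs])
exceed_padded = np.concatenate([np.zeros(n_pad), exceed_obs])
fragility = gks_fragility(im_grid, im_padded, exceed_padded, h=0.05)
```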
6. CUMULATIVE LINK MODELS
All models described previously treat damage data as nominal (i.e., damage states are assumed unrelated).
However, one useful characteristic of damage state data is that they can be ordered (no damage, slight
damage, moderate damage, etc.) even though the 'distances' between damage categories are
unknown (is slight damage half as severe as moderate damage?). These data are therefore an
example of ordinal data, as opposed to nominal data, where data categories are assumed to have no
relation other than being mutually exclusive. The ordinality of variables can be used to develop
fragility curves for all damage states simultaneously using cumulative link models. These models
further take advantage of the ordinal characteristics of data to improve the parsimony and power of
models [31]. They ensure the proper treatment of cumulative damage probabilities (fragility curves
will never cross) and take advantage of data from all damage states to create fragility curves for
each individual damage state.
Figure 4. Gaussian kernel smoothing non-parametric fragility curves and lognormal CDF fragility fitted with MLE for (a) Haiti damage survey data and (b) analytical IDA results.
Cumulative link models are an extension of GLMs described in Section 3, applied to ordinal data.
For earthquake fragility modeling, the damage state ordering forms cumulative probabilities for each
damage state j
$$P(DS \geq DS_j \mid IM) = g^{-1}\left(\alpha_j + \beta \log(IM)\right), \quad j = 1, \dots, J - 1 \qquad (11)$$
Each cumulative probability has its own intercept αj but shares the same effect β. This common
effect β is justified when considering that discrete ordinal damage states are coarsened versions of
an underlying continuous latent variable of damage [31]. Common link functions used are the logit
and probit link functions, resulting in the proportional odds model and the ordered probit model,
respectively. As in the GLM models, using the probit link in Equation 11 fits MLE-based lognormal
CDF fragility curves.
Figure 5. Fragility curves for all damage states fit with a cumulative probit model (blue) and individual probit GLMs (black).
Although the common effect β leads to simpler models and therefore less flexibility (fewer
parameters), cumulative link models have two significant advantages. First, cumulative link
models never violate the proper ordering among cumulative probabilities. This addresses the issue of
'crossing fragility curves', which often arises when treating the data as nominal [39], as
exemplified by the fragility curves fit to the Haiti data in Figure 5. Second, cumulative
link models are particularly useful for developing fragility curves when data for a specific damage
state are sparse, because all damage data (for all damage states) are used for each fragility
curve. Hence, even when very few collapses are observed, the model uses information from the other
damage states to estimate a collapse fragility curve consistent with the cumulative damage
state probabilities. This situation is very common, as empirical data sets usually contain few
observations of extreme damage.
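As one possible implementation, an ordered probit (cumulative link) model of Equation 11 can be fit with statsmodels' OrderedModel; damage_labels and im below are illustrative placeholders for the per-building survey data:

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# ordered categorical damage state per building, plus IM per building
states = ["none", "slight", "light", "moderate", "heavy", "major", "destroyed"]
damage = pd.Series(pd.Categorical(damage_labels, categories=states,
                                  ordered=True))

# ordered probit on log(IM): one shared slope beta and one threshold
# intercept per damage-state boundary, as in Equation 11
model = OrderedModel(damage, np.log(im).reshape(-1, 1), distr="probit")
result = model.fit(method="bfgs", disp=False)
print(result.params)  # slope followed by the J - 1 threshold parameters
```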
7. BAYESIAN METHODS
In addition to the methods described here, Bayesian methods can be used to update fragility curves
based on new data. Typically, analytical or expert-based fragility curves are updated with new
data such as field-based damage data or experimental testing results. This is advantageous
when empirical data cover only a small range of IMs, or when refining general (non-geographically
specific) fragility curves to reflect local building vulnerability. Such methods are described
in several other references [40–42].
8. TREATING UNCERTAINTY IN THE INTENSITY MEASURE
Uncertainty in IM is a common issue in the development of empirical fragility curves. Indeed, for
most empirical damage data, the ground motion is not measured at each site but rather inferred
from a GMPE or a combination of recordings and GMPE-based inferences. Modern GMPEs
typically take the form

$$\ln(IM) = f(M, R, \theta) + \varepsilon_\sigma \sigma + \varepsilon_\tau \tau \qquad (12)$$

where $f(M, R, \theta)$ is the GMPE's mean prediction of $\ln(IM)$ as a function of magnitude, distance, and other source and site parameters; $\varepsilon_\sigma$ and $\varepsilon_\tau$ are standard normal variables; and $\sigma$ and $\tau$ are period-dependent and reflect the intra-event (within-event) and inter-event (event-to-event) uncertainties, respectively.
The combined uncertainty $\sigma_T = \sqrt{\sigma^2 + \tau^2}$ can be significant. For example, the logarithmic
uncertainty is typically around 0.5 for PGA. This leads to issues when performing a regression,
because the explanatory variable (IM) is not observed but estimated. In fact, such error usually leads
to the attenuation of estimated effect (near-systematic negative bias in the effect estimator) [43].
This is demonstrated in the following text and shown in Figure 7.
Ideally, it would be possible to develop an ‘error-in-variables’ model (also known as ‘measurement
error model’) for fragility curves based on inferred ground motion IMs. Most such models require a so-
called ‘instrumental variable’, which is correlated with the true IM but not correlated with its
measurement error. In practice, such an instrumental variable rarely exists, although perhaps modified
Mercalli intensity measured through the United States Geological Survey (USGS) 'Did You Feel It?'
platform could be investigated. In addition, measurement error models require that the measurement
error in the independent variable be independent from the true variable (classical error) and mean
zero. The first condition can be met when regressing on the logarithmic IM (classical error shown in
Equation 12). However, by definition, the event-to-event term ετ in Equation 12 is perfectly
correlated for any given data set from a single earthquake. Furthermore, the intra-event term $\varepsilon_\sigma$ is
spatially correlated, so it is possible for a significant portion of damage observations (possibly all) to be
at sites with correlated intensity. Both facts lead to violation of the mean-zero measurement error
requirement for measurement-error regression models.
Figure 6. Fragility curves updated by conditioning ground motion intensity fields to known intensity measures at recording stations, with data weighted inversely to the variance of the intensity measure.
Therefore, the only clear approach to reducing the bias introduced by uncertainty in the IM is to
reduce this uncertainty. This is possible if ground motion recordings are available. Even a single
recording can be used to update expected IM at all sites, as well as reduce the inter-event
uncertainty. Because ground motions are spatially correlated, the intra-event uncertainty can also be
reduced in the region within the correlation distance of the recording.
This is demonstrated using the Haiti data. One million spatially correlated ground motion fields are
produced using the Boore–Atkinson 2008 GMPE [12] and the spatial correlation model for PGA
developed by Jayaram and Baker [44]. Without any other constraint, the median and logarithmic
standard deviation of the one million spatially correlated IM simulations at each site match those
obtained from the GMPE. However, we assume that ground motion records were captured at two
recording stations in proximity to the event. These recordings are used to condition the ground
motion fields. Hence, a subset of the ground motion fields is selected, matching the recorded IM at
the recording sites. The median IM and logarithmic standard deviation of this updated set of
ground motion fields are then computed at each site.
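A sketch of this conditioning step is shown below, assuming precomputed GMPE log-means, a total standard deviation, and a spatial correlation function rho (all placeholder names standing in for the models cited above):

```python
import numpy as np

def simulate_fields(ln_med, sigma_t, coords, rho, n_sims, seed=0):
    """Simulate spatially correlated ln(IM) fields about the GMPE median."""
    rng = np.random.default_rng(seed)
    # pairwise site distances and the resulting correlated covariance matrix
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cov = sigma_t**2 * rho(d)
    return rng.multivariate_normal(ln_med, cov, size=n_sims)

def condition_on_recordings(fields, rec_idx, rec_ln_im, tol=0.05):
    """Keep only the simulated fields matching the recorded ln(IM) values,
    then return the per-site median IM and logarithmic variance."""
    ok = np.all(np.abs(fields[:, rec_idx] - rec_ln_im) < tol, axis=1)
    subset = fields[ok]
    return np.exp(np.median(subset, axis=0)), subset.var(axis=0)
```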
Fragility curves can then be fit to the updated expected IMs. In addition, each data point is weighted
according to the remaining uncertainty in its IM. The weights used are inversely proportional to the
logarithmic variance of the simulated IMs at each site (further weighting can be achieved by taking the
square of the logarithmic variance). This has the effect of placing larger weights on data in
proximity to the recording stations.
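The weighted fit itself can reuse the probit GLM of Section 3, passing the inverse-variance weights through, e.g., statsmodels' var_weights argument (median_im, log_var, and y are placeholders for the conditioned data):

```python
import numpy as np
import statsmodels.api as sm

weights = 1.0 / log_var                 # larger weight near recording stations
X = sm.add_constant(np.log(median_im))
# y is a two-column array: [n_exceed, n_total - n_exceed] per site
fit = sm.GLM(y, X,
             family=sm.families.Binomial(sm.families.links.Probit()),
             var_weights=weights).fit()
```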
Figure 6 shows the lognormal CDF-based fragility curves fit by MLE with the
original IM estimates, the updated IMs, and the updated IMs with weights inversely proportional to the
IM variance. The color of the points indicates the variance of the IM.
Figure 7. Boxplots of normalized error in parameter estimates of the fragility curve due to uncertainty in the intensity measure.
Figure 6 demonstrates the process of updating empirical fragility curves by conditioning the ground
motion intensity fields to recordings. However, the ‘true’ fragility curve is still unknown. In order to
demonstrate the performance of this procedure, we simulate artificial ‘true’ damage to the Haiti
portfolio. A single ground motion field is selected as the true ground motion field experienced. The
damage is then simulated for this exposure based on a ‘known’ fragility curve. This simulated
damage is then used to fit fragility curves as described earlier and conditioning the ground motion at
two arbitrary locations where the intensity matches that of the ‘true’ ground motion intensity field.
We use a parametric fragility curve of the lognormal CDF form, using the probit GLM model
described in Section 3. This allows for easy comparison of true versus estimated fragility curve
parameters. Because the damage is artificially generated, any deviation in the estimated fragility curve
parameters is due to the uncertainty in the IM.
The process described earlier is repeated iteratively, randomly selecting the 'true' ground motion
intensity field, random parameters for the 'true' fragility curve, and random pairs of locations for
conditioning the IM. The statistics of 2500 such iterations are shown in Figure 7.
As observed in Figure 7, the uncertainty in IM results in uncertainty in the parameter estimates of
the fragility curve fit to the empirical damage data. In particular, and as described previously, the
uncertainty in IM biases the estimators, as is easily observed in the boxplots of Figure 7.
Conditioning the ground motion intensity fields and weighting the IMs based on their variance
significantly reduces this bias (to near zero), as well as the variability in the parameter estimate
error. The use of updated median IMs and data weighted according to IM variance is easily
replicated for all fragility curve fitting methods presented in this paper.
9. MODEL SELECTION
The methods described in this paper each result in different fragility curves. The MM and least-squared-error
methods for fitting the lognormal CDF model are not recommended, because the former merely
matches the central moments of the distribution, and the latter assumes Gaussian error with constant
variance independent of the response, which is inconsistent with the data. None of the other methods
presented can be disqualified on theoretical grounds for fitting earthquake fragility. An approach for
selecting among these methods is therefore necessary.
9.1. Akaike information criterion
The Akaike information criterion (AIC) [45] provides a relative measure of model quality:

$$AIC = 2k - 2\ln(\hat{\mathcal{L}}) \qquad (13)$$

where $k$ is the number of parameters in the model, and the likelihood $\hat{\mathcal{L}}$ is the probability of observing the
data given the model. The model with the smallest AIC is the best model. Given that all models
considered here have the same number of parameters, AIC-based selection reduces to selecting the
model with the highest likelihood. Because all the parametric models proposed use MLE as their
optimization method, the likelihood (and hence the AIC) is a by-product of the fitting procedure and is
therefore readily extracted from any software package used for fitting.
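Because fitted GLM objects in most packages carry their AIC, the comparison is a one-liner; the fit objects below are placeholder names for models fitted as in Section 3:

```python
# compare candidate parametric fits by AIC (Equation 13); each fit_* is a
# fitted statsmodels GLM result object (placeholder names)
candidates = {"probit": fit_probit, "logit": fit_logit, "cloglog": fit_cloglog}
best_name = min(candidates, key=lambda name: candidates[name].aic)
```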
However, there may be good reasons to opt instead for a non-parametric model, or at least to compare
the predictive power of the parametric and non-parametric models.
Table II. Advantages and disadvantages of various models for fitting fragility curves to empirical or analytical data.

Lognormal CDF-based curve fit by method of moments (MM)
  Advantages: Very simple to implement given appropriate data.
  Disadvantages: Parametric functions constrain the shape of fragility curves. Can only be used for limited data types (full-scale IDA). Provides a fit to the central values of the sample but not necessarily to the full distribution.

Lognormal CDF-based curve fit by sum of squared errors (SSE)
  Advantages: Easily implemented for various data types, including empirical and analytical data.
  Disadvantages: The SSE method assumes normally distributed error with independent variance, which can result in bias, particularly at low IMs. Parametric functions constrain the shape of fragility curves.

Lognormal CDF-based curve fit by maximum likelihood estimation (MLE), or probit GLM and log(IM)
  Advantages: Lognormal CDF curves generally provide good representations of earthquake damage fragility. The iteratively re-weighted least squares algorithm used for MLE accommodates non-independent error variance. Easily implemented for various data types, including empirical and analytical data.
  Disadvantages: Parametric functions constrain the shape of fragility curves.

Logistic GLM and log(IM)
  Advantages: Commonly used statistical method for dichotomous data. Regression coefficients are estimated using MLE assuming non-independent error variance. Results are easy to interpret, as they provide the increase in odds of the response (e.g., odds of collapse) due to a unit increase in the input variable (e.g., log(IM)).
  Disadvantages: Parametric functions constrain the shape of fragility curves.

Cumulative link models
  Advantages: Commonly used statistical method for ordinal data. Ensure proper ordering among cumulative probabilities (no crossing fragility curves). Use all data to fit curves for each damage state, a particular advantage when data are sparse for a particular damage state (typically the case for extreme damage or collapse).
  Disadvantages: Parametric functions constrain the shape of fragility curves. The common effect β shared by fragility curves at different damage states further constrains the flexibility of the fragility curves.

Generalized additive models
  Advantages: Commonly used statistical method for dichotomous data. Allow flexibility and deviation from parametric curves. Can provide very good fit to data.
  Disadvantages: Can be prone to 'over-fitting', because the resulting curve has less strict functional constraints. Non-parametric curves are more difficult to generalize and interpret.

Gaussian kernel smoothing
  Advantages: Makes no assumption on the distribution of the response and is therefore most true to the data observations. Can provide very good fit depending on the data.
  Disadvantages: Can be prone to 'over-fitting', because the resulting curve has no functional constraints. Non-parametric curves are more difficult to generalize and interpret. May necessitate correction of localized bias at the boundaries.

IDA, Incremental Dynamic Analysis.
Figure 8. Cross-validation results for fragility models fit to Haiti data (models compared: lognormal CDF by MLE, logistic GLM on log(IM), cloglog GLM on log(IM), over-fit and adjusted GAM, and over-fit and adjusted GKS).
Non-parametric models have the advantage of being more flexible. In the case of kernel smoothing, no
assumptions are made as to the functional form of the data, and the result is therefore more 'true to the
observed data'.
The use of AIC as a criterion for model selection is not always possible for non-parametric models.
The goodness of fit (or likelihood) of a non-parametric model can be made arbitrarily good by using
more knots in the GAM model (more splines to characterize the curve) or a smaller bandwidth
in GKS. However, computing the AIC requires a penalization term related to the number of parameters;
for non-parametric models, this means estimating the effective number of parameters,
which is often impractical.
Cross-validation can be used for model selection of both parametric and non-parametric models. It
can also be used to select the smoothing parameters in GAM and GKS models, addressing common
issues of over-fitting.
9.2. Cross-validation
Cross-validation is a very common method for estimating expected out-of-sample error. The basic
process is to iteratively fit a model on a training sample and measure error against a separate
validation sample. By splitting the sample iteratively into training and validation sets, the full
distribution of prediction error is obtained.
We first define a loss function. The loss function for the error between the true response $Y$ and the
prediction model $\hat{f}(IM)$ is denoted $L(Y, \hat{f}(IM))$. Typical loss functions measure the sum of squared errors,
$L(Y, \hat{f}(IM)) = \sum_{i=1}^{n} (Y_i - \hat{f}(IM_i))^2$, or the sum of absolute errors,
$L(Y, \hat{f}(IM)) = \sum_{i=1}^{n} |Y_i - \hat{f}(IM_i)|$.
For earthquake damage prediction, the squared error has little physical meaning, whereas the absolute error
directly propagates to errors in loss prediction, fatality prediction, and so on. Hence, the sum of
absolute errors loss function is used here. Depending on the specific study, other loss functions could
be used. For instance, for IDA curves with evenly spaced IMs, a loss function can be developed that
measures the error in the annual rate of collapse after integration with the hazard curve,
$L(Y, \hat{f}(IM)) = \sum_{i=1}^{n} |Y_i - \hat{f}(IM_i)|\,\lambda(IM_i)$, where $\lambda(IM_i)$ is the rate of $IM_i$. This has the
effect of adding more weight to errors at the IMs that contribute most to collapse.
The data are then split into K equal-sized samples (typically 5 or 10). For the kth sample, the model is
fit to the remaining data (all but the kth part), and the prediction error is calculated on the kth sample.
This is done iteratively for every sample k, and the cross-validation error is computed as the average
prediction error

$$CV_{err}(\hat{f}) = \frac{1}{K} \sum_{i=1}^{K} L\left(Y_{k_i}, \hat{f}^{-k_i}(IM_{k_i})\right) \qquad (14)$$
where $\hat{f}^{-k_i}(IM_{k_i})$ is the fragility prediction for the IM data in the $k_i$th sample, based on the model fit to the
data with the $k_i$th sample removed.
This same approach can be used to select the smoothing parameter (for GAM) and bandwidth parameter
(for GKS) that minimize the cross-validation error. The optimal smoothing
parameter λ minimizes the cross-validation error, as shown in Equation 15:

$$\lambda^{*} = \arg\min_{\lambda} L\left(Y, \hat{f}(IM, \lambda)\right) = \arg\min_{\lambda} \sum_{i=1}^{n} \left|Y_i - \hat{f}(IM_i, \lambda)\right| \qquad (15)$$
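A minimal K-fold cross-validation loop with the absolute-error loss might look as follows, where fit_fn stands for any of the fitting procedures above that returns a callable fragility model:

```python
import numpy as np

def cv_abs_error(fit_fn, im, y, K=10, seed=0):
    """K-fold cross-validation with the sum-of-absolute-errors loss.
    fit_fn(im_train, y_train) must return a callable fragility model."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(im)), K)
    errors = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        model = fit_fn(im[train], y[train])       # fit on the K - 1 folds
        errors.append(np.sum(np.abs(y[test] - model(im[test]))))
    return np.mean(errors)  # Equation 14 with the absolute-error loss
```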
In this example, the adjusted GKS model displays the least prediction error and is therefore the
'best' model, because it has the most predictive power. The training error for the over-fitted GAM
and GKS models is very low: they fit the training data very well but have much less predictive
power on new samples, as shown by their high prediction error. This is the main advantage of
cross-validation: it provides the out-of-sample error.
Table II provides general descriptions of the advantages and disadvantages of each method.
Cross-validation is a powerful method for evaluating fragility model performance. The results
depend on the specific data used, as well as on the loss function for quantifying prediction error.
Therefore, the performance of the various models shown in Figure 8 is not absolute, but the process
can be repeated for new data or a new loss function.
10. CONCLUSIONS
This paper describes various methods for developing relationships between earthquake damage and ground
motion intensity. It discusses the commonly used MM and least-squared-error approaches to fitting lognormal
CDF fragility curves, pointing to some fundamental flaws of these methods. Instead, equally
simple parametric models can be fit with GLMs, which rely on MLE and on proper assumptions of
binomially distributed data with non-constant error variance. Cumulative link models are used to
develop fragility curves for all damage states simultaneously. Taking advantage of the ordinality of
damage states, they ensure the proper treatment of cumulative distributions, so that no two fragility
curves can cross. They further make use of all damage data for each fragility curve, a
significant advantage when there are few observations of some damage states. Finally, semi-parametric
GAM and non-parametric kernel-based regression methods are described. These provide
significantly more flexibility to fragility models, with the attendant danger of over-fitting the data.
Therefore, cross-validation is used to select smoothing parameters that minimize prediction error.
Furthermore, cross-validation is an easily implemented and rigorous method for selecting among
competing models based on a user-defined loss criterion.
When developing empirical fragility curves from observed damage data, it is unusual to have actual
ground motion recordings at all sites of interest. More commonly, the intensity measures are inferred
from GMPEs, combined with a few recordings. This leads to issues when fitting a regression model with
an uncertain independent variable. The paper discusses some of the main challenges with using typical
measurement-error models to address this uncertainty, because GMPE-based IM inferences violate
many of their fundamental assumptions. However, if even just a few ground motion recordings are
available, spatially correlated ground motion intensity fields can be conditioned on those recordings.
Given enough ground motion fields, the conditional median and standard deviation of IM can be
computed at each site. Fragility curves can then be fit to the updated IMs, with each data point weighted
inversely to its uncertainty. This simultaneously fits the fragility curve to better predictions and places
more weight on the predictions with the least uncertainty (near recording stations).
ACKNOWLEDGEMENTS
The material presented in this paper was developed in part for the Global Earthquake Model foundation
(GEM). The partial support provided by GEM is gratefully acknowledged. The research was also partially
supported by National Science Foundation Grant NSF CMMI 106756, the Shah Family Fellowship, and
the John A. Blume Fellowship. Discussions with various members of the GEM vulnerability consortium, in
particular Dr. Tiziana Rossetto and Dr. Ioanna Ioannou, were very valuable, and the authors express their
appreciation for their comments and input.
REFERENCES
1. Colombi M, Borzi B, Crowley H, Onida M, Meroni F, Pinho R. Deriving vulnerability curves using Italian
earthquake damage data. Bulletin of Earthquake Engineering 2008; 6: 485–504. DOI: 10.1007/s10518-008-9073-6
2. Lantada N, Irizarry J, Barbat AH, Goula X, Roca A, Susagna T, Pujades LG. Seismic hazard and risk scenarios
for Barcelona, Spain, using the Risk-UE vulnerability index method. Bulletin of Earthquake Engineering 2009; 8:
201–229. DOI: 10.1007/s10518-009-9148-z
3. Braga F, Dolce M, Liberatore D. A statistical study on Damaged Buildings in the 23.11.1980 earthquake, and an
ensuing review of the MSK76 scale. Proc 7 European Conference on Earthquake Engineering, 1982.
4. Sabetta F, Goretti A, Lucantoni A. Empirical Fragility Curves from Damage Surveys and Estimated Strong Ground
Motion. Proceedings of the 11th European Conference on Earthquake Engineering, 1998.
5. Singhal A, Kiremidjian AS. Method for probabilistic evaluation of seismic structural damage. Journal of Structural
Engineering 1996; 122: 1459–1467. DOI:10.1061/(ASCE)0733-9445(1996)122:12(1459)
6. Rossetto T, Elnashai A. A new analytical procedure for the derivation of displacement-based vulnerability curves for
populations of RC structures. Engineering Structures 2005; 27: 397–409. DOI: 10.1016/j.engstruct.2004.11.002
7. Ibarra LF, Krawinkler H. Global collapse of frame structures under seismic excitations. Blume Center Technical
Report, 2005.
8. Federal Emergency Management Agency (FEMA). HAZUS Earthquake Loss Estimation Methodology. US Federal
Emergency Management Agency, 1999.
9. Jaiswal, KS, Aspinall W, Perkins D. Use of Expert Judgment Elicitation to Estimate Seismic Vulnerability of
Selected Building Types. Proc 15th World Conference on Earthquake Engineering, 2012; 1–10.
10. Applied Technology Council. Earthquake damage evaluation data for California (ATC-13). Applied Technology
Council. Redwood City, CA, 1985.
11. Ministere des Travaux Publics, Transports et Communications, 2013. http://www.mtptc.gouv.ht.
12. Boore DM, Atkinson GM. Ground-motion prediction equations for the average horizontal component of PGA, PGV,
and 5%-damped PSA at spectral periods between 0.01 s and 10.0 s. Earthquake Spectra 2008; 24: 99–138. DOI:
10.1193/1.2830434
13. Hayes G. Finite Fault Model: Updated Result of the Jan 12, 2010 Mw 7.0 Haiti Earthquake. National Earthquake
Information Center (NEIC) of United States Geological Survey, 2014.
14. Vamvatsikos D, Cornell CA. Incremental dynamic analysis. Earthquake Engineering & Structural Dynamics
2002; 31: 491–514. DOI: 10.1002/eqe.141
15. Mazzoni S, McKenna F, Scott MH, Fenves GL. Open System for Earthquake Engineering Simulation (OpenSees).
Pacific Earthquake Engineering Research Center 2006.
16. Federal Emergency Management Agency. Quantification of building seismic performance factors (FEMA P695), 2009.
17. Burton H, Deierlein GG. Simulation of seismic collapse in non-ductile reinforced concrete frame buildings with
masonry infills. Journal of Structural Engineering 2014; 140(8): A4014016.
18. Bradley BA, Dhakal RP. Error estimation of closed-form solution for annual rate of structural collapse. Earthquake
Engineering & Structural Dynamics 2008; 37: 1721–1737. DOI: 10.1002/eqe.833
19. Sarabandi P, Pachakis D, King S. Empirical fragility functions from recent earthquakes. 13th World Conference on
Earthquake Engineering, 2004.
20. Bird JF, Bommer JJ, Bray JD, Sancio R, Spence RJS. Comparing loss estimation with observed damage in a zone
of ground failure: a study of the 1999 Kocaeli Earthquake in Turkey. Bulletin of Earthquake Engineering 2004; 2:
329–360. DOI: 10.1007/s10518-004-3804-0
21. Kennedy RP, Cornell CA, Campbell RD, Kaplan S, Perla HF. Probabilistic seismic safety study of an existing
nuclear power plant. Nuclear Engineering and Design 1980; 59: 315–338. DOI: 10.1016/0029-5493(80)90203-4
22. Kennedy RP, Ravindra MK. Seismic fragilities for nuclear power plant risk studies. Nuclear Engineering and Design
1984; 79: 47–68. DOI: 10.1016/0029-5493(84)90188-2
23. Kircher CA, Aladdin A, Nassar OK, Holmes WT. Development of building damage functions for earthquake loss
estimation. Earthquake Spectra 1997; 13: 663–682. DOI: 10.1193/1.1585974
24. Shinozuka M, Feng MQ, Lee J, Naganuma T. Statistical analysis of fragility curves. Journal of Engineering Mechan-
ics 2000; 126: 1224–1231. DOI: 10.1061/(ASCE)0733-9399(2000)126:12(1224)
25. US Nuclear Regulatory Commission. PRA Procedures Guide: A Guide to the Performance of Probabilistic Risk Assess-
ments for Nuclear Power Plants (NUREG-2300). Office of Nuclear Regulatory Research: Washington, DC, 1983.
26. Porter K, Kennedy R, Bachman R. Creating fragility functions for performance-based earthquake engineering. Earth-
quake Spectra 2007; 23: 471–489. DOI: 10.1193/1.2720892
27. Baker JW. Efficient analytical fragility function fitting using dynamic structural analysis. Earthquake Spectra
2015; 31(1): 579–599.
28. Rota M, Penna A, Strobbia CL. Processing Italian damage data to derive typological fragility curves. Soil Dynamics
and Earthquake Engineering 2008; 28: 933–947. DOI: 10.1016/j.soildyn.2007.10.010
29. Straub D, Der Kiureghian A. Improved seismic fragility modeling from empirical data. Structural Safety 2008; 30:
320–336. DOI: 10.1016/j.strusafe.2007.05.004
30. McCullagh P, Nelder JA. Generalized Linear Models (2nd edn). Chapman & Hall/CRC: Boca Raton, Florida, 1989.
ISBN 0-412-31760-5
31. Agresti A. Categorical Data Analysis. Wiley-Interscience: New York, NY, 2002.
32. Basöz NI, Kiremidjian AS. Evaluation of Bridge Damage Data from the Loma Prieta and Northridge, California
Earthquakes. Technical Report MCEER. U.S. Multidisciplinary Center for Earthquake Engineering Research, 1998.
33. O’Rourke MJ, So P. Seismic fragility curves for on-grade steel tanks. Earthquake Spectra 2000; 16: 801–815. DOI:
10.1193/1.1586140
34. Wood SN. On confidence intervals for generalized additive models based on penalized regression splines. Australian
& New Zealand Journal of Statistics 2006; 48: 445–464. DOI: 10.1111/j.1467-842X.2006.00450.x
35. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and
Prediction (2nd edn). Springer, 2009. ISBN-13: 978-0387952840
36. Rossetto T, Ioannou I, Grant DN, Maqsood T. GEM guidelines for empirical vulnerability assessment. Global Earth-
quake Model report 2014; 1–108. http://www.nexus.globalquakemodel.org/gem-vulnerability/posts/guidelines-for-
empirical-vulnerability-assessment (Accessed June 2014).
37. Noh HY, Lignos DG, Nair KK. Development of fragility functions as a damage classification/prediction method for
steel moment-resisting frames using a wavelet-based damage sensitive feature. Earthquake Engineering & Structural
Dynamics 2011. DOI: 10.1002/eqe.1151
38. Noh HY, Lallemant D, Kiremidjian A. Development of empirical and analytical fragility functions using kernel
smoothing methods. Earthquake Engineering & Structural Dynamics 2014. DOI: 10.1002/eqe.2505
39. Applied Technology Council. Seismic Performance Assessment of Buildings - Volume 1 - Methodology. Applied
Technology Council: Redwood City, CA, 2012.
40. Singhal A, Kiremidjian AS. Bayesian updating of fragilities with application to RC frames. Journal of Structural
Engineering 1998; 124: 922–929. DOI: 10.1061/(ASCE)0733-9445(1998)124:8(922)
41. Gardoni P, Der Kiureghian A, Mosalam KM. Probabilistic capacity models and fragility estimates for reinforced con-
crete columns based on experimental observations. Journal of Engineering Mechanics 2002; 128: 1024–1038. DOI:
10.1061/(ASCE)0733-9399(2002)128:10(1024)
42. Jaiswal K, Wald D, D’Ayala D. Developing empirical collapse fragility functions for global building types. Earth-
quake Spectra 2011; 27: 775–795. DOI: 10.1193/1.3606398
43. Fuller WA. Measurement Error Models. John Wiley: New York, 1987.
44. Jayaram N, Baker J. Correlation Model for Spatially Distributed Ground-Motion Intensities. Earthquake Engineer-
ing & Structural Dynamics 2009. DOI: 10.1002/eqe.922
45. Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control 1974;
19: 716–723. DOI: 10.1109/TAC.1974.1100705